Serial Analysis of Gene Expression – an overview


Alison Nairn, Kelley Moremen, in Handbook of Glycomics , 2010

Serial Analysis of Gene Expression

Serial Analysis of Gene Expression (SAGE) requires the isolation of mRNA and the generation of cDNA from which unique small sequences (∼14 bp), or tags, are generated using restriction enzyme digestion. The tags are then concatenated by ligation with other tags, amplified in a bacterial host and then sequenced. The number of times that a specific sequence tag is found determines the relative abundance of the transcript in that sample. Thousands of transcripts within a cell can be analyzed simultaneously with this technique [192]. SAGE is very useful for discovering new transcripts for which the gene sequences were not previously known. However, a small sequence tag may not provide enough information to clearly determine the identity of a new sequence [193,194]. A modified version of this technique that uses longer tags has been shown to be more useful [195]. A variation of SAGE was developed as a tool to provide information about transcript abundance as well as transcriptional regulation. Cap analysis gene expression (CAGE) involves the isolation of short sequence tags at the 5′ end of full-length mRNAs, which contrasts with SAGE where tags originate at the 3′ end of the mRNA [196,197]. The 5′ end tags are sequenced and can be used to determine both the relative transcript abundance and the location of the transcription start site for promoter analyses.

Since SAGE is generally used for gene discovery between two populations of cells or tissues, there are few, if any, glycobiology-targeted studies employing SAGE as the predominant analytical method. However, there are examples of SAGE analyses where glycan-related genes have been identified as abundantly expressed genes. For example, a SAGE analysis of human marrow mesenchymal stem cells found a high level of expression for genes involved in extracellular matrix (ECM) formation (e.g., collagen and matrix metalloproteinase-2) and cell adhesion (galectin-1) [198]. A recent SAGE study of the transcriptome of vascular endothelial cells subjected to static and shear stress conditions revealed a decrease in expression for several ECM genes resulting from shear stress [199].



Molecular Genetics; Lung and Breast Carcinomas

Shui Qing Ye, in Handbook of Immunohistochemistry and in Situ Hybridization of Human Carcinomas , 2002


Serial analysis of gene expression (SAGE) (Velculescu et al., 1995) and DNA microarray technology (Schena et al., 1995) have become popular high-throughput platforms for the genomewide analysis of gene expression and function in normal and diseased conditions as the large-scale human genomic sequencing effort comes to fruition. Previously, our understanding of the molecular basis of complex syndromes such as cancer, coronary artery disease, and diabetes relied heavily on the study of individual genes and how changes in specific genes affected phenotype. However, it is now generally recognized that phenotypic changes in a tissue are the result of changes in the spatial and temporal expression of dozens or even hundreds of genes. The molecular basis for disease is thus difficult to discern based simply on the study of individual genes. The research paradigm is shifting from the traditional search for a single disease-specific gene to an understanding of the biochemical and molecular functioning of a variety of genes and how complicated networks of interaction can lead to the pathogenesis of various human diseases. In an attempt to look more globally at gene expression changes, methods such as cDNA subtraction, mRNA differential display, and expressed sequence tag (EST) analysis have been used. These, however, can analyze only a limited number of genes and are not quantitative. Two high-throughput technologies, SAGE (Velculescu et al., 1995) and oligonucleotide (Lockhart et al., 1996) or cDNA microarray (Schena et al., 1995), allow researchers to determine the expression pattern of thousands of genes simultaneously in normal or diseased tissues. With the development of SAGE- or DNA microarray-based research, more accurate diagnosis and treatment of various human diseases based on an individual patient’s gene expression profile are burgeoning.
Two attractive features of SAGE compared with DNA microarray are its ability to quantify gene expression without prior sequence information and its provision of gene expression results in a digital format. The SAGE technique has been intensively applied to the gene expression profiling of a number of human diseases. In this chapter, the first part gives a short synopsis of the SAGE technique and its contrast to DNA microarray; the second part expounds the protocol of the SAGE technique; the third part discusses new biological insights derived from the application of SAGE in human diseases; and perspectives are offered in the last part.

SAGE Technique Overview and Its Contrast to DNA Microarray

The SAGE technique was originally developed by Velculescu et al. (1995). Two basic principles underlie the SAGE methodology: 1) a short sequence tag (10 bp) from a defined position in the cDNA contains sufficient information to uniquely identify a transcript and 2) the concatenation of tags allows for efficient sequence-based analysis of transcription. A detailed introduction to the SAGE technique can be found in two comprehensive reviews (Bertelsen et al., 1998; Madden et al., 2000). The schematic of the SAGE procedure is depicted in Figure 11A. Briefly, poly(A)+ RNA is isolated by oligo-dT column chromatography. Then cDNA is synthesized from poly(A)+ RNA using a primer of biotin-5′-T18-3′. The cDNA is cleaved with an anchoring enzyme (e.g., NlaIII), and the 3′-terminal cDNA fragments are bound to streptavidin-coated beads. An oligonucleotide linker containing a recognition site for a tagging enzyme (e.g., BsmFI) is ligated to the bound cDNA. The tagging enzyme is a type IIS restriction endonuclease that cleaves the DNA at a constant number of bases 3′ to the recognition site. Digestion with BsmFI therefore releases a short tag plus the linker from the beads. The 3′-ends of the released tags plus linkers are then blunted and ligated to one another to form 102-bp linked ditags. After polymerase chain reaction (PCR) amplification of the 102-bp ditags, the linkers are released from the ditags by digestion with the anchoring enzyme. The 26–28-mer ditags are then gel purified, concatenated, and cloned into a sequencing vector. Sequencing the concatemers enables individual tags to be identified and the abundance of the transcripts for a given cell line or tissue to be determined (Velculescu et al., 1995).
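The tag definition above can be illustrated in silico. The following Python sketch is not part of the published protocol: it assumes NlaIII (recognition site CATG) as the anchoring enzyme, a 10-bp tag, and a cDNA written 5′ to 3′ as a plain string. The function name extract_tag and the example sequence are hypothetical.

```python
ANCHOR_SITE = "CATG"   # NlaIII recognition sequence (assumed anchoring enzyme)
TAG_LENGTH = 10        # assumed tag length in bp


def extract_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag that follows the 3'-most anchoring site, or None."""
    site = cdna.rfind(anchor)          # 3'-most occurrence of the site
    if site == -1:
        return None                    # transcript lacks the site: no tag
    start = site + len(anchor)
    tag = cdna[start:start + tag_length]
    # Simplification: discard sites too close to the 3' end for a full tag.
    return tag if len(tag) == tag_length else None


# Hypothetical cDNA with two CATG sites; only the 3'-most one defines the tag.
example_cdna = "GGCATGTTTACCCATGACGTACGTAGCTTGCAAAAAA"
example_tag = extract_tag(example_cdna)
```

Because only the 3′-most site is used, each transcript contributes at most one tag position, which is what makes tag counts comparable between transcripts.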

Figure 11. Schematics of the serial analysis of gene expression (SAGE, A) and microarray (B). A: The schematic of SAGE: Step 1, cDNA synthesis via biotin-oligo dT: poly(A)+ RNA is isolated by an oligo-dT column and cDNA is synthesized from poly(A)+ RNA through the biotin-oligo-dT. Step 2, Anchoring enzyme digestion and streptavidin bead binding: the cDNA is cleaved by the anchoring enzyme (e.g., NlaIII) and the 3′-portion of the cDNA is captured by the streptavidin-coated magnetic beads. Step 3, Add linker 1 and linker 2: two different linkers containing a recognition site for a type IIS restriction enzyme (tagging enzyme such as BsmFI) are ligated to two aliquots of the captured cDNA, respectively. Step 4, Tagging enzyme digestion: short tags plus linkers are released from the cDNA by tagging enzyme digestion. Step 5, Blunting ends, ligation and PCR amplification: after the ends of the released short tags are blunted by treatment with DNA polymerase I, large (Klenow) fragment, the linker-tag molecules are ligated tail to tail to form ditags. Ditags are amplified by PCR. Step 6, Anchoring enzyme digestion, concatenation by ligase, cloning and DNA sequencing: ditags are released from linkers by anchoring enzyme digestion and concatenated by ligase. The concatemers are then cloned into a sequencing vector and are subjected to DNA sequencing. Step 7, Data analysis: SAGE tag information is analyzed by SAGE software to identify the genes of interest in specific physiological or disease states. B: The schematic of microarray: Depicted on the left side is the cDNA microarray schema. RNA from two different tissues or cell populations is used to synthesize single-stranded cDNA in the presence of nucleotides labeled with two different fluorescent dyes (e.g., Cy3 and Cy5). Both samples are mixed in a small volume of hybridization buffer and hybridized to the array surface, usually by stationary hybridization under a coverslip, resulting in competitive binding of differentially labeled cDNAs to the corresponding array elements. High-resolution confocal fluorescence scanning of the array with two different wavelengths corresponding to the dyes used provides relative signal intensities and ratios of mRNA abundance for the genes represented on the array. Depicted on the right side is the oligonucleotide microarray schema. The RNA from different tissues or cell populations is used to generate double-stranded cDNA carrying a transcriptional start site for T7 RNA polymerase. During in vitro transcription, biotin-labeled nucleotides are incorporated into the synthesized cRNA molecules. Each target sample is hybridized to a separate probe array and target binding is detected by staining with a fluorescent dye coupled to streptavidin. Signal intensities of probe array element sets on different arrays are used to calculate relative mRNA abundance for the genes represented on the array. This figure was adapted from Ye et al. (2002). Courtesy of J. Biomed. Sci.

Although SAGE has become an extremely powerful technique for global analysis of gene expression, its requirement for a large amount of input mRNA (2.5–5.0 μg, which is equivalent to 250–500 μg total RNA) limits its utility. Several laboratories have attempted SAGE gene profiling using smaller amounts of RNA, but these attempts have all involved either PCR amplification of starting cDNA materials, such as SAGE-Lite and PCR-SAGE (Neilson et al., 2000; Peters et al., 1999), or PCR reamplification of SAGE ditags generated by a first round of PCR amplification, such as microSAGE and SAGE adaptation for downsized extracts (SADE) (Datson et al., 1999; Virlon et al., 1999). These additional PCR amplifications potentially introduce bias and compromise the quantitative aspects of the SAGE method. A recently modified method, miniSAGE, was successfully applied to profile gene expression of human fibroblasts from 1 μg total RNA without extra PCR amplification (Ye et al., 2000). Three key modifications contributed to the establishment of miniSAGE: 1) the application of Phase Lock Gel (Eppendorf) to purify DNA after phenol extraction, which significantly increases the recovery of DNA material and yields purer DNA; 2) the addition of 25-fold less linker (10 ng per reaction) in the ligation reaction to cDNA, thus reducing linker interference with SAGE ditag amplification and increasing the SAGE ditag yield; 3) the employment of the mRNA Capture Kit (Boehringer Mannheim) to carry out the initial steps depicted in Figure 11A in one tube, which include mRNA isolation, cDNA synthesis, enzyme cleavage of cDNA, binding of the cleaved biotin-cDNA to the streptavidin magnetic beads, ligation of linkers to the bound cDNA, and release of cDNA tags, thereby significantly reducing the loss of material between successive steps.
Recently, Invitrogen developed a convenient I-SAGE kit, which packages all necessary quality-controlled reagents together, enough for five SAGE library syntheses. These modifications have contributed to the broad application of SAGE. Three major Web sites for SAGE technology and data information are listed in Table 6.

Table 6. SAGE Technology and Data Information on the Internet

Web Site | Feature
SAGEnet | SAGE introduction, software, protocols, references, and link to other SAGE resources
SAGEmap | Online SAGE data analysis, tag to gene mapping, download CGAP SAGE data, submitting SAGE data to GEO
Genzyme Molecular Oncology | SAGE technology and applications information for commercial users

The SAGE technique is a rival to DNA microarray technology. The latter was derived from an initial report in the mid-1970s (Burgess, 2001) that gene expression could be monitored by nucleic acid molecules attached to a solid support. A typical cDNA microarray operational scheme is illustrated in Figure 11B, left. Microarrays are created using a precise xyz robot that is programmed to spot cDNA samples onto a solid substrate, usually a glass microscope slide, in a high-density pattern. Similarly, cDNA arrays are also produced on nylon membranes but with spots of larger size in a lower-density pattern. Patrick Brown’s laboratory at Stanford University created the first xyz arrayer, and instructions on how to build an arrayer can be found on their Web site. Many companies now produce cDNA microarrayers commercially. These machines differ primarily in the way the spot is placed on the substrate. Spotted arrays allow a greater degree of flexibility in the choice of arrayed elements, particularly for the preparation of smaller, customized samples. Accordingly, cDNA gridded arrays have been the technique most frequently used. With prices for oligonucleotide synthesis becoming more reasonable for large-scale studies, spotted long-oligonucleotide arrays could be a viable alternative to full-length cDNA arrays.

Oligonucleotide microarrays, another type of microarray (Schulze et al., 2001), use photolithography or ink-jet technology to synthesize oligonucleotides in situ on silicon wafers. A typical oligonucleotide microarray operational schema is illustrated in Figure 11B, right. High-density oligonucleotide microarrays from Affymetrix are examples of the use of photolithography technology. A similar procedure developed by Rosetta Inpharmatics and licensed to Agilent Technologies represents an example of the use of ink-jet technology. Alternatively, presynthesized oligonucleotides can be printed onto glass slides. Methods based on synthetic oligonucleotides offer the advantage that sequence information alone is sufficient to generate the DNA to be arrayed; cDNAs do not have to be produced. Affymetrix (Santa Clara, CA) has pioneered the use of this form of array production with the development of the GeneChip. Its newest GeneChip, Human Genome U133 Set (HG-U133A and HG-U133B, released in January 2002), consists of two microarrays containing more than 1,000,000 unique oligonucleotide features covering more than 39,000 transcript variants, which in turn represent more than 33,000 of the best characterized human genes. Affymetrix released (March 2003) its latest GeneChip Mouse Expression Set 430 (MOE430A and B), which can analyze the expression level of more than 39,000 mouse transcripts and variants, including more than 34,000 well-substantiated mouse genes, and Rat Expression Set 230 (RAE230A and B), which can analyze the expression level of more than 30,200 rat transcripts and variants, including more than 28,000 well-substantiated rat genes. These newly empowered GeneChips will facilitate the gene expression profiling of human diseases in animal models.

Ishii et al. (2000) compared the quantitative accuracy of oligonucleotide arrays and SAGE using identical RNA specimens prepared from human blood monocytes and macrophages stimulated with granulocyte-macrophage colony-stimulating factor (GM-CSF). Results for the unstimulated monocytes and stimulated macrophages were similar. The correlation was better for genes that were more highly expressed or more differentially expressed. Nacht et al. (1999) demonstrated the strength of combining the two approaches to help elucidate pathways in breast cancer progression by comparing primary breast cancers, metastatic breast cancers, and normal mammary epithelial cells. They reported that combining SAGE and custom array technology allowed for the rapid identification and validation of the clinical relevance of genes potentially involved in breast cancer progression.

However, each technology has its pros and cons. Table 7 lists several major differences between SAGE and microarray technology. Because the preparation of microarrays requires prior knowledge of the sequence of the gene transcripts to be analyzed, an advantage of SAGE is that it can identify novel genes and be used to analyze gene expression in organisms whose genomes are largely uncharacterized. This is a serious limitation for microarrays, even for organisms with completely sequenced genomes such as humans, because genome annotation and gene prediction remain technically challenging. There are additional advantages to SAGE. The data SAGE provides are in a digital format, whereas microarray data are analog. Although SAGE data are directly comparable, the differences in microarray formats and normalization methodologies make direct comparison of data sets between microarray platforms somewhat difficult. The SAGE technology not only can accurately determine the absolute abundance of mRNAs but also can detect even slight differences in expression levels between samples. Microarrays are reliable only in detecting genes whose expression differences are relatively large. Setup cost for SAGE is low. It does not require expensive instruments other than a DNA sequencer, which is available at most institutions. Microarrays need expensive robotic arrayers and scanners, currently available only in core facilities of major institutions. Microarrays, on the other hand, are relatively easy to use and more suitable for high-throughput applications. Also, SAGE requires several enzymatic manipulation steps, which are relatively taxing, especially for the novice, although Invitrogen has recently marketed a kit, the I-SAGE kit, which helps new SAGE users. Finally, although the cost of DNA sequencing keeps falling, the relatively high cost of sequencing a SAGE library is still a concern for potential SAGE users.
Despite the great power of SAGE using a 10-bp tag as a transcript identifier, a small percentage of tags are ambiguous. A single tag may identify multiple genes or multiple tags may identify a single gene (Neilson et al., 2000). This remains a challenge for identifying genes. Chen et al. (2002) tried to solve this problem by extending the SAGE tags into 3′ cDNAs. They have improved their original generation of longer cDNA fragments from SAGE tags for gene identification (GLGI) technique into a high-throughput procedure for simultaneous conversion of a large number of SAGE tags into corresponding 3′ cDNAs. This improved GLGI procedure was demonstrated to be efficient in identifying the correct gene for SAGE tags with multiple matches and in identifying novel genes in large numbers. Applying this high-throughput procedure should accelerate the rate of gene identification significantly in human and other eukaryotic genomes. Because microarrays use well-characterized immobilized sequences (cDNAs or oligonucleotides), identification of expressed genes is less of a problem.
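The tag ambiguity described above can be made concrete with a small lookup. The Python below is only an illustrative sketch, not the GLGI procedure or any published SAGE software: the function name ambiguous_tags, the gene names, and the tag sequences are all hypothetical, and the input is assumed to be a list of (tag, gene) pairs taken from some reference annotation.

```python
from collections import defaultdict


def ambiguous_tags(tag_gene_pairs):
    """Return tags that map to more than one gene, with their gene lists."""
    genes_by_tag = defaultdict(set)
    for tag, gene in tag_gene_pairs:
        genes_by_tag[tag].add(gene)
    # Keep only tags shared by two or more distinct genes.
    return {tag: sorted(genes)
            for tag, genes in genes_by_tag.items()
            if len(genes) > 1}


# Hypothetical annotation: one tag shared by two genes, one gene with two tags.
pairs = [
    ("AAAAACCCCC", "geneA"),
    ("AAAAACCCCC", "geneB"),
    ("ACGTACGTAG", "geneC"),
    ("TACTGGTCAA", "geneC"),
]
shared = ambiguous_tags(pairs)
```

Here the first tag is ambiguous because two genes share it, whereas geneC illustrates the opposite case of a single gene represented by two tags, which this sketch deliberately does not flag.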

Table 7. Differences Between SAGE and Microarray Technology

Parameters | SAGE | Microarray
New gene identification | Yes | No
Data format | Digital | Analog
Sensitivity | Higher | Lower
Setup cost | Low | High
Operation procedure | Relatively cumbersome | Simpler
Application to multiple samples | Higher sequencing cost involved | Economical

Saha et al. (2002) developed a long SAGE method that generates 21-bp tags, instead of 14-bp tags, from the 3′ ends of transcripts. This method is similar to the original SAGE approach (Velculescu et al., 1995) but uses a different type IIS restriction endonuclease (MmeI) and incorporates other modifications to produce longer transcript tags. More than 75% of 21-bp tags, but not 14-bp tags, can be uniquely assigned to the human genome based on actual sequence information from ∼16,000 known genes, although 14-bp tags do allow such assignment to ESTs and previously characterized mRNAs (Caron et al., 2001; Lal et al., 1999; Velculescu et al., 1995).

The choice of gene expression technique is determined by the question being asked. Expression profiling of hundreds of disease samples is certainly more efficient using microarrays. However, SAGE seems to be a better choice for identifying new genes. In addition, SAGE is useful in analyzing previously uncharacterized organisms. Also, SAGE provides for more sensitive quantification of gene expression. Regular 14-bp tag SAGE is still useful for the quantification of mRNA level, while 21-bp tag SAGE (long SAGE) is more suitable for identifying new genes.

The SAGE Protocol

The standard SAGE protocol presented as follows is mainly based on Velculescu et al. (1995, 2000), and Saha et al. (2002). Some modifications and updates improving the performance of the standard SAGE protocol are incorporated. Because of space limitations, micro- or mini-SAGE protocol will not be described here.

All reagents needed for the SAGE protocol are listed one by one in the first part of this section. The catalog number and company name are provided for each reagent. However, reagents of equivalent quality from other sources can be substituted. Investigators may use the I-SAGE Kit (Invitrogen, Cat. No. T500001), which contains enough reagents to generate five libraries, each consisting of > 100,000 tags/library starting from 5 μg total RNA. One advantage of using the I-SAGE Kit rather than assembling the reagents on your own is that its optimized and quality-controlled enzymes, adapters, primers, and buffers save time.




George P. Yang, Ronald J. Weigel, in Surgical Research , 2001

C. Serial Analysis of Gene Expression

Serial analysis of gene expression (SAGE) is based on the principle that an oligonucleotide sequence of 9–10 bp can uniquely identify a gene (18). In this technique, 9-bp oligonucleotides of cDNAs are cloned in a long, concatenated string with “punctuations” between the cDNA oligonucleotides. Figure 4 outlines this procedure. The long string is sequenced and the sequence of each oligonucleotide is compared to GenBank or another sequence database. Abundantly expressed cDNAs turn up more frequently in the string whereas less abundant transcripts are less frequent. SAGE provides a statistical analysis of the frequency of expressed genes. Performing SAGE on a normal and a cancer cell and comparing the results can provide information on the differences in the pattern of gene expression. SAGE requires extensive sequencing and sequence analysis capability. Automated sequencing is mandatory. In addition, novel genes will not match known sequences, and obtaining a full-length cDNA with a 10-bp oligonucleotide probe can be difficult.
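The claim that 9–10 bp can identify a gene rests on simple counting: with four possible bases at each position, a tag of length n can take 4^n distinct sequences. The short Python sketch below (the function name possible_tags is ours, for illustration only) computes that number for the tag lengths mentioned in this overview. It is an upper bound under the assumption that all sequences are equally available; real transcript sequences are not uniformly distributed, which is one reason some tags are still ambiguous.

```python
def possible_tags(length):
    """Number of distinct DNA sequences of the given length (4 bases)."""
    return 4 ** length


# Tag lengths mentioned in this overview: 9, 10, 14, and 21 bp.
tag_space = {length: possible_tags(length) for length in (9, 10, 14, 21)}
for length, count in tag_space.items():
    print(f"{length}-bp tags: {count:,} possible sequences")
```

A 9-bp tag therefore has 262,144 possible sequences and a 10-bp tag 1,048,576, both larger than the number of genes in a typical genome, while each additional base multiplies the space by four.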

Figure 4. Serial analysis of gene expression. Diagram of SAGE technique resulting in a concatenated string of sequence tags. Reprinted with permission from Ref. (18), Velculescu, V. E., et al. (1995). Serial analysis of gene expression. Science 270(5235), 484–487. Copyright © 1995 American Association for the Advancement of Science.



Handbook of Immunohistochemistry and in situ Hybridization of Human Carcinomas, Volume 4

Christian Haslinger, … Martin Schreiber, in Handbook of Immunohistochemistry and in Situ Hybridization of Human Carcinomas , 2006

Serial Analysis of Gene Expression

Serial analysis of gene expression (SAGE) is a technically unique approach used to quantitatively interrogate the expression of thousands of genes simultaneously (Velculescu et al., 1995). It is based on analyzing short (10–11 bp) sequence tags derived from a defined position within a transcript, which contain sufficient information to unequivocally identify this transcript (Tuteja and Tuteja, 2004). These tags are isolated, ligated together (concatemerized), cloned, and sequenced in a high-throughput manner. In a typical SAGE experiment, tens of thousands of these short sequence tags are sequenced per sample, and the frequency of each tag in the sequenced concatemers directly reflects the transcript abundance. Different samples are each analyzed individually, and the relative abundance of transcripts in each sample is then compared to identify up- or down-regulated genes. The major advantage of SAGE is that it does not require a preexisting clone or hybridization probe and hence can be used to identify and quantify known as well as novel genes. It is thus particularly well-suited for organisms with no or incomplete genome sequence information. In cancer research, SAGE has been applied primarily to the discovery of genes differentially expressed between cancerous and normal tissues or cell lines. Usually the sets of differentially expressed genes determined by SAGE tend to be smaller and to contain fewer false-positive hits than those determined by DNA microarray approaches. Extensive online SAGE databases have been established in recent years, such as the National Center for Biotechnology Information (NCBI) SAGE database. This freely accessible database contains SAGE results of hundreds of tissue samples and cell lines of human and other species of origin.
Via online comparison of two or more analyzed samples contained in the database, differentially regulated genes can be readily identified, providing a very useful starting point for the identification and characterization of tumor-specific or tissue-specific marker genes. To identify genes specific for tumor endothelial cells, St. Croix et al. (2000) immunopurified endothelial cells from a colorectal tumor and from normal mucosa of the same patient. SAGE libraries of ∼100,000 tags of each sample were generated and compared with ∼1.8 million tags from a variety of cell lines derived from tumors of non-endothelial origin. In this way, genes that were specific for endothelial cells in vivo could be identified. Furthermore, transcripts that were considerably more abundant in tumor endothelium than in normal endothelium, and vice versa, were found, including many uncharacterized genes, which should provide a valuable resource for future investigations of tumor angiogenesis.

SAGE was also applied to assess the relevance of a specific genetically engineered mouse model of breast cancer for the study of the human disease (Hu et al., 2004). This mouse model relies on the transplant of p53 null mouse mammary epithelial cells into the cleared fat pad of syngeneic hosts. SAGE was used to obtain gene expression profiles of normal and tumor samples from this mouse model of breast cancer. In a cross-species comparison, these results were compared to 25 human breast cancer SAGE libraries. Significant similarities between mouse and human breast tumors were observed, and a number of transcripts were identified as commonly deregulated in both species (Hu et al., 2004). These findings demonstrate that this particular mouse model of breast cancer indeed closely mimics the human cancer it attempts to model. This elegant study highlights a distinct advantage of SAGE analysis when compared to DNA microarrays (i.e., that cross-species comparisons of comprehensive gene expression profiles are much more reliable and much easier to accomplish with SAGE data). Normalization of microarray data is a nontrivial task and, in the widely used two-color hybridization method, relies on a common reference probe hybridized to each of the samples to be compared. Such a common reference probe obviously does not exist if the samples to be compared are derived from different species. In SAGE, the frequency of each tag representing a particular transcript (i.e., the number of such tags divided by the number of all tags sequenced) directly reflects the abundance of that transcript. To determine if and to what extent a gene is differentially expressed in two or more different samples, the frequencies of the corresponding tags in these samples are simply compared; whether these different samples are all from the same or from different species is irrelevant for such comparisons. 
Moreover, DNA microarrays can interrogate only the expression of those genes that are represented on the array; when different species are compared, different arrays for each of the species must be used, and the overlap between the genes contained on these different arrays is usually incomplete. Because SAGE is not limited to preexisting clones or hybridization probes, this limitation of cross-species comparisons with DNA microarrays is readily avoided with SAGE.
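The frequency-based comparison described above (the number of tags for a transcript divided by all tags sequenced) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used by Hu et al. (2004): the function names, the placeholder tag labels, and the toy libraries are hypothetical, and no statistical test is applied.

```python
from collections import Counter


def tag_frequencies(tags):
    """Map each tag to its count divided by the total number of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def frequency_ratio(tag, library_a, library_b):
    """Ratio of a tag's frequency in library A to that in library B."""
    freq_a = tag_frequencies(library_a).get(tag, 0.0)
    freq_b = tag_frequencies(library_b).get(tag, 0.0)
    if freq_b == 0.0:
        return None  # tag absent from library B: ratio undefined
    return freq_a / freq_b


# Toy libraries of different sizes; "tag1" has the same raw count in both.
tumor = ["tag1"] * 5 + ["tag2"] * 5      # 10 tags sequenced
normal = ["tag1"] * 5 + ["tag2"] * 15    # 20 tags sequenced
ratio = frequency_ratio("tag1", tumor, normal)
```

In this toy case the raw count of tag1 is identical in both libraries, yet its frequency is twice as high in the smaller library, which is why tag counts are divided by library size before samples are compared.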



Case study: C-It, knowledge database for screening evolutionarily conserved, tissue-enriched, uncharacterized genes

Shizuka Uchida, in Annotating New Genes , 2012

4.3.4 Mouse Atlas of Gene Expression

Serial analysis of gene expression (SAGE) was developed in the early 1990s to detect and quantify mRNA in a sample of interest by utilizing small tags that correspond to fragments of those transcripts (Velculescu et al., 1995). Unlike microarrays, SAGE does not require pre-existing sequences for probe construction, which allows de novo identification of novel genes. After sequencing of all tags, these sequences are mapped back to mRNA sequence databases [e.g., SAGEmap, SAGE Genie (Boon et al., 2002)] to identify the corresponding genes. Simultaneously, the frequency of tags is determined, allowing the quantification of expression levels in comparison with a control sample (Wang, 2007). Sometimes it is difficult to identify tags due to their short length. Therefore, since the introduction of SAGE in 1995, the tag length has been increased from 10 bp (short SAGE) to 17 bp (long SAGE), 21 bp [Robust-LongSAGE tag (Gowda et al., 2004)] and 26 bp [Super-SAGE tag (Matsumura et al., 2005)].

The Mouse Atlas of Gene Expression project generated 202 SAGE libraries from 198 tissues and different developmental stages to provide a comprehensive catalog of genes expressed throughout the development of various organs of mouse strain C57BL/6J, the most widely used strain in labs (Siddiqui et al., 2005). Information about all the libraries is available online. This is by far the most comprehensive collection of SAGE expression profiles currently available. For example, in the case of heart, ten libraries from four developmental stages [whole hearts from embryonic day (E)8.5 embryos; atria, atrioventricular canals, bulbus cordis and left ventricles from E9.5 embryos; atria, atrioventricular canals and ventricles from E11.5 embryos; atria and ventricles from E12.5 embryos] are listed here. For each library, experiments were performed with the 10-bp SAGE or the more reliable 17-bp SAGE method.

To obtain genes expressed during the development of a tissue, all tags from the corresponding libraries of the target tissue were pooled and annotated with ‘SAGEmap_Mm_NlaIII_10_best’ or ‘SAGEmap_Mm_NlaIII_17_best’ from NCBI’s SAGEmap (both libraries constructed from UniGene Build 170) (Lash et al., 2000). All tags with count > 1 were accepted as genes expressed during the development of the target tissue.
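The pooling and filtering step above can be sketched as follows. This Python is a minimal illustration, not the C-It pipeline itself: it assumes each library is already available as a mapping from tag sequence to count, it omits the annotation against SAGEmap, and the function name expressed_tags and the toy tag sequences are hypothetical.

```python
from collections import Counter


def expressed_tags(libraries, min_count=2):
    """Pool tag counts across libraries and keep tags with count > 1."""
    pooled = Counter()
    for library in libraries:
        pooled.update(library)   # adds counts when given a tag -> count mapping
    return {tag: n for tag, n in pooled.items() if n >= min_count}


# Toy libraries for one tissue at two hypothetical developmental stages.
stage_1 = {"AAAAACCCCC": 1, "ACGTACGTAG": 3}
stage_2 = {"AAAAACCCCC": 1, "TACTGGTCAA": 1}
kept = expressed_tags([stage_1, stage_2])
```

Note that a tag seen once in each of two libraries passes the filter after pooling, whereas a tag seen only once overall is discarded as a likely sequencing error or very rare transcript.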



Genome-Wide Analysis of Gene Expression

D.-W. Doug Chung, K.G. Le Roch, in Encyclopedia of Biological Chemistry (Second Edition) , 2013

Serial Analysis of Gene Expression

Serial analysis of gene expression (SAGE) was developed in 1995 by Dr. Victor Velculescu and provides a snapshot of the transcript population within a sample by analyzing small tags that correspond to the 3′ fragments of messenger RNA (mRNA) (Figure 1). First, cDNA is synthesized from mRNA using a biotinylated oligo(dT) primer and is then cleaved with a restriction enzyme (anchoring enzyme) that is expected to cut each transcript at least once. The most 3′ fragment of the cleaved cDNA is bound to streptavidin beads at its poly(A) tail, which is complemented by biotinylated thymidines. Bound cDNA fragments are then ligated at the anchoring cleavage site with linkers that contain a type IIS restriction site. Type IIS endonucleases (tagging enzymes) cleave at a specific distance up to 20 bp away from the recognition site. This process is designed so that cleavage with the tagging enzymes results in the release of the linker along with a short piece of cDNA (tag) of around 9 bp. Released tags are then ligated together at their blunt ends to form a ditag, which serves as a template for polymerase chain reaction (PCR) amplification using primers specific to the linkers. Following PCR amplification, the flanking linkers are cleaved from the ditags using the anchoring enzyme. Cleaved ditags are subsequently concatenated, cloned into a plasmid vector, and then sequenced.
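
The tag-generation logic can be mimicked in silico, which is roughly how a reference tag table can be derived from known transcripts. The following Python sketch rests on several assumptions: the anchoring enzyme is NlaIII (recognition site CATG), the tag is the fixed-length stretch immediately 3′ of the 3′-most anchoring site, the input is the sense-strand transcript sequence, and the default tag length of 10 bp is an illustrative choice. The function name `extract_tag` is hypothetical.

```python
def extract_tag(transcript, anchor="CATG", tag_length=10):
    """Return the tag next to the 3'-most anchoring-enzyme site, or None.

    transcript: sense-strand sequence written 5'->3' (assumption).
    anchor: recognition site of the anchoring enzyme (CATG for NlaIII).
    tag_length: number of bases taken immediately 3' of the site.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor)  # rfind locates the last (3'-most) occurrence
    if pos == -1:
        return None  # the enzyme does not cut this transcript
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == tag_length else None
```

Transcripts without an anchoring site return `None`, which mirrors the caveat in the protocol that the enzyme is only expected, not guaranteed, to cut every transcript.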

Figure 1. Serial analysis of gene expression (SAGE). RNA is extracted and purified from desired samples. mRNA is converted into double-stranded cDNA by reverse transcriptase with biotinylated poly-T tails as primers. The biotinylated cDNA is fragmented with anchoring enzymes that recognize and cut at their respective restriction sites, which are usually found naturally every few hundred base pairs. The biotinylated ends of the fragmented cDNA are bound to streptavidin beads and split into two portions. Appropriate adaptors with an endonuclease site for tagging enzymes are ligated to each cDNA fragment at the cleaved end. Tagging enzymes are added to cleave a small distance from their endonuclease sites found at each adaptor, releasing the fragmented cDNA (up to 26 bp) from the beads. Blunt ends from each of the two portions of fragmented cDNA are ligated together, forming a ditag, and then amplified by PCR. After PCR, ditags are cleaved at their adaptors, concatenated, and then cloned into plasmids. Once cloned, plasmids are then sequenced, mapped back to the gene and subsequently analyzed for gene expression.

With SAGE, up to 20 000 tags can be sequenced in a single experiment. With full-genome sequence information, the sequenced SAGE tags can be mapped back to the gene. This approach provides for a highly quantitative analysis since the number of tags for a particular gene can be counted. Since the advent of SAGE, several more robust variants, LongSAGE, RL-SAGE, and SuperSAGE, have been developed. These newer variants allow up to 100 000 sequenced tags per trial, thus providing a more comprehensive profile of the transcriptome, and capture longer tags, up to 26 bp, enabling more confident identification of the source gene. Despite these improvements, SAGE is no longer widely implemented because it is highly labor intensive and has relatively low throughput compared to the newly developed sequencing technologies described below.
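
Because libraries differ in the total number of tags sequenced (for example 20 000 versus 100 000), raw counts are usually rescaled before libraries are compared. The sketch below shows one common convention, tags per million, together with a simple abundance ratio. It is an illustrative assumption rather than the procedure of any specific SAGE package, and both function names are hypothetical.

```python
def tags_per_million(counts):
    """Rescale raw tag counts (dict: tag -> count) to tags per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


def abundance_ratio(sample, control, tag):
    """Ratio of normalized abundance (sample / control) for one tag.

    Returns None when the tag is absent from the control library, because
    the ratio is undefined there.
    """
    sample_tpm = tags_per_million(sample).get(tag, 0.0)
    control_tpm = tags_per_million(control).get(tag, 0.0)
    return sample_tpm / control_tpm if control_tpm > 0 else None
```

For lowly expressed tags the ratio is dominated by sampling noise, so published analyses typically pair such ratios with a statistical test. That step is omitted here.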


Functional Genomics

Shalini Kaushik, … Deepak Sharma, in Encyclopedia of Bioinformatics and Computational Biology , 2019

SAGE (Serial Analysis of Gene Expression)

Unlike microarray technology, where hybridization is the principle for analysing expression of genes, SAGE depends on RNA sequencing for global analysis of gene expression in a cell. The main benefit of using this technique over others is that it does not require any prior knowledge of transcripts, and it supports both quantitative study, by comparing gene expression between samples, and qualitative study, by identifying novel transcripts. The method involves the isolation of short (9–11 base pairs) oligonucleotide sequence tags from discrete mRNAs. These short distinctive sequence tags (SAGE tags) are unique to each gene and are concatenated for efficient sequencing and analysis of transcripts in a serial fashion. The resulting sequence data are analyzed to identify each gene expressed in the cell and its level of expression. This information can further be used to test for differences in the level of gene expression between cells.

SAGE Bioinformatics Packages

The role of SAGE tools is very clear as we need them for the isolation of the unique sequence tags from raw sequence files and their tabulation as well as comparison of SAGE tag abundance. Furthermore, SAGE software helps in matching tags to reference sequences in other databases.
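
As a rough illustration of what such software does with raw sequence files, the Python sketch below splits a concatemer read at the anchoring-enzyme site, treats each fragment of plausible length as a ditag, and tabulates the two tags it contains. The assumptions are loud ones: the anchoring site is CATG (NlaIII), tags are 10 bp, ditags are 20 to 24 bp, and the second tag of each ditag is read as a reverse complement. The function names are hypothetical and do not correspond to any of the packages listed in this section.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(read, anchor="CATG", tag_length=10,
                         min_ditag=20, max_ditag=24):
    """Extract individual tags from one concatemer read.

    Fragments between anchoring sites that fall outside the assumed ditag
    length window are ignored.
    """
    tags = []
    for fragment in read.upper().split(anchor):
        if min_ditag <= len(fragment) <= max_ditag:
            tags.append(fragment[:tag_length])  # left tag, read as-is
            # The right tag lies on the opposite strand.
            tags.append(reverse_complement(fragment[-tag_length:]))
    return tags


def tabulate(reads):
    """Count tag occurrences across many concatemer reads."""
    counts = Counter()
    for read in reads:
        counts.update(tags_from_concatemer(read))
    return counts
```

A production tool would additionally trim vector sequence, discard duplicate ditags (a known PCR artifact), and handle ambiguous bases. None of that is attempted here.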

Various bioinformatics methods have been developed for compiling and analysing SAGE data (Table 2; for a detailed list of tools see Tuteja and Tuteja, 2004). To identify SAGE tags, SAGE300 (see “Relevant Websites section”) compiles a database of tags extracted from human sequences in GenBank. In contrast, POWER_SAGE is a tool for planning SAGE experiments that can handle a large number of transcripts/cDNAs and different sample size combinations (Man et al., 2000). Expression Profile Viewer (ExProView) is a tool for the analysis of gene expression profiles derived from expressed sequence tags and SAGE (Larsson et al., 2000). It visualizes the transcript data in a two-dimensional array of dots, in which each dot represents a known gene as specified in the transcript database. In addition, SAGEnet (see “Relevant Websites section”) catalogs tags for colon cancer, pancreatic cancer and corresponding normal tissues, while SAGEmap is a public database for gene expression (Lash et al., 2000).


Biology and Pathology of Ovarian Cancer

Natini Jinawath, Ie-Ming Shih, in Early Diagnosis and Treatment of Cancer Series: Ovarian Cancer , 2010

Overexpression of Apolipoprotein E

Based on serial analysis of gene expression (SAGE), investigators have found apolipoprotein E (ApoE) overexpression in ovarian carcinoma. Besides the well-known role of ApoE in cholesterol transport and in the pathogenesis of atherosclerosis and Alzheimer’s disease, ApoE may play a novel role in the development of human cancer. ApoE immunoreactivity has been detected in 66% of high-grade but only 12% of low-grade ovarian serous carcinomas, and not in normal ovarian surface epithelium, serous cystadenomas, serous borderline tumors, or other type I tumors [38]. Hence, expression of ApoE is primarily associated with the type II high-grade serous carcinomas. Inhibition of ApoE expression in vitro induces cell-cycle arrest and apoptosis in ApoE-expressing ovarian cancer cells, suggesting that ApoE expression is important for their growth and survival.


Gene Regulation

M.W. White, … J.R. Radke, in Toxoplasma Gondii , 2007

SAGE analysis of primary VEG strain parasites from sporozoite to tachyzoite to bradyzoite in comparison with laboratory strains

Global comparisons of SAGE tags from VEG primary libraries (sporozoite, Day 4, 6, 7, 15 post-sporozoite infection and pH-shift mRNA sources used to construct primary SAGE libraries; Jerome et al., 1998) demonstrate unique changes in mRNA levels defining each of the developmental stages (Pearson correlations r = 0.0229–0.32), with the exception of sporozoite and parasites emerging from sporozoite infections at Day 4 (r = 0.728). SAGE tags uniquely associated with each developmental library encompass 23 percent of the total tags sequenced, and ranged from 1.5–5.3 percent of the tags in each library. Globally, our results parallel similar studies in Plasmodium that indicate mechanisms coordinating developmental mRNA expression in these parasites are largely stage-specific. Furthermore, the discovery of large groups of stage-specific genes in Toxoplasma adds support to the concept that developmental gene expression in this parasite likely follows a hierarchical order (Singh et al., 2002). Thus, our SAGE studies (and earlier microarray experiments; Singh et al., 2002) provide evidence that mRNA expression in Toxoplasma falls into distinct co-regulated classes and that groups of mRNAs, which are likely regulated by development-specific trans-acting factors, control phenotypic transitions associated with the Toxoplasma intermediate life cycle.
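
For readers unfamiliar with how a library-to-library correlation of this kind can be obtained, the following Python sketch computes a Pearson coefficient between two tag-count tables over the union of their tags. It assumes that tags missing from one library count as zero. The chapter does not state how its coefficients were calculated, so this is a generic illustration and not the authors' procedure. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(lib_a, lib_b):
    """Pearson correlation between two SAGE libraries (dict: tag -> count).

    Tags present in only one library are given a count of zero in the other.
    """
    tags = sorted(set(lib_a) | set(lib_b))
    x = [lib_a.get(tag, 0) for tag in tags]
    y = [lib_b.get(tag, 0) for tag in tags]
    n = len(tags)
    if n == 0:
        raise ValueError("both libraries are empty")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant counts")
    return cov / math.sqrt(var_x * var_y)
```

Pearson's r is unchanged when either library is multiplied by a constant, so rescaling to tags per million beforehand does not alter the result for a given pair of libraries.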

Analysis of gene expression across parasite development has revealed expected patterns of mRNA expression as well as surprising mRNA profiles that may provide new biological insights into the genetic program expressed by Toxoplasma, and help to explain variations in developmental capacity that occur between strains. We have reported that parasites emerging from primary sporozoite inoculations (VEG strain) undergo ~20 rapid divisions before synchronously slowing their growth rate and initiating bradyzoite gene expression approximately 1 week post-sporozoite infection (Jerome et al., 1998). Thus, it was not unexpected that we observed a downregulation in the frequency of SAGE tags corresponding to growth-related genes (e.g. TgPCNA1, DHFR-TS, and adenosine transporter) and some energy-related genes (e.g. fructose-1,6-bisphosphate, glyceraldehyde-3-phosphate, 3-phosphoglycerate) in the Day-7 library (and all subsequent post-growth shift libraries) congruent with the dramatically slower doubling time of these parasites. Further agreement with earlier measurements of BAG1 in these populations (2 percent BAG1 at Day 7 post-sporozoite; Jerome et al., 1998) was reflected in the absence of the known bradyzoite genes in this library. Parasites from Day-15 post-sporozoite infection are a mixture of tachyzoite and bradyzoite forms (50 percent BAG1; Jerome et al., 1998), and thus SAGE tags corresponding to bradyzoite markers SAG4.2 and ENO1 were evident in this library along with expression of tachyzoite-specific genes like SAG1. Exposing VEG primary parasites to pH stress shifts the mRNA pool further towards the bradyzoite stage (Radke et al., 2003), as all of the published bradyzoite markers were detected (Lyons et al., 2002), and had higher frequencies, in the pH-shifted SAGE library, than were observed in the Day-15 library. Equally important, and predicted, was the near complete downregulation of tachyzoite-specific genes (e.g. SAG1, LDH1, ENO2) in the pH-shifted populations.

In each of the developmental transitions sampled in this study there was an opportunity to discover novel parasite gene expression, and this remains an area of current investigation. For example, a novel NTPase that is related by sequence to NTPI and II (Bermudes et al., 1994) was identified whose expression was detected in the Day-15 parasites (but not at all in Day-7 parasites) and was among the highest expressed genes in the pH-shifted library (see section 16.3 below; Radke and White, unpublished results). Comprehensive lists of genes that summarize development-specific expression at each transition can be found at TgSAGEDB.

Previous studies of sporozoite infections have demonstrated that differentiation was an active and rapid process both in vitro and in vivo (Jerome et al., 1998). Although relatively few markers were monitored in these experiments, the results led to speculation that differentiation from the sporozoite to the tachyzoite stage was essentially complete in the first 1–2 days following a sporozoite infection. Whole-cell measurements by SAGE tell a different story, as mRNA pools of parasites emerging even 4 days post-sporozoite infection retain a significant fraction of sporozoite-specific mRNAs such that the correlation between these pools is high (sporozoite vs. Day 4, r = 0.728). This observation was unexpected and suggests that, in trying to understand Toxoplasma infections initiated by oocysts, sporozoite-specific biology may have significant influence during the early stages of development in the intermediate host.

Global comparisons of SAGE datasets from types I, II, and III laboratory strains within the context of VEG development provided other unexpected results that illuminate important biological differences between strains. There was a striking correlation between upregulated SAGE tags derived from VEG Day-6 post-sporozoite populations and those in the type I-RH strain that were not witnessed in comparison to the other primary VEG populations or strain types. This unique relationship in gene expression may reflect a shared biology: Day-6 VEG populations, like RH parasites, lack any evidence of sporozoite or bradyzoite mRNA expression, and grow with a similarly fast doubling time (Jerome et al., 1998; Radke et al., 2001). Importantly, SAGE libraries constructed from type II-Me49B7 and type III-VEGmsj parasites did not have elevated Day-6 SAGE tags, but unlike RH/Day-6 datasets, SAGE tags corresponding to bradyzoite genes were found in the libraries from these strains. This difference likely relates to the greater capacity of VEGmsj and Me49B7 parasites to enter the bradyzoite development pathway where RH is developmentally incompetent (Dubey et al., 1999). It is unknown whether this differential pattern of gene expression also applies to other virulent type I strains whose inefficiency at forming tissue cysts is also well established (Sibley and Boothroyd, 1992; Dubey, 1997, 1980), but the absence of bradyzoite gene priming in virulent strains that is evident in the microarray analysis of avirulent strains has recently been reported (Saeij et al., 2005). Thus, the mRNA patterns detected in laboratory strains appear to mirror characteristics from the natural development pathway with their comparative position with respect to the growth-shifted populations that occur at Day 7 post-sporozoite infection (Jerome et al., 1998). 
Strains whose gene expression places them to the left (and earlier) of this developmental boundary, like RH, are further removed from bradyzoite differentiation, and may be more virulent. By contrast, gene expression patterns consistent with populations to the right (i.e. evidence of basal bradyzoite gene expression) indicate a greater capacity for bradyzoite development, which is associated with avirulence (Saeij et al., 2005). Sorting out the gene expression mechanisms that control these developmental transitions and understanding how alterations in these controls result in the variation in strain developmental capacity is an important area for future studies.

