Sequences – DNA & beyond
Evolving terminology for emerging technologies
Comments? Suggestions? Revisions? mchitty@healthtech.com
Last revised December 27, 2001 
Gene definitions is inextricably linked to this glossary. Other related glossaries include Applications: Genomics, Proteomics, Sequencing; Informatics: Algorithms, Molecular Modeling; Biology: Biomolecules, Expression, Proteins, Protein Structures. Additional definitions appear in the In-depth glossary, after the Bibliography.

alternative splicing: Gene definitions  Broader term splicing Related terms pre-mRNA splicing, protein splicing, RNA splicing, trans-splicing

alternative transcripts: Expression, genes & beyond

antisense DNA: Pharmaceutical biology glossary

antisense RNA: Pharmaceutical biology glossary

cDNA complementary DNA: Gene definitions

carbohydrate sequence: The sequence of carbohydrates within POLYSACCHARIDES, GLYCOPROTEINS, and GLYCOLIPIDS. [MeSH]  Biomolecules glossary

central dogma: Horace Freeland Judson quotes Francis Crick talking about the central dogma: "Nobody tried to go from protein sequence back to nucleic acid, because that just wasn't on. You see. But I don't think it was ever discussed. ... Jim, [Watson] you might say, had it first. DNA makes RNA makes protein. That became then the general idea. ... what are all the possible information flows?" [Judson asked why he had called it the central dogma.] "It was because, I think, of my curious religious upbringing. Because Jacques [Monod] has since told me that a dogma is something which a true believer cannot doubt!" Crick laughed. ... "But that wasn't what was in my mind. My mind was, that a dogma was an idea for which there was no reasonable evidence. You see?!" And Crick gave a roar of delight. "I just didn't know what dogma meant. And I could just as well have called it the 'Central Hypothesis' - you know. Which is what I meant to say. Dogma was just a catch phrase. ... And it's a negative hypothesis, so it's very very difficult to prove.... The central dogma is much more powerful [than Crick's sequence hypothesis], and therefore in principle you might have to say it could never be proved. But its utility - there was no doubt about that. Because if you didn't believe that, you could invent theories, unlimited theories, whereas if you just put in that one assumption, ... then, essentially you were on the right track you see." ... "In looking back I am struck not only by the brashness which allowed us to venture powerful statements of a very general nature, but also by the rather delicate discrimination used in selecting what statements to make. Time has shown that not everybody appreciated our restraint" [HF Judson, Eighth Day of Creation, Cold Spring Harbor Laboratory Press 1996 pp. 333-334]

F. Crick "Central dogma of molecular biology" Nature 227 (5258): 561-563 Aug. 8, 1970 [historical article clarifying original explanation]

The Oxford English Dictionary makes clear the duality of dogma, particularly in the context of dogmatic, defined as "accepted as true instead of being based upon experience, particularly if done in an imperious, arrogant manner".  Dogma is defined as "systematised beliefs" (sometimes deprecating). Dogmatic physicians are cited as "an ancient sect" which "endeavoured to discover by reasoning the essence and occult causes" of disease.

Central dogma chapter MIT Biology Hypertextbook http://esg-www.mit.edu:8001/esgbio/dogma/dogmadir.html  

Related terms transcription, translation In-depth central dogma exceptions

cis-splicing: The joining together, after removal of the intron, of two segments of the same RNA molecule separated by an intron.  Related terms: intron, RNA splicing, trans-splicing [California Space Institute, Glossary, 2000] http://calspace.ucsd.edu/origins/Glossary/C.htm

clone, cloning: Cell biology glossary

coding region(s): Gene definitions

codon: The sequence of three consecutive nucleotides that occurs in mRNA which directs the incorporation of a specific amino acid into a protein or represents the starting or termination signals of  protein synthesis. [IUPAC Biotech, IUPAC Medicinal Chemistry]

A set of three nucleotides in DNA or RNA that codes for a specific amino acid. The term is also used for the corresponding (and complementary) sequences of three nucleotides in messenger RNA into which the original DNA sequence is transcribed. [MeSH] Related terms transcription, translation. Narrower terms start codon, stop codon.

Coined by Sydney Brenner "for a triplet of bases that specifies an amino acid, introduced partly in satirical reference to Seymour Benzer's 'cistron', 'recon', and 'muton', Brenner's 'codon' is the one that survives in universal biological use." [HF Judson, Eighth Day of Creation, Cold Spring Harbor Laboratory Press, 1996 p. 469]
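The triplet structure described in the codon definitions can be illustrated with a short Python sketch. It is only an illustration: the function names and the sample sequence are invented for this example, and the start and stop codons listed are those of the standard genetic code.

```python
# Illustrative sketch: split an mRNA string into codons and label signals.
# Function names and the sample sequence are assumptions for this example.

START_CODONS = {"AUG"}                # standard initiation codon
STOP_CODONS = {"UAA", "UAG", "UGA"}   # standard termination codons


def split_codons(mrna):
    """Return consecutive triplets; a trailing partial triplet is dropped."""
    mrna = mrna.upper()
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def classify(codon):
    """Label a codon as 'start', 'stop', or 'sense' (any other triplet).

    AUG is labeled 'start' wherever it occurs, although inside a coding
    region it simply specifies methionine.
    """
    if codon in START_CODONS:
        return "start"
    if codon in STOP_CODONS:
        return "stop"
    return "sense"


codons = split_codons("AUGGCCUAAGG")
labels = [classify(c) for c in codons]
```

Here the final two bases (GG) do not form a complete triplet and are ignored.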

DNA: Biomolecules glossary

DNA - RNA - protein: See central dogma

DNA synthesis: DNA replication, the process of making copies of strands of DNA.  Existing DNA is used as a template for synthesizing the new strands. [PhRMA] Related term protein synthesis

ds: Double-stranded (DNA or RNA).

downstream:  Identifies sequences proceeding farther in the direction of expression; for example, the coding region is downstream from the initiation codon, toward the 3' end of an mRNA molecule. Sometimes used to refer to a position within a protein sequence, in which case downstream is toward the carboxyl end which is synthesized after the amino end during translation. [Lemon]

EST Expressed Sequence Tag: Partial gene sequence data of a cDNA clone, which provide a sequence tag for a gene. In order to achieve a very high throughput, these sequences are usually only subjected to a single pass of sequencing so the error rate in these sequences can be high, perhaps approaching 5%. [NCBI]

Developed by Craig Venter and colleagues and further established by the Merck Gene Index. Clones from cDNA libraries are sequenced (single read) from the 3’ end. [R Strausberg et al "The Cancer Genome Anatomy Project" Trends in Genetics 16(3): 103-106 March 2000] 

Often, but not necessarily, represent genes; generated through rapid but error-prone sequencing methods. [CHI SNPs Update]  Related terms Gene definitions cDNA, transcript clusters; EST maps Maps genomic & genetic

exons: Gene definitions

gene expression: Expression glossary

genetic code: Gene definitions

genomic DNA: Genomics glossary

intron: An intervening section of DNA which occurs almost exclusively within a eukaryotic gene, but which is not translated to amino acid sequences in the gene product. The introns are removed from the pre-mature mRNA through a process called splicing, which leaves the exons untouched, to form an active mRNA. [IUPAC Bioinorganic, IUPAC Compendium]

A segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

Related terms exon, "junk DNA", non-coding; In-depth untranslated regions UTR.

"junk DNA": A general term that encompasses many different types of DNA sequences. These sequences run the gamut from introns, the parts of genes that are edited out during protein synthesis; transposable elements, repeated DNA sequences that, like parasites, duplicate themselves, adding nothing to the genome except more redundant sequence; and pseudo genes, fossils of one-time genes … all of the regulatory elements - promoters and inhibitors - required for gene transcription are spelled out somewhere between the genes. The same is true of other elements deemed junk, such as introns and RNA genes, which clearly hold important clues to understanding alternative splicing … the term junk DNA is frequently used incorrectly. Numerous articles in the medical literature use junk and non-coding DNA interchangeably. [B. Kuska "Bring in Da Noise, Bring in Da Junk" JNCI 90(15): 1125-1127 Aug. 5, 1998]

Dr. Susumu Ohno, writing in the Brookhaven Symposium on Biology in 1972 in the article "So Much 'Junk' DNA in our Genome", is credited with originating the term. But his paper was focused "mainly on the fossilized genes, called pseudo genes, that are strewn like tombstones throughout our DNA. But as the term caught on in the 1980's, its meaning was extended to all non-coding sequences, the vast stretches of DNA that are not genes and do not produce proteins" (about 95% of the genome) … some [scientists] have begun to scrap the notion that all non-coding DNA is junk …  "I don't think people take the term very seriously anymore" says Eric Green [NHGRI] whose group is mapping chromosome 7. [B. Kuska "Should Scientists Scrap the Notion of Junk DNA?" JNCI 90(14): 1032-1033 July 15 1998]

Narrower terms  intron, non-coding, repetitive sequences.

mRNA messenger RNA: An RNA molecule that transfers the coding information for protein synthesis from the chromosomes to the ribosomes. mRNA is formed from a DNA template by transcription. It may be a copy of a single gene or of several adjacent genes (polycistronic mRNA). On the ribosome, the sequence is converted into the programmed amino acid sequence through translation. [IUPAC Biotech]

Messenger RNA, an intermediate between DNA sequences and the production of protein. The coding strand of DNA is transcribed as an mRNA (complementary to the coding strand), which is then translated by transfer RNA (tRNA) and building- block amino acids to produce a protein. [CHI Breaking Bottlenecks]

Includes 5' untranslated region (5' UTR), coding sequences (CDS, exon) and 3' untranslated region (3' UTR)  [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

Narrower terms antisense RNA, sense RNA, UTR Broader term RNA Related term reverse transcription

methylation: Proteins glossary  

mtDNA: See mitochondrial DNA.

messenger RNA: See mRNA.

mitochondrial DNA: The genetic material of the mitochondria, the organelles that generate energy for the cell. [NHGRI] Related terms mitochondrial genes Gene Definitions; Cell biology glossary mitochondria, organelles

non-coding DNA: Introns, spliced out of the messenger RNA following transcription. [NHLBI] 

Non-coding DNA (also known as selfish, ignorant, parasitic and incidental DNA) includes introns, transposable elements, pseudogenes, repeat elements, satellites, UTRs, hnRNAs, LINEs, SINEs, as well as unidentified junk and makes up approximately 97% of the human genome. Some scientists were so overwhelmed by the amount of non-coding DNA that they referred to the genome as "a collection of non-coding regions interrupted by small coding regions." [Dov. S. Greenbaum "Junk?" Genomics & Bioinformatics MBB 452a, Yale Univ.] http://bioinfo.mbb.yale.edu/mbb452a/projects/Dov-S-Greenbaum.html

Related terms "junk DNA", non-coding regions, repetitive sequences; pseudogenes Gene Definitions Narrower terms In-depth LINEs, non-coding first exons, SINEs, UTRs, others?

non-coding region(s): The part of a gene that does not specify the structure of a protein. Non-coding regions of DNA often contain elements that regulate when a protein will be made, and how much of that protein will be produced. [SNP]

Related terms introns, "junk DNA", repetitive sequences; pseudogenes Gene Definitions

nucleic acids: DNA or RNA Biomolecules

ORESTES open reading frame expressed sequence tags: Approach provides sequence information along the whole length of each transcript, rather than just the ends. The method involves low- stringency PCR to produce cDNA libraries, samples of which are then sequenced. 

Camargo et al ["The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome" PNAS 2001, 98:12103-12108] generated almost 700,000 ORESTES from 24 types of normal or malignant tissue using 3,540 mini-libraries. They predict that their ORESTES dataset may represent as many as 60% of all human genes (including abundant and rare transcripts). The ORESTES approach generates a larger coverage and a greater number of contigs per gene than standard EST methods, offering the possibility to complete the closure of most sequences using RT-PCR. http://www.biomedcentral.com/news/20011011/01 Related terms EST, ORF

ORF Open Reading Frame: Corresponds to a stretch of DNA that could potentially be translated into a polypeptide; i.e., it begins with an ATG "start" codon and terminates with one of the 3 "stop" codons. For an ORF to be considered as a good candidate for coding a bona fide cellular protein, a minimum size requirement is often set, e.g., many of the systematic sequencing groups define an ORF as a stretch of DNA that would code for a protein of 100 amino acids or more. An ORF is not usually considered equivalent to a gene or locus until there has been shown to be a phenotype associated with a mutation in the ORF, and/or an mRNA transcript or a gene product generated from the ORF's DNA has been detected.  [SGD glossary, Stanford Univ. US] http://genome-www.stanford.edu/Saccharomyces/help/glossary.html#fasta

Sequences of structural genes devoid of termination codons and therefore continuously "readable" by RNA polymerase. [Metathesaurus] 

Reading frames where successive nucleotide triplets can be read as codons specifying amino acids and where the sequence of these triplets is not interrupted by stop codons. [MeSH] 

Broader term reading frame, Narrower term URF Related term: Omes & omics glossary ORFeome
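The SGD-style working definition (an ATG start, an in-frame stop codon, and a minimum length such as 100 codons) can be sketched as a simple scan in Python. This is a minimal illustration under stated assumptions, not any sequencing group's actual pipeline: the function name, the return format, and the choice to scan only the three forward frames of the given strand are all assumptions made here.

```python
# Minimal ORF scan over the three forward reading frames of one strand.
# find_orfs, its return format, and the sample sequence are illustrative
# assumptions; real ORF finders also scan the reverse complement.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return a list of (start, end, frame) tuples.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and min_codons counts codons from ATG up to (not including)
    the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i            # open a candidate ORF
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None             # close it and keep scanning
    return orfs


sample = "CCATGAAATTTGGGTAACC"
short_orfs = find_orfs(sample, min_codons=4)
```

With the default threshold of 100 codons the short sample yields no ORF, which mirrors how a minimum size requirement filters out chance ATG...stop stretches.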

open reading frame: See ORF

pre-mRNA: mRNA See under pre-mRNA splicing

pre-mRNA splicing: One of the steps at which eukaryotic gene expression can be regulated is the processing of mRNA precursors (pre- mRNAs), which includes the removal of intervening sequences (splicing). Regulation at this step is widely used during cell differentiation and development to turn on or off genes or to generate protein variants with different properties from the same primary transcript. [Juan Valcárcel "Research 1996" EMBL Gene Expression] Broader term splicing http://www.embl-heidelberg.de/ExternalInfo/ScientificProgrammes/Valcarcel.html

protein: Proteins glossary

protein coding,  protein coding regions: See coding regions.

protein expression: Expression glossary

protein splicing: Excision of in- frame internal protein sequences (inteins) of a precursor protein, coupled with ligation of the flanking sequences (exteins). Protein splicing is an autocatalytic reaction and results in the production of two proteins from a single primary translation product: the intein and the mature protein. [MeSH]

The excision of an intervening protein sequence (the intein) from a protein precursor and the concomitant ligation of the flanking protein fragments (the exteins) to form a mature extein protein and the free intein (Perler 1994). Protein splicing results in a native peptide bond between the ligated exteins (Cooper 1993). Extein ligation differentiates protein splicing from other forms of autoproteolysis. [InBase (Intein Database), New England Biolabs, 2001] http://www.neb.com/inteins/intein_intro.html

Related terms In-depth exteins, inteins

protein synthesis: See translation.

RNA RiboNucleic Acid: Linear polymer molecules composed of a chain of ribose units linked between positions 3 and 5 by phosphodiester groups to which the bases adenine or guanine or uracil or cytosine, respectively are attached … The three most important types of RNAs in the cell are, c.f. mRNA, tRNA, rRNA. [IUPAC Biotech]

A single stranded nucleic acid that contains the sugar ribose. There are several forms of RNA, including messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA [rRNA] (all involved in protein synthesis), as well as several small RNA’s whose functions are still being clarified. Certain viruses have RNA, instead of DNA, as their genetic material. [NIGMS]

A DNA like molecule. Different kinds of RNA exist that play specific roles in the process of gene expression. [NHLBI]

Narrower terms mRNA, In-depth hnRNA precursor RNA, rRNA, scRNA, snRNA, tRNA Related terms ribosomes, ribozymes, RNA polymerase, RNA splicing; Microarrays glossary Northern blotting; Omes & omics glossary ribonome, ribonomics

RNA databases see Databases & software directory.

RNA-RNA interactions: http://www.chem.fsu.edu/faculty/grnbm.htm

RNA silencing: See Functional genomics glossary RNAi

RNA splicing:  The ultimate exclusion of nonsense sequences or intervening sequences (introns) before the final RNA transcript is sent to the cytoplasm. [MeSH] Broader term splicing.

reading frames: The sequence of codons by which translation may occur. A segment of mRNA 5' AUCCGA3' could be translated in three reading frames, 5' AUC.. or 5' UCC.. or 5' CCG.., depending on the location of the start codon. [MeSH] Narrower term ORF Open Reading Frames
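The MeSH example can be reproduced with a few lines of Python. The function name is an assumption made for this sketch; it simply reads the same mRNA starting at offsets 0, 1, and 2.

```python
# Sketch: list the codons of each of the three forward reading frames.
# reading_frames is a hypothetical helper name, not from any library.

def reading_frames(mrna):
    """Return three lists of complete codons, one per frame offset."""
    mrna = mrna.upper()
    return [
        [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]
        for offset in range(3)
    ]


frames = reading_frames("AUCCGA")
first_codons = [frame[0] for frame in frames]
```

The first codon of each frame is AUC, UCC, and CCG respectively, matching the three readings given in the definition.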

reference sequences: Reference sequence standards for the naturally occurring molecules of the central dogma, from chromosomes to mRNAs to proteins. Toward this goal, intermediate larger genomic regions, contigs, are also produced. RefSeq standards provide a foundation for the functional annotation of the human genome. They provide a stable reference point for mutation analysis, gene expression studies, and polymorphism discovery [RefSeq, LocusLink, NCBI, US]  http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html

regulatory sequence: A DNA base sequence that controls gene expression. [DOE]

repetitive sequences: Make up at least 50% of the genome. Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. They hold important clues about evolutionary events, help chart mutation rates, and by seeding DNA rearrangements, they can modify genes and create new ones. They also serve as tools for genetic studies.

The vast majority of repeated sequences in the human genome are derived from transposable elements - sequences like those that form viral genomes - that propagate by inserting fresh copies of themselves in random places in the genome. A full 45% of the human genome derives from such transposons. A major surprise of this new global analysis of the human genome is that many components in this diverse array of repeated sequences, traditionally considered to be "junk," appear to have played a beneficial role over the course of human evolution. [NHGRI "Summary of the Initial Sequencing and Analysis of the Human Genome" press release, Feb. 11, 2001]  Related terms "junk DNA", non-coding DNA. http://www.nhgri.nih.gov/NEWS/summary_of_sequence.html

reverse transcription: Reverse transcription is used naturally by retroviruses to insert themselves into an organism's genome. Artificially induced reverse transcription is a useful technique for translating unstable mRNA molecules into stable cDNA. [J Buhler, Washington Univ.] http://www.cs.washington.edu/homes/jbuhler/research/array/glossary.html  Related term In-depth reverse transcriptases; Gene definitions cDNA

ribonucleic acid: See RNA.

selfish DNA: See "junk DNA", non-coding DNA.

sequence: The order of neighbouring amino acids in a protein or the purine and pyrimidine bases [A,C,T,G, uracil] in RNA and DNA. [IUPAC Bioinorganic]

Getting more from your sequence on the web EA Greene & S Henikoff, 1997 http://linkage.rockefeller.edu/wli/news/henikoff.html   Automated ways to keep up to date with sequences of particular interest.

Narrower terms  carbohydrate sequence;  Proteins amino acid sequence Related terms Sequencing draft sequence - human, published sequence - human, working draft sequence - human

specific DNA: See under non-specific DNA In-depth glossary

splicing: 1. Of RNA: the procedure by which introns are removed from eukaryotic precursor mRNA molecules and adjacent exon sequences are joined together (spliced). 2. Of DNA: manipulation for joining together double stranded DNA fragments with protruding single stranded "sticky ends" by means of ligases. [IUPAC Biotech, IUPAC Compendium] 

Narrower terms protein splicing, pre-mRNA splicing, RNA splicing, trans-splicing; Gene Definitions  alternative splicing, cDNA; Related terms Cell biology glossary spliceosomes
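As a rough illustration of the first sense (RNA splicing), the sketch below removes introns from a precursor mRNA string and joins the flanking exons. It assumes the intron coordinates are already known and supplied as 0-based, half-open intervals; in the cell the boundaries are recognized by the splicing machinery, which this toy function does not model. The function name and sequences are invented for the example.

```python
# Toy model of RNA splicing: cut out known introns, join adjacent exons.
# splice() and the coordinate convention (0-based, end-exclusive,
# non-overlapping intervals) are assumptions for this illustration.

def splice(pre_mrna, introns):
    """Return the mature sequence with every (start, end) intron removed."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                           # skip over the intron
    exons.append(pre_mrna[position:])            # final exon
    return "".join(exons)


exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
```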

trans-acting factors: Trans-acting factors functionally have two domains. One domain is required for the factor to bind to DNA, and the second domain is required for the activation of transcription. This was discovered by studying deletion mutants of the factors. Mutant factors were found that could bind DNA but could not activate transcription. Other experiments, in which a hybrid protein consisting of the non-DNA-binding segment of one trans-acting factor fused to the DNA-binding region of a second trans-acting factor activated transcription, defined the second function of trans-acting factors. [Phil McClean "Control of gene expression in eukaryotes" North Dakota State Univ. 1997] http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/geneexpress/eukaryex6.htm

transcript: Expression glossary  Related terms In-depth 3' UTR, 5' UTR, primary transcript, terminator

transcription: The process by which the genetic information encoded in a linear sequence of nucleotides in one strand of DNA is copied into an exactly complementary sequence of RNA. [IUPAC Biotech]

The transfer of genetic information from DNA to messenger RNA by DNA directed RNA polymerase. It includes reverse transcription and transcription of early and late genes expressed early in an organism’s life cycle or during later development. [MeSH/ Metathesaurus]

The synthesis of an RNA copy from a sequence of DNA (a gene); the first step in gene expression. Compare translation (the process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids). [DOE]

"Transcription" Central dogma chapter MIT Biology Hypertextbook http://esg-www.mit.edu:8001/esgbio/dogma/trx.html

Related terms translation; In-depth attenuator, reverse transcriptases, transcription machinery; Narrower terms:Gene amplification & PCR reverse transcription; Microarrays In-depth Northern blotting
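The "exactly complementary" copying described in the IUPAC definition can be sketched in Python. The sketch assumes the template strand is supplied written 5' to 3', so the RNA is built by reading it in reverse and pairing each base (A with U, T with A, G with C, C with G). The function name is invented for this example.

```python
# Sketch of transcription as base-pair complementation of the template
# strand. transcribe_from_template is a hypothetical helper name.

RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_from_template(template_5_to_3):
    """Return the mRNA (5'->3') complementary to a DNA template strand.

    The template is given 5'->3'; RNA polymerase reads it 3'->5', so the
    input is reversed before each base is paired.
    """
    return "".join(RNA_PAIRING[base] for base in reversed(template_5_to_3.upper()))


template = "GGCCAT"                      # template strand, 5'->3'
mrna = transcribe_from_template(template)
```

The resulting mRNA has the same sequence as the opposite (non-template) DNA strand, ATGGCC, with U in place of T.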

transcription factors: Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process. [MeSH] Narrower term In-depth artificial transcription factors

translation: The unidirectional process that takes place on the ribosomes whereby the genetic information present in an mRNA is converted into a corresponding sequence of amino acids in a protein. [IUPAC Bioinorganic]

The conversion of the genetic instructions for a protein from nucleotides of messenger RNA into amino acids. [NIGMS]

"Translation" Central dogma chapter MIT Biology Hypertextbook  http://esg-www.mit.edu:8001/esgbio/dogma/trl.html
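A compact Python sketch of translation follows. It builds the standard genetic code table from the conventional U, C, A, G ordering and reads codons from the first AUG until a stop codon. The function name is an assumption for this example, and taking the first AUG as the start is a simplification of how initiation sites are actually chosen.

```python
# Sketch of translation with the standard genetic code.
# translate() is a hypothetical helper; '*' marks a stop codon.
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order (UUU, UUC, UUA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                        # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break                        # termination signal
        protein.append(amino)
    return "".join(protein)


peptide = translate("GGAUGGCCUUUUAAGG")
```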

trans-splicing: The joining of RNA from two different genes. One type of trans- splicing is the "spliced leader" type (primarily found in protozoans such as trypanosomes and in lower invertebrates such as nematodes) which results in the addition of a capped, noncoding, spliced leader sequence to the 5' end of mRNAs. Another type of trans- splicing is the "discontinuous group II introns" type (found in plant/ algal chloroplasts and plant mitochondria) which results in the joining of two independently transcribed coding sequences. Both are mechanistically similar to conventional nuclear pre- mRNA cis- splicing. Mammalian cells are also capable of trans- splicing. [MeSH]

transposons: Gene definitions

URF: Unidentified Reading Frame

upstream: Identifies sequences located in a direction opposite to that of expression; for example, the bacterial promoter is upstream of the initiation codon. In an mRNA molecule, upstream means toward the 5' end of the molecule. Occasionally used to refer to a region of a polypeptide chain which is located toward the amino terminus of the molecule. [Lemon] 

Bibliography

DDBJ/ EMBL/ GenBank Feature Table, 2001, 100 + definitions. http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html


IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

In-depth  Sequences, DNA & beyond

3' UTR (three prime): The sequence at the 3' end of messenger RNA that does not code for product. This region contains transcription and translation regulating sequences. [MeSH]

Region at the 3' end of a mature transcript (following the stop codon)  that is not translated into a protein. [DDBJ/ EMBL/ GenBank Feature Table]  
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

A term that identifies one end of a single- stranded nucleic acid molecule. The 3' end is that end of the molecule which terminates in a 3' hydroxyl group. The 3' direction is the direction toward the 3' end. Nucleic acid sequences are written with the 5' end to the left and the 3' end to the right, in reference to the direction of DNA synthesis during replication (from 5' to 3'), RNA synthesis during transcription (from 5' to 3'), and the reading of mRNA sequence (from 5' to 3') during translation. Related term 5' (5-prime) [Mouse Genome Informatics] Broader term UTR

Related terms UTR; Gene amplification & PCR primer extension

5' (5-prime):  The sequence at the 5' end of the messenger RNA that does not code for product. This sequence contains the ribosome binding site and other transcription and translation regulating sequences. [MeSH]

A term that identifies one end of a single-stranded nucleic acid molecule. The 5' end is that end of the molecule which terminates in a 5' phosphate group. The 5' direction is the direction toward the 5' end. Nucleic acid sequences are written with the 5' end to the left and the 3' end to the right, in reference to the direction of DNA synthesis during replication (from 5' to 3'), RNA synthesis during transcription (from 5' to 3'), and the reading of mRNA sequence (from 5' to 3') during translation. Related term 3' (3-prime).  [Mouse Genome Informatics]
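Because sequences are always written 5' to 3', the complementary strand of a DNA sequence must be both complemented and reversed before it is written down. A minimal Python sketch, with a function name chosen for this example:

```python
# Sketch: write the complementary DNA strand in the 5'->3' convention.
# reverse_complement is an illustrative helper name.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna_5_to_3):
    """Complement each base, then reverse so the result also reads 5'->3'."""
    return dna_5_to_3.upper().translate(COMPLEMENT)[::-1]


opposite_strand = reverse_complement("ATGGCC")
```

Applying the function twice returns the original sequence, which is a quick sanity check on the convention.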

5' UTR (five prime): Region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein. [DDBJ/ EMBL/ GenBank Feature Table]   http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

5' Untranslated Region: That portion of an mRNA from the 5' end to the position of the first codon used in translation. Related term 3' UTR. [Mouse Genome Informatics]

Related term 3' prime; Gene amplification glossary primer extension Broader term UTR.

amino acid sequence:  The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining protein conformation. [MeSH]

artificial transcription factors: Regulated gene expression is critical for cellular existence, and a disruption in the regulatory network can result in disease or death. Therefore, a goal of primary importance in the scientific community has been to discover methods of reprogramming gene expression in diseased cells while leaving normal cells unaffected. Our understanding of transcription, an early step in gene expression, has now reached a sufficiently sophisticated level to allow us to tackle this challenge from a chemical perspective. Dendritic and polymeric structures designed to functionally mimic the protein participants in activation and repression of transcription will be examined through in vitro assays and cell culture experiments. Organic synthesis will play a critical role in this effort. By varying the synthetic approaches to the artificial transcription factors, their overall function as activators and/or repressors can be controlled and important characteristics such as cell membrane permeability and tissue- type specificity can be addressed. [Anna K. Mapp "Chemistry at the Univ. of Michigan, 2001] http://www.umich.edu/~michchem/faculty/mapp/

attenuator: In prokaryotes. 1) region of DNA at which regulation of termination of  transcription occurs, which controls the expression of some bacterial operons;  2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription. [DDBJ/ EMBL/ GenBank Feature Table]  http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

CpG islands: Regions of DNA rich in CpG dinucleotides, also known as CpG islands, are often located upstream of the transcription start site in both tissue specific and housekeeping genes.  Overall, CpG dinucleotides are observed at a density of 25% the expected level from base composition alone, partially due to 5-methylcytosine decay (Bird, 1993). Since CpG dinucleotides typically occur with low frequency, CpG islands can be distinguished statistically in the genome. [Eric C. Rouchka et al. "Computational Detection of CpG Islands in DNA" Sept. 1997 Washington Univ. St. Louis, US]  http://stateslab.bioinformatics.med.umich.edu/~ecr/PAPERS/WUCS-97-39.pdf
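The idea of comparing the observed CpG density with the level expected from base composition can be sketched in Python. The ratio used here, (number of CpG x sequence length) / (number of C x number of G), is a commonly used observed/expected measure and is an assumption of this sketch; the definition above does not give a formula, and the detection method of the cited paper is not reproduced. The function name is invented for the example.

```python
# Sketch: observed/expected CpG ratio for a DNA window.
# cpg_observed_expected and the formula choice are assumptions here.

def cpg_observed_expected(dna):
    """Return (#CpG * length) / (#C * #G), or 0.0 if C or G is absent."""
    dna = dna.upper()
    c_count = dna.count("C")
    g_count = dna.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = dna.count("CG")   # CG cannot overlap itself, so count() is exact
    return cpg_count * len(dna) / (c_count * g_count)


island_like = cpg_observed_expected("CGCGCGCG")
depleted = cpg_observed_expected("CCCAGGGTCCAGGG")
```

A ratio near 0.25 corresponds to the genome-wide depletion mentioned above, while windows with a much higher ratio are candidate CpG islands; any cutoff would have to be chosen separately.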

catalytic RNA: Ever since HHMI investigator Thomas Cech at the University of Colorado in Boulder uncovered the catalytic properties of RNA in 1982, researchers have been diligently studying these ribozymes. Scientists have since discovered more than 500 ribozymes in a diverse range of organisms and have found that they share many similarities with their more widespread protein cousins, enzymes. [Howard Hughes Medical Institute News, Oct. 9, 1998] http://www.hhmi.org/news/ribozyme.html

Related term ribozymes

central dogma exceptions ("busters"): 1. Reverse transcriptase and RNA genomes. DNA is not the only molecule of heredity in nature and, as David Baltimore and Howard Temin showed, the flow of information from DNA to RNA is not the only pathway possible. 2. Catalytic RNAs (ribozymes). Proteins are not the only structures capable of catalyzing a reaction. Tom Cech demonstrated the catalytic nature of certain classes of introns (intervening sequences) that are able to "self-splice." In addition Harry Noller has shown that the synthesis of the peptide bond during protein synthesis is catalyzed by the 23S rRNA of the ribosome. 3. Heritable proteins. Stanley Prusiner has given us the novel name "prion" (proteinaceous infectious particle) to describe the agent responsible for a number of slow, neurological infectious diseases, including scrapie, bovine spongiform encephalopathy (mad cow disease) and Creutzfeldt-Jakob disease. [Martinez Hewlett, Molecular Biology 411, Univ. of Arizona, Tucson US] http://www.blc.arizona.edu/marty/411/Modules/mod4.html

cis-acting sequences: The sequences just 5' of the start site of transcription are the most important for the initiation of transcription. This is where the transcription complex is built. In general, this region is called the promoter. For eukaryotes, several sequences seem to be conserved among many genes. One such sequence is the TATA box. The sequence is located about 30 bases upstream (-30) from the transcription start site and is the one sequence required for any significant transcription to occur. Other sequences aid in transcription but are not always part of the promoter. The two most often found are the CCAAT box (called the CAT box) and the GC box. Because mutants of these three sequences only express mRNAs at low levels, these are considered the most important sequences of the basic transcription complex. [Phillip McClean, "Control of gene expression in eukaryotes", North Dakota State Univ. 1997]  http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/geneexpress/eukaryex3.htm

cis-splicing: Splicing of messenger RNA precursors (pre- mRNAs) is a requisite step in the generation of virtually all mature mRNAs. This process requires the coordinated interaction of several small nuclear ribonucleoprotein particles (snRNPs) and many protein factors that assemble to form an enzymatic complex known as the spliceosome. Components of the spliceosome recognize the exon- intron boundaries at the 5' and 3' splice sites, excise the intron and ligate the adjoining exons. In most cases, splicing joins a 5' splice site and a 3' splice site within the same pre-mRNA molecule, termed cis-splicing. [Intronn, Inc.  "Background cis- and trans- splicing" 1999] http://www.intronn.com/r&t/background.htm

cis-trans: Gene definitions

enhancer: A cis- acting sequence that increases the utilization of (some)  eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. Eukaryotes and eukaryotic viruses. [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html Related term promoter. 

exteins:  Flanking protein fragments. [InBase, New England Biolabs, 1999] http://www.neb.com/inteins/int_id.htm  Related terms inteins, protein splicing.

hnRNA heterogeneous nuclear RNA: RNA transcripts in the nucleus, representing precursors and processing intermediates of rRNA, mRNA, and tRNA, as well as mature RNA transcripts not yet transported into the cytoplasm. [http://newfish.mbl.edu/Course/Glossary/]

inteins: A large in-frame insertion in a sequenced gene that is absent in other sequenced homologs suggests that this gene may contain an intein. These intervening sequences are often found by running any of the commonly available sequence comparison programs such as Bestfit, Gap or Blast. Significant Blast matches are often found to the extein protein AND one or more proteins containing similar inteins. More sophisticated searches can be performed using intein motifs (Pietrokovski 1994, Perler 1997 and Pietrokovski 1998A) or a Hidden Markov Model (Dalgaard 1997 and Gorbalenya 1998). The presence of an intein in a particular gene does not necessarily mean that a homolog from a closely related species or strain will have the same intein. ... Many inteins are bifunctional proteins with splicing and endonuclease activity. [InBase, New England Biolabs, 1999] http://www.neb.com/inteins/int_id.htm 

Inteins are parts of proteins that cut themselves out of the whole protein entirely on their own accord. This phenomenon has become known only in the past few years, and it is perplexing because most major alterations to a protein require a second protein, such as a protease, and other cofactors, such as energy in the form of ATP. Self- splicing proteins, therefore, represent a fundamentally new way of protein modification, says [Henry] Paulus, who works at the Boston Biomedical Research Institute. [Harvard Medical School, Focus, Oct. 31, 1997]  http://www.med.harvard.edu/publications/Focus/1997/Oct31_1997/biochem.html

Internal protein sequences. 

Related terms exteins, protein splicing.

intron splicing:

LINEs Long Interspersed Nuclear Elements or Long INterspersed Elements: Families of long (average length = 6,500 bp), moderately repetitive (about 10,000 copies) elements. LINEs are cDNA copies of functional genes present in the same genome; also known as processed pseudogenes. [FAO Glossary] Related terms non-coding, retrotransposons. 

LTR Long Terminal Repeat: A sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses. [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

non-coding first exons: Although conventional programs detect many parts of genes with ease, they fail when it comes to detecting two important elements- the very first pieces of genes, and the nearby "on" switches of genes called promoters. Researchers in the bioinformatics group at Cold Spring Harbor Laboratory have now developed a computer program that is especially good at finding these first segments and "on" switches of genes. "FirstEF is the first program that can readily and accurately detect a class of gene segments that has previously been extraordinarily difficult to find," says [Michael] Zhang. Instead, FirstEF recognizes five other DNA "signatures" that betray the presence and location of first exons in genes. The biological basis of some of these telltale genetic signatures is unknown ... One such signature is the frequency with which two building blocks of DNA, C and G, occur next to each other.   [Cold Spring Harbor Laboratory, US]. "It's like looking for buried treasure."

The gene segments Zhang is referring to occur at the very beginning of genes, and are called "non- coding first exons." Because they do not encode protein segments, non- coding first exons are undetectable by conventional computer programs that rely on protein coding patterns found in DNA. [Cold Spring Harbor Lab press release Nov. 28, 2001] http://www.cshl.org/public/releases/zhang112801.html

 Related terms exons, non- coding
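
One of the signatures mentioned above is how often C and G occur next to each other. A minimal Python sketch of that measurement follows. It reports the raw CG dinucleotide frequency and a commonly used observed/expected ratio; the choice of that ratio and the function name `cpg_stats` are assumptions for illustration, and this is not the FirstEF method itself.

```python
def cpg_stats(sequence):
    """Return (CG count, CG frequency per dinucleotide position,
    observed/expected CG ratio) for a DNA sequence.

    The observed/expected ratio is computed as
    count(CG) * length / (count(C) * count(G)),
    a widely used normalization that is assumed here for illustration.
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    frequency = cpg_count / (length - 1) if length > 1 else 0.0
    if c_count and g_count:
        obs_exp = cpg_count * length / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, frequency, obs_exp

print(cpg_stats("CGCGCG"))
print(cpg_stats("ATATAT"))
```

A region with an unusually high ratio is the kind of signal such a program could use without relying on protein-coding patterns.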

non-specific DNA: A new discovery about how cells regulate protein synthesis helps explain the complex interactions between proteins and DNA and may have far reaching implications for future biotechnology research. In order to inhibit gene expression, proteins need to bind to specific DNA target sites, which are often located in stretches of non-specific DNA. The mechanism for recognition and discrimination between non-specific and specific sites has remained a mystery. Researchers at the Institute of Molecular Biology, University of Oregon, used a new imaging technique called scanning force microscopy (SFM) to visualize DNA and protein complexes in the process of binding. SFM fills a need for quantitative analysis of DNA not possible with x-ray crystallography. SFM provides a topographic image of a molecular surface by scanning a surface underneath a tip modified with an electron beam. Deflections sensed by the tip can be amplified and recorded, providing a quantitative topographic map of the surface. Previous studies have shown that recognition of a specific target site is often accompanied by DNA "bending." However, the significance of this bending has not been understood. SFM studies revealed crucial differences in DNA bending induced by protein binding to non-specific and specific sites. [Sean Henahan "DNA bends to bind", Access Excellence, Mar. 2001] http://www.accessexcellence.org/AB/BC/DNA_Bends_to_Bind.html

precursor_RNA: Any RNA species that is not yet the mature RNA product;  may include 5' clipped region (5' clip), 5' untranslated region (5' UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3' UTR), and 3' clipped region (3' clip). ... used for RNA which may be the result of post- transcriptional processing. [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

primary (initial, unprocessed) transcript: Includes 5' clipped region (5' clip), 5' untranslated region (5' UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3' UTR), and 3' clipped region (3' clip). [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

promiscuous DNA: The occurrence of identical base sequences in more than one cellular compartment. Evidence for gene flow between organelles, or organelles and the nucleus. [PJ Bottino Biology 222 Univ. Maryland Fall 1996]   http://www.life.umd.edu/classroom/biol222/lect33-37.html

promoter: Region on a DNA molecule involved in RNA polymerase binding to initiate transcription.  [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html Related terms cis- acting, enhancer, promoter regions

promoter regions:  The DNA region, usually upstream to the coding sequence of a gene or operon, which binds and directs RNA polymerase to the correct transcriptional start site and thus permits the initiation of transcription. [IUPAC Biotech]

DNA sequences which are recognized (directly or indirectly) and bound by a DNA- dependent RNA polymerase during the initiation of transcription. Highly conserved sequences within the promoter include the Pribnow box in bacteria and the TATA BOX in eukaryotes. [MeSH] Related term enhancer.

rRNA: Ribosomal RNA, RNA molecules which are essential structural and functional components of ribosomes, the subcellular units responsible for protein synthesis. [IUPAC Biotech]

Mature ribosomal RNA ; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins.  [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

RNA polymerase RNAP: The movement of RNA polymerase (RNAP) along DNA during transcription is a complex set of different activities, including initiation, elongation, pausing, backtracking, and arrest. A complete understanding of how this molecular machinery works requires characterization of the individual activities, when and why they occur, what structural components are required in each case, and what the biochemical parameters are. Since ensemble measurements will give only averages across a mixture of molecules engaged in a variety of these different behaviors, single molecule measurements may be the only way to examine the characteristics of each type of behavior independently. [NIGMS "Single Molecule Detection and Manipulation Workshop" Single Molecule Fluorescence of Biomolecules and Complexes Protein Folding April 17-18, 2000] http://www.nigms.nih.gov/news/reports/single_molecules.html#examples 

reverse transcriptases: Gene amplification & PCR

ribosomal RNA: See rRNA.

ribosomes: Cell Biology

ribozymes: Naturally occurring RNAs with enzymatic activity that specifically bind to and cleave, and therefore inactivate, mRNA molecules. Like the antisense approach, ribozymes provide a means of inhibiting a gene of interest for target validation studies. [CHI Breaking Bottlenecks]

Ribozymes can be engineered to bind naturally to any RNA sequence, resulting in the cleavage and inactivation of mRNAs containing the target sequence. [CHI Target Validation] 

Related term: catalytic RNA

scRNA: Small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote.  [DDBJ/ EMBL/ GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

SINEs Short Interspersed Nuclear Elements or Short INterspersed Elements: Families of short (150 to 300 bp), moderately repetitive elements of eukaryotes, occurring about 100,000 times in a genome. SINEs appear to be DNA copies of certain tRNA molecules, created presumably by the unintended action of reverse transcriptase during retroviral infection. [FAO Glossary] Related terms non-coding, retrotransposons.

sequence data, molecular:  Descriptions of specific amino acid, carbohydrate or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GenBank, EMBL, NBRF or other sequence repositories [databases] [MeSH]

small cytoplasmic RNA: See scRNA.

small nuclear RNA: See snRNA.

snRNA: Small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions. [DDBJ/ EMBL/ GenBank Feature Table]  http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

splice sites: Boundaries between exons and introns. There are two varieties: the border going from exon to intron is called a donor site or a 5' site; the border separating intron from exon is called an acceptor site or a 3' site. [TP Speed, S. Cawley, "Locating splice sites"  Statistics 260 Statistics in Genetics, Univ. of California- Berkeley, 1998]  http://www.stat.berkeley.edu/users/terry/Classes/s260.1998/Week12/week12/node14.html

splice junctions: 

start codon: 

stop codon:

sticky ends: The staggered ends of complementary sequences of DNA which result from cleavage by restriction enzymes. [IUPAC Biotech]
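
A minimal Python sketch of how a staggered cut produces sticky ends. The example uses the EcoRI recognition site GAATTC, cut after the first G on each strand, which is not mentioned in the cited definition and is brought in here only as a familiar illustration. The function names are hypothetical, and the sketch handles only palindromic sites that leave a 5' overhang or a blunt end.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def cut_with_sticky_ends(top, site="GAATTC", cut_offset=1):
    """Cut double-stranded DNA at the first occurrence of a palindromic site.

    `top` is the top strand written 5'->3'. The enzyme is assumed to cut
    `cut_offset` bases into the site on each strand. Returns
    (left, right, overhang), where left and right hold both strands of each
    fragment (each written 5'->3') and overhang is the single-stranded
    5' extension shared by the two new ends. Returns None if no site is found.
    """
    top = top.upper()
    if site != reverse_complement(site):
        raise ValueError("this sketch only handles palindromic sites")
    if not 0 <= 2 * cut_offset <= len(site):
        raise ValueError("cut_offset must lie in the first half of the site")
    i = top.find(site)
    if i == -1:
        return None
    top_cut = i + cut_offset                 # cut position on the top strand
    bottom_cut = i + len(site) - cut_offset  # bottom-strand cut, in top-strand coordinates
    left = {"top": top[:top_cut], "bottom": reverse_complement(top[:bottom_cut])}
    right = {"top": top[top_cut:], "bottom": reverse_complement(top[bottom_cut:])}
    overhang = top[top_cut:bottom_cut]
    return left, right, overhang

left, right, overhang = cut_with_sticky_ends("CCGAATTCGG")
print(left, right, overhang)
```

Because the site is palindromic, both new ends carry the same overhang sequence, which is why fragments cut by the same enzyme can pair with each other.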

tRNA: See transfer RNA.

template: Gene amplification & PCR

terminator: A sequence of DNA lying beyond the 3’ end of the coding segment of a gene which is recognized by RNA polymerase as a signal to stop synthesizing mRNA. [IUPAC Biotech]

Sequence of DNA located at the end of the transcript that causes RNA polymerase to terminate transcription.  [DDBJ/ EMBL/ GenBank Feature Table]   http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

transcription machinery: Consists of the RNA polymerase II holoenzyme plus two additional “general transcription factors,” which are protein complexes and a histone acetyltransferase that theoretically exerts its transcriptional activity by modifying chromatin. ... [Several] studies provide evidence for the function of components of the general transcription machinery, in terms of their role in regulation of the transcription of specific sets of genes … Apparently, the specific transcription regulatory activities of components of the general transcription machinery provide a layer of regulation in addition to that provided by the gene-specific regulators … Knowledge gained concerning the coordinate regulation of genes, how gene-specific transcription factors (which are the targets of many existing drugs (e.g., steroids, selective estrogen response modifiers, thiazolidinediones)) interact with general transcription factors, and how signal transduction pathways regulate gene transcription is expected to be important for genomics based identification of targets that are components of transcriptional regulation and signal transduction networks.  [CHI Functional Genomics]

transfer RNA:  A single-stranded RNA molecule containing about 70-90 nucleotides, folded by intrastrand base pairing into a characteristic secondary (“cloverleaf”) structure that carries a specific amino acid and matches it to its corresponding codon on an mRNA during protein synthesis. [IUPAC Biotech]

Mature transfer RNA, a small RNA molecule (75 - 85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence [DDBJ/ EMBL/ GenBank Feature Table]   http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
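
The codon-matching role described in the two definitions above can be sketched as a toy decoder in Python. It assumes strict Watson-Crick pairing between codon and anticodon (wobble pairing is ignored), uses only a four-codon excerpt of the standard genetic code, and all names (`anticodon_for`, `TRNAS`, `decode`) are hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Small excerpt of the standard genetic code, for illustration only.
CODON_TO_AMINO_ACID = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys"}

def anticodon_for(codon):
    """Anticodon (written 5'->3') that pairs with a codon under strict
    Watson-Crick rules; wobble pairing is deliberately not modeled."""
    return codon.upper().translate(RNA_COMPLEMENT)[::-1]

# Toy pool of tRNAs: each anticodon carries one specific amino acid.
TRNAS = {anticodon_for(codon): aa for codon, aa in CODON_TO_AMINO_ACID.items()}

def decode(mrna):
    """Read an mRNA codon by codon and collect the amino acids delivered
    by tRNAs whose anticodons match. Stops at the first codon with no
    matching tRNA in the toy pool (for example, a stop codon)."""
    mrna = mrna.upper()
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = TRNAS.get(anticodon_for(mrna[i:i + 3]))
        if amino_acid is None:
            break
        peptide.append(amino_acid)
    return peptide

print(anticodon_for("AUG"))
print(decode("AUGUUUGGCUAA"))
```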

trans- splicing: Splicing between two independently transcribed pre- mRNAs is termed trans-splicing. This has been described in trypanosomes, nematodes, flatworms, and plant mitochondria. In vitro trans-splicing has been used as a model system to examine the mechanism of splicing. Trans-splicing of pre-mRNAs in human cells has been postulated to account for some rare events.  [Intronn, Inc.  "Background cis- and trans- splicing" 1999] http://www.intronn.com/r&t/background.htm

UTR: The parts of the messenger RNA sequence that do not code for product, i.e. the 5' UNTRANSLATED REGIONS and 3' UNTRANSLATED REGIONS. [MeSH]

UnTranslated Region: Critical for many aspects of gene regulation and expression. Narrower terms 3' UTR, 5' UTR. Related term intron
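
A minimal Python sketch of how the untranslated regions relate to the coding sequence of an mRNA. It assumes, as a simplification, that translation starts at the first AUG and ends at the first in-frame stop codon (UAA, UAG or UGA); real start-site selection is more subtle. The function name `split_utrs` and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_utrs(mrna):
    """Split an mRNA into (5' UTR, coding sequence, 3' UTR).

    Simplifying assumption: the coding sequence runs from the first AUG
    through the first in-frame stop codon. Returns None if no such
    open reading frame exists.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is counted as part of the CDS here
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

print(split_utrs("GGCCAUGAAAUAGCCUU"))
```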


Cambridge Healthtech Institute
1037 Chestnut Street
Newton Upper Falls, MA 02464
Phone: 617-630-1300
Fax: 617-630-1325
Email: chi@healthtech.com