Gene definitions
Evolving terminology for emerging technologies
Comments? Suggestions? mchitty@healthtech.com
Last revised December 27, 2001

One of the most unfortunate legacies of Mendelian genetics is
the lumping together of gene defects and genes. People with various
genetic defects may or may not manifest a disease phenotype. As both
Horace Freeland Judson and Sydney Brenner point out in the articles cited
below, classical genetics was so firmly based on gene defects that only
recently have we begun to determine what "normal" or wild-type genes really
are. And careful reading and/or listening will often reveal that people
use the word gene and a number of related words and phrases (mutations
and other variants) very loosely and interchangeably.
Related glossaries include Applications: Genomics, Sequencing; Informatics: Molecular modeling; Biology: Expression,
Genetic variations, and particularly Sequences, DNA & beyond. Proteomics is also key, since it is the gene's protein products that are ultimately
of interest. Additional definitions for categories of genes appear in the In-depth glossary,
after the Bibliography.

How past history leads to present confusion

Horace Freeland Judson, writing in the Feb. 2001 human genome issue of Nature,
notes problems with terminology. "The phrases current in genetics that
most plainly do violence to understanding begin "the gene for":
the gene for breast cancer, the gene for hypercholesterolaemia, the gene
for schizophrenia, the gene for homosexuality, and so on. We know of course
that there are no single genes for such things. We need to revive and put
into public use the term "allele". Thus, "the gene for breast cancer"
is rather the allele, the gene defect - one of several - that increases
the odds that a woman will get breast cancer. "The gene for" does, of course,
have a real meaning: the enzyme or control element that the unmutated gene,
the wild-type allele, specifies. But often, as yet, we do not know what
the normal gene is for. ... Pleiotropy. Polygeny. Perhaps these terms will not easily become
common parlance; but the critical point never to omit is that genes act in concert with one
another - collectively with the environment. Again, all this has long been understood by biologists,
when they break free of habitual careless words. We will not abandon the reductionist
Mendelian programme for a hand-wringing holism: we cannot abandon the term gene and its allies.
On the contrary, for ourselves, for the general public, what we require
is to get more fully and precisely into the proper language of genetics."
[Horace Freeland Judson "Talking about the genome" Nature 409: 769, 15
Feb. 2001]

Sydney Brenner, writing in the special Drosophila genome issue of Science,
made a similar observation: "Old geneticists knew what they were talking
about when they used the term "gene", but it seems to have become
corrupted by modern genomics to mean any piece of expressed sequence, just
as the term algorithm has become corrupted in much the same way to mean
any piece of a computer program. I suggest that we now use the term "genetic
locus" to mean the stretch of DNA that is characterized either by mapped
mutations as in the old genetics or by finding a complete open reading
frame as in the new genomics. In higher organisms, we often find closely
related genes that subserve closely related, but subtly different, functions."
[Sydney Brenner "The End of the beginning" Science 287 (5451): 2173, Mar.
24, 2000]

Don't expect to know anytime soon exactly
how many human genes there are. The yeast (Saccharomyces cerevisiae)
genome has been sequenced since 1996, and the precise number of genes is still not agreed
upon. About 60% of our genes
exhibit alternative splicing, making the number of protein products close to
100,000, not a very different number from the more recent estimates. Expect to hear more about genes and
the cell cycle, and how gene expression differs throughout it.

Definitions of gene

Gene is a good example of
a word in the process of evolving from classical genetics meanings (fairly abstract
concepts, rooted in the Mendelian model of monogenic
diseases with high penetrance). The concept of "gene" has been changing so fast that most print
resources (and some online) are out of date. The best source I've found is http://www.ergito.com/, a
project of Benjamin Lewin and colleagues
(requires free registration): Molecular Biology, the best-selling textbook GENES
online (which also has an extensive glossary). The definition of gene is evolving
(and lengthening) as we tease apart the incredible complexity of biological and
molecular processes and discover that "junk DNA" has important
regulatory functions. Gene identification in prokaryotes is comparatively simple, as their genomes consist almost entirely
of uninterrupted coding sequence. By contrast, the protein-coding portions of human genes make up only about 2% of total human DNA.
Human exons are widely separated by immense stretches of introns.

The concept of "gene" didn't come along until 1909, three years after
the term genetics in 1906 (Evelyn Fox Keller, The century of the gene,
Harvard University Press, 2000). For some time it remained a quite abstract
term. With advances in molecular biology the definition is less and
less abstract - but far from settled.
Is a monolithic gene concept still valid?

Bioinformatics expert Nat Goodman, writing in the April 2001 issue of Genome
Technology, states that gene "is a highly nuanced noun like "truth". Ten
years ago, it commonly meant "genetic locus" - a region of the genome
linked to a disease or other phenotype. Over time biologists became more
comfortable thinking of a gene as a transcribed region of the genome that
results in functional molecular product. In its published human genome
paper [Science Feb. 16, 2001] Celera defines a gene as "a locus of
cotranscribed exons" in order to emphasize the importance of alternative splicing. Ensembl's Gene Sweepstake Web page [see below] took the definition
to new depth: "A gene is a set of connected transcripts. ... Two
transcripts are connected if they share at least part of one exon in the genomic
coordinates. Implicit in the new definitions of a gene is a belief that
the genome can be partitioned into regions such that all exons in a given region
belong to a single gene. These regions are the loci of Celera's
definition. A theoretically possible alternative is that the genome might
contain long chains of overlapping transcripts in which the first transcript
overlaps the second which overlaps the third, but the first and third don't
overlap. I'm not aware of any such examples, but if they exist, then all bets
are off." [Nat Goodman "Human Transcriptome Project" Genome Technology:
55-58 April 2001]

While some of the terms included below are relevant to all genes, some are specific to humans
and/or other higher organisms.

Gene definitions

gene: (cistron) Structurally, a basic unit of hereditary material;
an ordered sequence of nucleotide bases that encodes one polypeptide chain
(via mRNA). The gene includes, however, regions preceding and following
the coding region (leader and trailer) as well as (in eukaryotes) intervening
sequences (introns) between individual coding segments (exons). Functionally,
the gene is defined by the cis-trans test that determines whether
independent mutations of the same phenotype occur within a single gene
or in several genes involved in the same function. [IUPAC Compendium]
The functional and
physical unit of heredity passed from parent to offspring. Genes are pieces of
DNA, and most genes contain the information for making a specific protein. [NHGRI
glossary] This definition doesn't specify that it applies only to humans, but
by specifying "parents" it seems to rule out non-animal genes, and
almost implies mammals, or at least warm-blooded organisms.
A gene is a DNA segment that contributes to phenotype/function. In the
absence of demonstrated function a gene may be characterized by sequence, transcription
or homology. [HUGO, J.A. White et al. Guidelines for Human Gene Nomenclature, HGNC
(HUGO Gene Nomenclature Committee),
1997] http://www.gene.ucl.ac.uk/nomenclature/guidelines.html#2.2

From Genesweep, Ensembl, European Bioinformatics Institute, UK http://www.ensembl.org/genesweep.html
At the 2000 Cold Spring Harbor Genome conference [May 10-14] “one of
the hotly debated topics was the number of human genes. This has been estimated
at anything from 35,000 to 150,000. Considering the spread of opinion,
the only way to resolve was to get people to bet on it … This led to an
interesting debate on the definition of a gene … and how to assess that
number.”

A gene is a set of connected transcripts. A transcript is a set
of exons via transcription followed (optionally) by pre-mRNA splicing.
Two transcripts are connected if they share at least part of one exon in
the genomic coordinates. At least one transcript must be expressed outside
of the nucleus and one transcript must encode a protein (see Footnotes).

Assessment of the method used to determine the gene will occur by voting
at Cold Spring Harbor Genome Meeting 2002.

Footnotes
- We are restricting ourselves to protein coding genes to allow an
effective assessment. RNA genes were considered too difficult to
assess by 2003.
- The key definition in the gene is that alternatively spliced transcripts
all belong to the same gene, even if the proteins that are produced are
different.
- The hope is that by 2003 we should have at least a hard floor to the gene
numbers. The voting should be able to determine the best method. [The cost
of betting goes up over the years because people will have more information.]
- The scope of the genome is the autosomal chromosomes plus X and Y. No
epigenetic or mitochondrial genes are counted.
- Encoding a protein assumes that the translation machinery does translate
the sequence at some time. The scope of the expression of genes is across
all cell types and all developmental stages (obviously!).
- The genome is defined as the reference sequence (hence a mosaic of haplotypes)
as defined by Greg Schuler, NCBI.
- Somatic recombinant loci are counted after recombination: i.e.,
Ig [immunoglobulin] and TCR [T cell receptor] loci will form one gene per locus.
- Transcripts from repetitive regions are not counted even if expressed.
A repetitive region is an element which is both repeated in the genome
and has good evidence that the method of replication is based on a selfish
replication strategy.
- If trans-splicing is found in humans (which it has not been so far, and
is unlikely to occur, but just in case), the definition of the transcript
occurs after the trans-splicing event. This will split trans-spliced, polycistronic transcripts into multiple genes by this definition.
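
The core of the GeneSweep definition above ("a gene is a set of connected transcripts", where two transcripts are connected if they share at least part of one exon) amounts to grouping transcripts into connected components. The following is a minimal Python sketch of that idea, not Ensembl's actual method. It assumes all transcripts lie on one chromosome and strand, and that each exon is a half-open `(start, end)` interval; the names `exons_overlap`, `share_exon` and `cluster_transcripts` are hypothetical. The example data reproduce the chain Nat Goodman describes, in which the first transcript overlaps the second and the second overlaps the third, but the first and third do not overlap.

```python
from itertools import combinations


def exons_overlap(a, b):
    """True if two half-open (start, end) intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def share_exon(t1, t2):
    """True if any exon of transcript t1 overlaps any exon of transcript t2."""
    return any(exons_overlap(e1, e2) for e1 in t1 for e2 in t2)


def cluster_transcripts(transcripts):
    """Group transcripts into 'genes' (connected components).

    transcripts: dict mapping a transcript name to a list of (start, end) exons.
    Returns a sorted list of sorted lists of transcript names.
    """
    parent = {name: name for name in transcripts}

    def find(x):
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair of transcripts that share exon sequence.
    for a, b in combinations(transcripts, 2):
        if share_exon(transcripts[a], transcripts[b]):
            parent[find(a)] = find(b)

    genes = {}
    for name in transcripts:
        genes.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in genes.values())


if __name__ == "__main__":
    example = {
        "t1": [(100, 200), (300, 400)],
        "t2": [(350, 450), (600, 700)],  # overlaps t1 and t3
        "t3": [(650, 750)],              # overlaps t2 only
        "t4": [(1000, 1100)],            # overlaps nothing
    }
    print(cluster_transcripts(example))
```

Under this reading, t1, t2 and t3 end up in a single gene even though t1 and t3 share no exon sequence, which is exactly the situation in which the genome can no longer be cleanly partitioned by direct exon sharing alone.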
Does defining “gene” only get harder? Or are we making progress
by recognizing how complicated it really is?

This is not a new problem. The report of the Invitational DOE Workshop
on Genome Informatics (26-27 April 1993, Baltimore MD) pointed out: "The
concept of “gene” is perhaps even more resistant to unambiguous definition
now than before the advent of molecular biology. Our inability to produce
a single definition for “gene” has no adverse effect upon bench research,
[is this true?] but it poses real challenges for the development of federated
genome databases."
http://www.ornl.gov/hgmis/publicat/miscpubs/bioinfo/inf_rep2.html

A tutorial, "Ontologies for Molecular Biology Workshop: Semantic Foundations
for Molecular Biologies", at the Intelligent Systems for Molecular Biology
Conference (June 27-28, 1998) in Montreal, Canada noted: "Molecular
biology has a communication problem. Many researchers and databases
use (at least partially) idiosyncratic terms and concepts for representing
biological information. Often, terms and definitions differ between groups,
with different groups not infrequently using identical terms with different
meanings. The concept ‘gene’, for example, is used with different
semantics by the major international genomic databases." http://www-lbit.iro.umontreal.ca/ISMB98/anglais/ontology.html

An account of a Gene Nomenclature workshop held in
conjunction with the annual American Society of Human Genetics meeting in
Philadelphia PA, US Oct. 2 2000 reported on discussion between the human and
mouse nomenclature committees (and other interested parties): "A
gene can be defined as an abstraction that is useful for the purposes of nomenclature and for the assignment of a symbol. It was originally described as
a "unit of inheritance" and has since been described as a "set of
features on the genome that can produce a functional unit", but this latter
term does not encompass all of those objects to which symbols are assigned.
Designations in MGD [Jackson Lab's Mouse Genome Database] specify whether each object is a
marker, gene, D segment
etc., so in this context the actual definition of a gene is not so important. The GeneSweep definition is
not particularly useful for nomenclature as it indicates all genes must code for
a protein, and hence does not include mRNAs etc. It was agreed that the term
"gene" has been used for a collection of object types and should not
be removed as it is still a very useful term, particularly for the clinician and
for those with a clearly defined locus of interest; however, perhaps it is not
so useful for nomenclature, and the term "genomic feature"
should be used instead. Possible definitions of genomic features were discussed,
including an object which shares exons, that are assumed to be transcripts from
the same gene. Another suggestion was that the term "symbol" should be
defined, rather than "gene", as this is what nomenclature committees
work with, and it can incorporate a number of variations on the term
"gene"." [HM Wain et al. "Report of ASHG-NW Gene Nomenclature
Workshop", HUGO, Jan. 2001] http://www.gene.ucl.ac.uk/nomenclature/ashgnw_report.html
See also the first entry under gene family.

Parts of genes and gene processes constitute the rest of this
section. Broader term: DNA (Sequences,
DNA & beyond glossary). Categories of genes appear in the In-depth
glossary.

Nomenclature and terminology promise to continue
to be ongoing challenges as comparative genomics matures.

Gene structure, parts of genes
(and potential genes) and gene processes:

alternative splicing: A single gene can contain numerous exons and
introns, and the exons can be spliced together in different ways. For example,
if a gene contains 10 exons, one version of the mRNA transcribed from that gene
might contain exons 1-9. Another version of the mRNA might contain exons 1-8,
and exon 10. This is called alternative splicing, and can produce
different forms of a protein from the same gene. [NCBI, US, MLA CE Course:
Genetics Review, 2001] http://www.ncbi.nlm.nih.gov/Class/MLACourse/Genetics/gene.html
The production of two or more distinct mRNAs from RNA transcripts having the same sequence via differences in
splicing (by the choice of different exons).
[Mouse Genome Informatics]
Different ways of combining a gene's exons to make variants of the complete
protein. [ORNL]
Broader term: splicing. Related terms: alternative splice sites; Sequences,
DNA & beyond: pre-mRNA splicing,
protein splicing, RNA splicing, trans-splicing

alternative transcript: Expression, genes &
beyond

cDNA complementary DNA: A single stranded DNA molecule with a
nucleotide sequence that is complementary to an RNA molecule; cDNA is formed
by the action of the enzyme reverse transcriptase on an RNA template. After
conversion to the double stranded form, cDNA is used for molecular cloning
or for hybridization studies. [IUPAC Biotech]
A complementary DNA for a messenger RNA molecule. Unlike an mRNA, a
cDNA can be easily propagated and sequenced. [NCBI]
cDNA databases consist of two major types of sequences: expressed sequence tags (ESTs) and "proper" cDNAs. ESTs are short
segments of genes generated through rapid, but error-prone, sequencing methods.
"Proper" cDNAs are long segments of genes, often full-length, that are
obtained through more careful sequencing methods. (Note that although we have
chosen to describe these as "proper" cDNAs, they are often referred to
as "full-length" cDNAs or "long" cDNAs. But these terms are
inexact.) [CHI Bioinformatics]
Logic of Molecular Approaches to Biological Problems, John
Wagner (Cornell
Univ. Graduate School of Medical Science, US), has an extensive and articulate
section on the use of cDNA in experimental design. http://www-users.med.cornell.edu/~jawagne/cDNA_cloning.html
Narrower term: cDNA maps. Related terms: EST (expressed
sequence tag), transcript clusters; gene expression. Compare genomic DNA (Genomics glossary).

cDNA databases: see Databases & software
directory

cDNA maps: Maps genomic & genetic
glossary

central dogma: Sequences, DNA & beyond

chromosome: Cell biology glossary

cis-: This side of; compare with trans-,
meaning across.

cis trans test: Molecular Biology. The complementation
test with two or more interacting genes placed in cis and in trans relationships
to each other. The cis test uses a double mutant genome made by recombination from
the two single mutant genomes used in the trans test. If the wild type phenotype is restored by both cis and trans arrangements,
it is concluded that the two mutations are in different genes and hence
that the phenotype is determined by more than one gene. If the trans test
is negative and the cis positive this means that the two mutations are
in the same gene. If both tests are negative then at least one of the mutations
must be dominant. Thus the double test provides a means of fine mapping
of genes.
A lab test which is used to determine whether two mutations of different
genes which affect the same phenotype are on the same functional unit (indicating
a cis configuration of the mutated genes) or on different functional units
(indicating a trans configuration of the mutated genes). (A functional
unit can be a chromosome.)

cistron: HF Judson in the Eighth Day of Creation tells how Seymour
Benzer "wanted to scrap the word "gene" and replace it with three
new terms, "muton" for the smallest spot at which mutation could take
place, "recon" for the irreducibly shortest length on the map that could
not be split by a genetic recombination even at the fine scale he had reached,
and "cistron" for the shortest stretch that comprised a functional
genetic unit. (The last was derived from the mating tactic Benzer used to
determine which mutations lay near each other on the map, which was technically
called the "cis-trans test"... Over the next decade, Benzer's new
terms came into a considerable vogue, especially "cistron". But the
other two were superfluous once mutations and recombinations could be thought of
in terms of base pairs, while the cistron was, in effect, the gene in its
principal sense; it is the older usage that has lasted and the newer one that
has died away." [Horace F Judson, Eighth Day of Creation, Cold Spring Harbor
Laboratory Press, 1996, pp. 320-321]
Genetics. Term coined by Seymour Benzer in 1955 referring
to DNA coding for a single polypeptide. Originally did not include the
start and stop codons. Polycistronic implies coding for two or more
proteins.

coding regions: The part of a gene that specifies the structure
of a protein. [SNP Consortium] Also referred to as a "coding sequence" or
protein coding region or sequence. Narrower terms: mature peptide or protein
coding sequence, signal peptide coding sequence, transit peptide coding sequence

coding sequence CDS: Sequence of nucleotides that corresponds
with the sequence of amino acids in a protein (location includes stop codon).
Feature includes amino acid conceptual translation. [DDBJ/ EMBL/ GenBank
Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
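
The "conceptual translation" mentioned in the CDS definition above can be illustrated with a short Python sketch. It assumes the standard genetic code (the compact 64-letter string below is the usual TCAG-ordered table), a coding sequence that is already in frame, and translation that stops at the first stop codon. The names `CODON_TABLE` and `translate_cds` are hypothetical and are not taken from the Feature Table documentation.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Conceptually translate an in-frame coding sequence (DNA or mRNA).

    Reads the sequence three bases at a time and stops at the first
    stop codon ('*'), which is not included in the returned protein.
    """
    cds = cds.upper().replace("U", "T")  # accept mRNA as well as DNA
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCTGA"))  # ATG GCC TGA
```

Because every codon is exactly three bases long, the lookup is a fixed-width table, which is the property behind the "cipher rather than code" remark quoted under genetic code.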
Related terms: coding regions; In-depth: mature peptide or protein coding
sequence.

complementary DNA: See cDNA.

EST Expressed sequence tag: Sequences,
DNA & beyond

epigenetic: Descriptive term for processes that change the phenotype
without altering the genotype. [IUPAC Biotech]

epigenetics: Deals with changes in gene expression, particularly gene silencing, that are brought about by potentially reversible changes in DNA
methylation or chromatin
structure. A simple way to think about epigenetics is that it comprises
the "gray" aspects of genetics, i.e., the genes involved do not always
conform to the black and white Mendelian laws of inheritance. Examples
include genes that are expressed only when they are inherited from either
the male or the female parent ("parental imprinting"), genes that are continually
silenced for one or more generations (paramutation), and genes that exhibit
continuous variation in expression levels (variable expressivity). [Gordon
Research Conference, Epigenetic effects on gene expression, 1997] http://www.reeusda.gov/crgam/nri/pubs/archive/abstracts/abstract97/plgenmec.htm
"The meaning of Epigenetics", Joshua Lederberg,
Scientist 15 (18): 6, Sept. 17, 2001 http://www.the-scientist.com/yr2001/sep/comm1_010917.htm
Etymology and clarification.

exons: A section of DNA which carries the coding sequence for
a protein or part of it. Exons are separated by intervening, non-coding sequences (introns). In eukaryotes most genes consist of a number
of exons. [IUPAC Bioinorganic]
Exons contain the coding sequences of a gene,
in contrast to introns (often loosely lumped with "junk DNA"), which are excised from the pre-mRNA before the mRNA
is translated into a protein.
The portion of the genome that is expressed as a processed mRNA. [NHLBI]
The term "exon" is normally applied for regions which are not spliced
out from a pre- mRNA sequence (5' untranslated region (5' UTR), coding sequences
(CDS) and 3' untranslated region (3' UTR)). But this term is often used
also to indicate the protein- coding regions only. [“Gene Structure Prediction”
HGMP training course notes, Luciano Milanesi, 1998] http://www.hgmp.mrc.ac.uk/Courses/GeneProteinID/milanesi/milanesi.htm
Narrower term: non-coding first exons (Sequences, DNA
& beyond glossary)

expressed sequence: See coding sequence (coding regions)

gene coding: See coding regions, coding sequences.

gene discovery methods: Include (Genetic
variations glossary) candidate gene
approach, direct approach, functional cloning, indirect approach, linkage analysis,
positional cloning, random
genome-wide association studies; Functional
genomics glossary

gene family: [From an account of a Gene Nomenclature workshop held in
conjunction with the annual American Society of Human Genetics meeting in
Philadelphia PA, US Oct. 2 2000] The Mouse
Nomenclature Committee is also producing gene family webpages to facilitate
retrieval of information for groups of related genes. Judith Blake [Jackson
Laboratory] then outlined the importance of proteins and molecular functions in
understanding biological process, moving towards the idea of annotation of
gene
products formed from genomic features, with the gene concept being more of an
abstraction. She went on to identify the different types of objects that are
found in MGD and proposed moving away from the now loosely defined term
"gene family" to a representation of "gene groupings".
Groupings can include genes related by sequence identity, by shared function, or
by evolutionary relationship such as paralogy or orthology. All of these
relationships could be represented within the databases. Finally, this could
then lead to the identification of further relationships between genes, alleles,
transcripts, polypeptides and other gene products. [HM Wain et al.
"Report of ASHG-NW Gene Nomenclature Workshop", HUGO, Jan. 2001]
http://www.gene.ucl.ac.uk/nomenclature/ashgnw_report.html
The concept of the gene family is not clearly defined,
and the term has been used to signify groups of genes related by function, by
sequence, or by phenotype caused. Gene symbol series representing all these
possible definitions already exist and widespread changes, to reflect only
families related by sequence, are not recommended. However, the term
"superfamily" is considered to be more specifically defined and should
be used only to refer to groups of genes related by evolutionarily defined
sequence similarities. [Julia A. White et al. "Report of the
Second International Nomenclature Workshop", Cambridge UK,
May 1-2, 1999] http://www.mendel.ac.uk/inw2.htm
A set of genes which appear to be related, based on similarity of sequence
or function. [GDB query form] http://gdbwww.gdb.org/gdb-bin/genera/generaSF/hgd/GeneFamily?!action=queryform
HGNC Gene Family Nomenclature, HUGO
Gene Nomenclature Committee, with link to gene families
currently under review http://www.gene.ucl.ac.uk/nomenclature/genefamily.shtml

gene identification: Molecular
modeling glossary

gene imprinting: A phenomenon in which the phenotype of the disease depends on which parent
passed on the disease gene. For instance, Prader-Willi syndrome and Angelman
syndrome are both inherited when the same part of chromosome 15 is missing. When
the father's complement of 15 is missing, then the child has Prader-Willi, but
when the mother's complement of 15 is missing, the child has Angelman syndrome. [PhRMA]
See also under epigenetics

gene map, gene mapping: Maps genomic &
genetic

gene prediction: Molecular
modeling glossary

gene product: A description of the protein or RNA product
(and its function, if relevant) that is coded for by the gene. [SGD
Saccharomyces Genome Database Glossary, Stanford University] http://genome-www.stanford.edu/Saccharomyces/help/glossary.html#fasta
There is a potential for semantic confusion between a gene product and
its molecular function, because very often these are described in exactly
the same words. For example, "alcohol dehydrogenase" can describe what
you can put in an Eppendorf tube (gene product) or it can describe the
function of this stuff. There is, however, a formal difference -- a "product"
has a (potentially) many-to-many relationship with a "molecular function." [Gene
Ontology TM Documentation] http://www.geneontology.org/GO.doc.html
The biochemical material, either RNA or protein, resulting from expression
of a gene. The amount of gene product is used to measure how active a gene
is; abnormal amounts can be correlated with disease causing alleles. [DOE]

gene recognition: Molecular
modeling glossary

gene regulation: The DNA and protein interactions in a gene that
determine the temporal and spatial modes of expression as well as the
amplitude of expression. [Boyce Thompson Institute for Plant Research,
Cornell University, US Glossary of Terms] http://bti.cornell.edu/bti2/bti2_page.taf?page=glossary
Related terms: Omes & omics glossary regulome,
regulomics; Proteins glossary
protein regulation

gene silencing: Interruption or suppression of the expression of a gene at transcriptional or translational levels.
[MeSH] Narrower term: Post-Transcriptional Gene Silencing PTGS (Functional
genomics glossary). Related term: epigenetics

gene structural components: Includes exons, introns, regulatory sequences, splice sites, other?

gene validation: Genetic validation of putative drug targets.
Related term: Drug discovery &
development: target validation

genetic code: The sequence
of nucleotides, coded in triplets (codons) along the
mRNA,
that determines the sequence of amino acids in protein synthesis. The DNA
sequence of a gene can be used to predict the mRNA sequence, and the genetic
code can in turn be used to predict the amino acid sequence. [DOE]
Who wrote the book of life: A history of the genetic code. Lily
E. Kay, Stanford University Press, 2000.
The notion of a “code” as the key to information transfer was not articulated publicly until late 1954, when
[George] Gamow, Martynas Ycas, and Alexander Rich published an article that defined the code idiom for the first time since Watson and Crick casually mentioned it in a 1953 article. Yet the concept of coding applied to genetic specificity was somewhat misleading, as translation between the 4 nucleic acid bases and the 20 amino acids would obey the rules of a cipher instead of a code. As Crick acknowledged years later, in linguistic analysis, ciphers generally operate on units of regular length (as in the triplet DNA scheme), whereas codes operate on units of variable length (e.g., words, phrases). But the code metaphor worked well, even though it was literally inaccurate, and in Crick’s words, “‘Genetic code’ sounds a lot more intriguing than ‘genetic cipher’.” Codes and the information transfer metaphor were extraordinarily powerful, and heredity was often described as a biological form of electronic communication.
[Richard A. Pizzi "Genetic ciphering" Modern Drug Discovery 4(3):
65-66 Mar. 2001] http://pubs.acs.org/subscribe/journals/mdd/v04/i03/html/03timeline.html

genetic linkage maps: Maps
genomic & genetic glossary

genome control maps: Expression glossary

genomic DNA: Genomics glossary

genotype, genotyping: Sequences,
DNA & beyond glossary

haplotype, haplotyping: Sequences,
DNA & beyond glossary

hypothetical genes: In-depth definitions

introns: Sequences,
DNA & beyond glossary

jumping genes: See transposons

localization: Cell localization prediction by the PSORT program, which
predicts signal sequences, cleavage sites, transmembrane segments and topology,
and protein localization sites in cells. The procedure is based upon sequence
features such as N-terminal positively charged regions and regions of high
hydrophobicity, which are combined into a subcellular localization prediction.
(Horton, P and Nakai, K, Intelligent Systems for Molecular Biology 5 147-152
(1997)) http://www.hri.co.jp/HUNT/manual.html Related terms: protein localization, subcellular localization

localize: Determination of the original position (locus)
of a gene or other marker on a chromosome. [DOE]

locus (plural loci): Position on a chromosome of a gene or other
chromosome marker; also the DNA at that position. The use of locus is sometimes
restricted to mean regions of DNA that are expressed. [DOE] Related term: gene
expression (Expression glossary)
Any genomic site, whether functional or not, that can be mapped through
formal genetic analysis. [NHLBI]

Mendelian genetics: Genomics glossary

molecular function: Compare gene product.

muton: See under cistron

ORF open reading frame: Sequences,
DNA & beyond. ORFs may, but don't necessarily, represent genes. Broader
term: reading frames (Sequences, DNA
& beyond)

ORFans: Hypothetical genes, so called because they show no sequence
similarity to those in any other organisms. [Steven M. Brenner, Univ. of
California-Berkeley
homepage] http://www-bioeng.berkeley.edu/faculty/brenner.html
Related terms: In-depth hypothetical genes; Omes & omics
glossary ORFeome.

open reading frame: See ORF; Sequences,
DNA & beyond

operon: A functional unit consisting of a promoter, an operator
and a number of structural genes, found mainly in prokaryotes. The structural
genes commonly code for several functionally related enzymes, and although
they are transcribed as one (polycistronic) mRNA each is independently
translated. In the typical operon, the operator region acts as a controlling
element in switching on or off the synthesis of mRNA. (operator gene) [IUPAC
Biotech]
The genetic unit consisting of a feedback system under the control of
an operator gene, in which a structural gene transcribes its message
in the form of mRNA upon blockade of a repressor produced by a regulator
gene. Included here is the attenuator site of bacterial operons where transcription
termination is regulated. [MeSH]

paramutation: See under epigenetics.

phenotype, phenotyping: Genomics glossary

pleiotropic gene: A gene
affecting more than one (apparently unrelated) characteristic of the phenotype.
[IUPAC Biotech 1992, IUPAC Compendium]

pleiotropism or pleiotropy: Single
genes produce multiple, seemingly unrelated phenotypic effects.

polycistronic: See cistron.

polygene: Genetics. A gene which acts together
with other genes to influence quantitative traits (such as size or weight).
[OED] Seems to have begun as a concept which referred to a hypothetical
single "gene" which acted with other genes in a less Mendelian fashion,
and evolved into a class of "genes" which we have yet to truly begin
to understand. Related terms polygenic, post-genomic, post-Mendelian,
Genomics glossary
proper cDNA: See under cDNA.
proteins: Proteins glossary, Protein
structures glossary
quantitative gene: See under polygene.
recon: See under cistron.
regulon: In eukaryotes, a genetic unit consisting of a noncontiguous
group of genes under the control of a single regulator gene. In bacteria,
regulons are global regulatory systems involved in the interplay of pleiotropic
regulatory domains. These regulatory systems consist of several operons.
[MeSH]
repressors: See under regulator genes.
retrotransposon: DNA fragments copied from viral RNA with reverse transcriptase
that insert in the host chromosomes.
[Edward Bollenbach, Life Sciences Dictionary] Related term transposons.
SNP Single Nucleotide Polymorphism: Genetic variations glossary
superfamily: See under gene family.
transcript clusters: [Bo] Yuan [Ohio State Univ.] avoids calling the
index entries genes, preferring to call them transcript clusters, a
careful term referring to how cDNAs and ESTs from different databases are
grouped together based on homology. "They should be genes, but we don't
have the evidence yet," he says. "We still have to confirm that all
those transcripts and ESTs that align with the genome are functional." ...
Confirming that predictions are real genes, known as validation, is a major
reason the gene count will remain open for a while. "A prediction is just a
prediction," says [Michael] Cooke [Genomics Institute, Novartis Research
Foundation]. "You have to validate the prediction experimentally before you
can call it a gene." [Tom Hollon "Human Genes: How Many?"
Scientist 15 (20): 1, Oct. 15, 2001] http://www.the-scientist.com/yr2001/oct/hollon_p1_011015.html
transit peptide coding sequence: Coding sequence for an
N-terminal domain of a nuclear-encoded organellar protein; this domain
is involved in post-translational import of the protein into the organelle.
[DDBJ/EMBL/GenBank Feature Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
transposons: A mobile genetic element that can replicate itself
and insert itself into the genome, including interrupting genes and disrupting
their function, an insertional mutagen. [CHI Functional Genomics]
One of a class of genes that are capable of moving spontaneously from
one chromosome to another, or from one position to another in the same
chromosome; also known as jumping genes or transposable elements.
[Glick]
DNA elements carrying genes for transposition and other genetic functions.
In many cases the latter genes enable bacteria to live in extreme environments.
Transposons are much longer than IS (Insertion) elements. Abbreviated Tn.
[Schlindwein]
First recognized in the 1940s by Dr. Barbara McClintock in studies
of peculiar inheritance patterns found in the colors of Indian corn. Also
known as “jumping DNA”, referring to the fact that some stretches
of DNA are unstable and “transposable”, i.e. they can move around – on and
between chromosomes. This theory was confirmed in the 1980s when
scientists observed jumping DNA in other genomes. [HGMIS Oak Ridge National Lab,
US] http://www.ornl.gov/hgmis/faq/faqs1.html Related term retrotransposon
trans-splicing: Sequences, DNA & beyond glossary
variants: Genetic variations
glossary
Bibliography
DDBJ/EMBL/GenBank Feature Table, 2001 http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
Lewin, Benjamin. Genes VII, Oxford University Press, 1999. To order: http://www.oup.co.uk/best.textbooks/biochemistry/genesvii/
Online (full-text) and updated http://www.ergito.com
Has extensive glossary
IUPAC definitions are reprinted with the permission of the International
Union of Pure and Applied Chemistry
Categories of genes
antibody genes: In 1989, we set out to map and sequence all the human antibody genes. Over a 10 year period,
16 scientists at the MRC Laboratory of Molecular Biology and the MRC Centre for Protein
Engineering were involved in the work which resulted in the first publication of the complete
maps of the human heavy chain and lambda light chain genes and the compilation of the V
BASE database. Our results demonstrated that there are far fewer functional genes than
originally anticipated ... by analysing the use of individual antibody genes and the extent of somatic mutation we went on to show
that the pattern and extent of diversity introduced by somatic hypermutation is complementary
to that created by combinatorial rearrangement. Finally, many of the germline genes cloned
during the mapping and sequencing work were used as building blocks in the creation of a
number of semi-synthetic phage-antibody libraries and, more recently, we have used the
analysis of main-chain conformations and side-chain diversity in the human immune system to
generate a range of new 'superantibody' libraries. [Laboratory of Molecular
Biology, Medical Research Council, UK "Human Antibody Genes" c2000] http://www2.mrc-lmb.cam.ac.uk/groups/arrays/genes.html
candidate genes: Focus on particular SNPs thought to
have a functional effect. Within family studies SNPs spanning several
generations can show relationships between a disease and a candidate gene
or a chromosomal region. Focuses on particular coding regions of
the genome. Involves less sequencing overall and is more likely to
uncover SNPs that, in addition to serving as markers, have functional implications
and thus may shed light on biochemical mechanisms. [CHI SNPs]
A gene that has been implicated in causing or contributing to the development
of a particular disease. [NHLBI]
caretaker genes: See under tumor suppressor genes.
DME Drug Metabolizing Enzyme genes: Pharmacogenomics
glossary
DNA repair genes: The disregulation of repair genes can be expected to
be associated with significant, detrimental health effects, which can include an
increased prevalence of birth defects, an enhancement of cancer risk, and an
accelerated rate of aging. Although original insights into DNA repair and the
genes responsible were largely derived from studies in bacteria and yeast, well
over 125 genes directly involved in DNA repair have now been identified in
humans, and their cDNA sequence established. These genes function in a diverse
set of pathways that involve the recognition and removal of DNA lesions,
tolerance to DNA damage, and protection from errors of incorporation made during
DNA replication or DNA repair. Additional genes indirectly affect DNA repair, by
regulating the cell cycle, ostensibly to provide an opportunity for repair or to
direct the cell to apoptosis. For about 70 of the DNA repair genes listed in
Table I, both the genomic DNA sequence and the cDNA sequence and chromosomal
location have been elucidated. In 45 cases single-nucleotide polymorphisms have
been identified and, in some cases, genetic variants have been associated with
specific disorders. With the accelerating rate of gene discovery, the number of
identified DNA repair genes and sequence variants is quickly rising. This report
tabulates the current status of what is known about these genes. The report is
limited to genes whose function is directly related to DNA repair. [A. Ronen, BW
Glickman "Human DNA repair genes" Environ Mol Mutagen 37 (3): 241-283,
2001] http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11317342&dopt=Abstract
developmental genes: Genes connected with developmental processes.
differentiated genes: Genes which exhibit differential patterns of
gene expression, particularly in connection with disease and "normal"
states.
gatekeeper genes: See under tumor suppressor genes.
housekeeping genes: In theory, expressed in all cells.
Contrast with luxury genes. Genes that encode housekeeping
proteins. Specific housekeeping genes can be used to normalize gene
expression data.
hypothetical genes: Cannot be related to any previously characterized
genes. Related term ORFans.
immunoglobulin genes Ig: Genes encoding the light and heavy chain
segments of immunoglobulins. Light chain gene segments are symbolized
L-V (variable), J (joining) and C (constant); Ig heavy chain segments have,
in addition, a diversity (D) gene. Each segment codes for certain amino
acids, and each has a different nucleotide sequence; the genes are assembled
by a remarkable shuffling of the segments during B lymphocyte maturation.
[MeSH]
immediate-early genes: Genes that show rapid and transient expression
in the absence of de novo protein synthesis. The term was originally
used exclusively for viral genes where immediate-early referred to transcription
immediately following virus integration into the host cell. It is also
used to describe cellular genes which are expressed immediately after resting
cells are stimulated by extracellular signals such as growth factors and
neurotransmitters. [MeSH]
lethal genes: Genes which result in the premature death of the
organism; dominant lethal genes kill heterozygotes, whereas recessive lethal
genes kill only homozygotes. [MeSH] Related term: Functional
genomics embryonic lethal
luxury genes: Not necessary to increase growth but function to
improve growth. [Bas Teusink et al “Luxury genes, mere slaves, one
of a dozen, or sophisticated regulators? What is behind silent phenotypes?”
EUROFAN 1998] http://www.mips.biochem.mpg.de/proj/eurofan/eurofan_2/n99/ef98_abstracts/Westerhoff_180.html
Specialized genes with specific functions. Compare with housekeeping
genes.
mitochondrial genes: Human mitochondrial DNA (one of the smallest known) is only 16,569 base
pairs in length … We now know the complete DNA sequence … It codes for
ribosomal RNAs and transfer RNAs used in the mitochondrion, and contains
only 13 recognizable genes that code for polypeptides … the mitochondrial
genome, unlike the much larger nuclear genome, is directly transmitted
through the maternal line, making it an ideal piece of DNA with which to
trace family lineages. [Kenneth Miller, Brown Univ. “The fire within: the
unfolding story of human mitochondrial DNA”, 1997] http://biocrs.biomed.brown.edu/Books/Essays/MitochondrialDNA.html
Useful for evolutionary research, population genetics, phylogenetics and
conservation biology. Related terms mtDNA, Sequences, DNA & beyond; mitochondria, Cell
biology glossary
non-nuclear genes: Mitochondrial or chloroplast genes. Any others?
nuclear genes: Are these the vast majority of genes, all those
except the ones found outside the nucleus?
oncogene: A normal cellular gene which, when inappropriately
expressed or mutated, can transform eukaryotic cells into tumour cells.
[IUPAC Medicinal Chemistry]
A gene, one or more forms of which is associated with cancer. Many oncogenes
are involved, directly or indirectly, in controlling the rate of cell growth.
[DOE]
operator genes: Control the structural genes. Nobel Prize in Physiology or
Medicine awarded in 1965 to Francois Jacob, Andre Lwoff and Jacques Monod for
discoveries related to these.
organelle genes: The majority of genetic information is located
in the nucleus and inherited according to Mendel's laws. However, there
are small but essential genomes located in the cellular organelles - the
plastids and mitochondria. Organelle genomes differ from nuclear genomes
in a number of important ways. They are small relative to the nuclear genome.
There are usually multiple copies of organelles per cell and each organelle
contains from 20 to 20,000 organelle genomes depending upon the cell type.
Organelle genomes are organized into structures called nucleoids. [Christine
Chase, PCR 6528 Institute of Food and Agricultural Sciences, Univ. of Florida,
2000] http://www.ifas.ufl.edu/~ctdc/pmb1.htm Related
term organelles Cell biology glossary
orphan genes: Putative ORFs without any resemblance to
previously determined protein-coding sequences … While theoretical evolutionary
arguments support the reality of genes when homologues are found in a variety
of distant species, this is not the case for orphan genes … Our results
suggest that a vast majority of E. coli ORFs presently annotated
as “hypothetical” correspond to bona fide genes. [J Alimi et al “RT-PCR
validation of 25 “orphan” genes” Genome Research 2000 Jul; 10 (7): 959-66]
http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract05.html
Predicted genes which have no significant similarities to genes of known
function, or genes with unknown function. [European Commission press release “Towards
the first complete analysis of a plant genome”, 29 Jan. 1998] http://europa.eu.int/comm/research/press/1998/pr2901en.html
pleiotropic gene: A gene affecting more than one (apparently
unrelated) characteristic of the phenotype. [IUPAC Biotech]
predicted genes: See under orphan genes [above] and in Molecular
Modeling glossary "gene prediction"
processed genes: See under pseudogenes.
promoter genes: See promoter, promoter region Sequences,
DNA & beyond glossary
pseudogenes: Genes bearing close resemblance to known genes at
different loci, but rendered non-functional by additions or deletions in
structure that prevent normal transcription or translation. When lacking
introns and containing a poly-A segment near the downstream end (as a result
of reverse copying from processed nuclear RNA into double-stranded DNA),
they are called processed genes. [MeSH]
putative genes: Conjectural genes, predicted by gene or exon
identification software. Is there a difference between putative and predicted
genes?
rRNA genes: Genes, found in both prokaryotes and eukaryotes,
which are transcribed to produce the RNA which is incorporated into ribosomes.
Prokaryotic rRNA genes are usually found in operons dispersed throughout
the genome, whereas eukaryotic rRNA genes are clustered, multicistronic
transcriptional units. [MeSH] Related term tRNA.
regulator genes: A gene which codes for a protein (an activator
or repressor) having the ability to induce or repress the transcription
of other genes. [IUPAC Biotech]
Genes which regulate or circumscribe the activity of other genes; specifically,
genes which code for proteins (repressors or activators) which regulate
the genetic transcription of the structural genes and/or regulatory genes.
[MeSH]
regulatory region: A region associated with a gene to which proteins
bind, regulating that gene's expression, such as the TATA box, which functions
as a binding site for transcription factors. [GDB query form] http://gdbwww.gdb.org/gdb-bin/genera/generaSF/hgd/RegulatoryRegion?!action=queryform
Prediction of regulatory regions remains difficult.
reporter genes: Genes whose expression is easily detectable and
therefore used to study promoter activity at many positions in a target
genome. In recombinant DNA technology, these genes may be attached to a promoter region of interest. [MeSH]
signal peptide coding sequence: Coding sequence for an N-terminal
domain of a secreted protein; this domain is involved in attaching nascent
polypeptide to the membrane; leader sequence. [DDBJ/ EMBL/ GenBank Feature
Table] http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
split genes: Genes with exons and introns. Sequences,
DNA & beyond
structural genes: Genes that code for proteins required for the
enzymatic and structural functions of cells. They include developmental and differentiated genes. [MeSH]
suppressor genes: Genes that inhibit expression of a previous
mutation. They allow the wild- type phenotype to be wholly or partially
restored. [MeSH]
A gene which helps to reverse the effects of damage to an individual’s
genetic material, typically effects which might lead to uncontrolled cell
growth. A suppressor gene may code for a protein which checks genes for
misspellings, and/ or which triggers a cell’s self- destruction if too many
genetic mutations have accumulated. [PhRMA]
syntenic genes: See under synteny.
synteny: Two genes which occur on the same chromosome are syntenic;
however, syntenic genes may or may not be linked. [NHLBI]
tumor suppressor gene: A protective gene that normally limits
the growth of tumors. When a tumor suppressor is mutated, it may fail to
keep a cancer from growing. BRCA1 and p53 are well-known tumor suppressor
genes. [NHGRI]
Ken Kinzler and Bert Vogelstein distinguish between "gatekeeper"
tumor suppressor genes (classical) and "caretakers" (in DNA repair and
genome integrity, whose action lies outside the pathway). [KW Kinzler, B.
Vogelstein "Cancer-susceptibility genes. Gatekeepers and caretakers"
Nature 386(6627): 761, 763, Apr. 24, 1997]
virulence genes: |