 
Sequencing Glossary
Evolving Terminology for Emerging Technologies
Comments? Suggestions? Revisions? mchitty@healthtech.com
Last revised December 06, 2001.
 
 
The "race" to sequence the Human Genome is not a 50-yard dash but a marathon.  Although the Human Genome Project is well ahead of schedule, and a number of genes have been identified, we have only begun to glimpse what specific genes do and how this knowledge might be used for therapeutic interventions.  Teasing apart the interactions of genes and proteins, delineating changes throughout the cell cycle, and correlating changes with health and disease will take even more time.  But with complete sequences, and the possibility of cross-species comparisons, we can expect new insights and an accelerating pace of discovery.

And sequencing DNA is only a first step towards finding what functions are connected with specific sequences. Sequencing proteins (and then determining the structure of proteins – and the function of proteins) comes next. Progress is being made with proteins, but sequencing of carbohydrates is even more difficult.

Related glossaries include  Applications  Functional genomics, Proteomics, Informatics Algorithms  Bioinformatics,  Molecular modeling,  Technologies Chromatography & electrophoresis, Mass spectrometry, Biology Genetic variations Proteins, Protein Structures, Sequences - DNA & beyond.  Additional definitions appear in the In-depth glossary, after the Bibliography.

alignment: The process of lining up two or more sequences to achieve maximal levels of  identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.  [NCBI Bioinformatics]

Narrower terms In-depth global alignment, local alignment, optimal alignment, pairwise alignment. Related terms BLAST In-depth BEAUTY, BLAST2, FASTA, gapped BLAST, Needleman - Wunsch, Smith - Waterman alignment. 
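As a rough illustration of "assessing the degree of similarity" once two sequences have been lined up, the sketch below computes percent identity over the aligned columns. The function name, the "-" gap character and the choice to ignore gapped columns are assumptions made for this example; other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Five gap-free columns, four of which are identical.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Identity is only an observation about the two strings; whether it indicates homology is a separate inference.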

assembled: The term used to describe the process of using a computer to join up bits of sequence into a larger whole. [Peer Bork, Richard Copley "Filling in the gaps" Nature 409: 818-820, 15 Feb. 2001]  

This sense of "assembly" is unrelated to assembly language in computing, which can be a source of confusion between biologists and computer scientists.

Related term contig assembly

clone, cloning: Cell biology glossary  

contig: Group of cloned (copied) pieces of DNA representing overlapping regions of a particular chromosome. [DOE]  Narrower terms In-depth initial sequence contigs, merged sequence contigs. Related terms clone, contig assembly, scaffolds.

Published genome sequence has many gaps and interruptions. Concept of  "contig" is crucial to our understanding of current limitations. [David Galas "Making sense of the sequence" Science 291 (5507): 1257, Feb. 16, 2001]

contig assembly: One of the most difficult and critical functions in DNA sequence analysis is putting together fragments from sets of overlapping segments. Some programs do this better than others, particularly when dealing with sequences containing gaps. [Laura De Francesco "Some things considered" Scientist 12[20]: 18, Oct. 12, 1998]  http://www.the-scientist.com/yr1998/oct/profile1_981012.html

National Center for Biotechnology Information, US, NCBI Contig Assembly and Annotation Process, Feb. 2001   http://www.ncbi.nlm.nih.gov/genome/guide/build.html
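To make "putting together fragments from sets of overlapping segments" concrete, here is a minimal greedy sketch: it repeatedly merges the two fragments with the longest exact suffix-to-prefix overlap. This is an illustrative toy, not the NCBI process or any particular program. It assumes error-free reads on one strand, ignores reverse complements and fragments contained inside others, and the function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap of min_overlap remains."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # whatever is left cannot be joined: separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
```

Fragments that share no sufficient overlap stay as separate contigs, which is how gaps in the underlying data show up in the output.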

coverage (or depth): The average number of times that a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a Phred score of at least 20). [UC-Santa Cruz, US, Human Genome Project Working Draft Terminology, 2001] http://genome.ucsc.edu/goldenPath/term.html
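The link between 99% accuracy and a Phred score of 20 follows from the standard relation Q = -10 log10(P), where P is the probability that a base call is wrong. The sketch below shows that conversion and a simple coverage calculation that counts only high-quality bases. The function names and the list-of-quality-scores input format are assumptions made for this example.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability P for a Phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score Q for an error probability P."""
    return -10 * math.log10(p)


def average_coverage(read_qualities: list[list[int]], target_length: int,
                     min_q: int = 20) -> float:
    """High-quality bases (quality >= min_q) per base of the target sequence."""
    high_quality = sum(1 for read in read_qualities for q in read if q >= min_q)
    return high_quality / target_length


# Two toy reads over a 4-base target; six of the eight bases reach Q20.
print(phred_to_error_probability(20))
print(average_coverage([[30, 30, 10, 25], [20, 19, 40, 40]], target_length=4))
```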

depth: See under coverage

directed sequencing: See under shotgun sequencing

draft genome sequence [human]: The sequence produced by combining the information from individual sequenced clones (by creating merged sequenced contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes. (Nickname "golden path".) [Univ. of California Santa Cruz Human Genome Project Working Draft terminology]  http://genome.ucsc.edu/goldenPath/term.html

Sequence with lower accuracy than a finished sequence; some segments are missing or in the wrong order or orientation. ["A Genome Glossary," History of the Human Genome Project, Science 291: pullout chart, Feb. 16, 2001]

See also working draft, human sequence

dynamic programming methods: Assure the optimal global (Needleman and Wunsch 1970; Sankoff and Kruskal 1983) or local (Smith, et al. 1981) alignment by simply exploring all possible alignments and choosing the best. ["Pedestrian guide to analysing sequence databases" Burkhard Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html 

These methods allow the introduction of artificial gaps in aligned sequences to create an optimal alignment. Related terms alignment, In-depth gap penalties, global alignment, local alignment, Needleman-Wunsch,  sequencing algorithms
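Dynamic programming does not literally enumerate every alignment. It fills a table in which each cell holds the best score for aligning two prefixes, so the optimum over all alignments is found in time proportional to the product of the sequence lengths. Below is a minimal sketch of global (Needleman-Wunsch style) alignment. The scoring values (match +1, mismatch -1, a flat gap cost of -2 per position) are arbitrary assumptions chosen for illustration, not values taken from the cited papers.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

When several alignments share the best score, the traceback returns only one of them; which one depends on the order in which the three moves are tested.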

EST assembly: 

finished sequence - human: Sequence in which bases are identified to an accuracy of no more than 1 error in 10,000 and are placed in the right order and orientation along a chromosome with almost no gaps. ["A Genome Glossary," History of the Human Genome Project, Science 291: pullout chart, Feb. 16, 2001]

Each base pair has been sequenced 8-10 times, with the remaining gaps limited by present technology. ...  No eukaryotic genome sequenced so far has been totally sequenced - current technology isn't up to it. Highly repetitive regions (not expected to contain many protein-coding genes) can be impossible (or very difficult) to clone.  One definition of "finished" is that fewer than one base in 10,000 is incorrectly assigned. [Peer Bork, Richard Copley "Filling in the gaps" Nature 409: 818-820, 15 Feb. 2001]

"At some level it’s a little arbitrary when you declare a sequence essentially complete," says NHGRI Director Francis Collins… "The definition of finished is evolving. Our definition today is different from 10 years ago. Ten years ago we didn’t even think at the level of genomes," says Laurie Goodman, editor of Genome Research. "I think the community at large should define done. Not everyone is going to agree, but when you’re using the word you should define what it means." Francis Collins says, "You’re done when you’ve exhausted the standard methods for closing the gaps. There should be some biological reason why those last bits of sequence eluded you – not because you just didn’t bother." ["Are we there yet?" The Scientist :12 July 19, 1999] http://www.the-scientist.com/yr1999/july/hopkin_p12_990719.html

Related terms finished clone, Human Genome Project, post-genomic. Genomics glossary

finishing standards - Human Genome Project: The International Human Genome Consortium recognizes the need to maximize the likelihood that the finished human genome sequence meets consistent standards of quality across all participating genome centers, and to adopt uniform practices and annotation for regions that present problems for current sequencing technology. At the Seventh International Meeting, the Consortium approved a detailed set of consensus standards for what should be considered as finished sequence, a set of rules for dealing with regions that are difficult to resolve, and a set of finishing annotation tags to be submitted with accessions. [Washington Univ. School of Medicine "Finishing Standards" Dec. 12, 2000] http://genome.wustl.edu/gsc/Overview/finrules/hgfinrules.html

full shotgun coverage: The coverage in random raw sequence needed from a large-insert clone to ensure that it is ready for finishing; this varies among centers but is typically 8-10 fold. Clones with full shotgun coverage can usually be assembled with only a handful of gaps per 100 kb.  [Univ. of California Santa Cruz Human Genome Project Working Draft terminology]  http://genome.ucsc.edu/goldenPath/term.html

gap: A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment. [NCBI Bioinformatics] Narrower term In-depth gap penalties

genotyping: The determination of relevant nucleotide-base sequences in each of the two parental chromosomes. [CHI SNPs Update] Broader term sequencing; Narrower term haplotyping

genotyping technologies: Genotyping technologies have proliferated rapidly in recent years, and at least one hundred methods are currently available for detecting the genotypes of individual SNPs. In diploid organisms, such as humans, the linkage of particular SNP genotypes on each chromosome in a homologous pair (the haplotype) may provide additional information not available from SNP genotyping alone. [Lawrence Berkeley Lab "High Throughput Haplotyping of Diploid Organisms," 2001]  http://www.lbl.gov/Tech-Transfer/collaboration/techs/lbnl1748.html  Related terms Genetic variations glossary  haplotyping, Sequencing glossary

global alignment: The alignment of two nucleic acid or protein sequences over their entire length. [NCBI Bioinformatics] Related term dynamic programming methods, Broader term alignment

haplotyping: Broader terms genotyping, sequencing

haplotyping technologies: 

Hidden Markov Models HMM: Algorithms & data management

homology:  Narrower terms sequence homology, sequence homology- nucleic acid; Functional genomics glossary homology Related terms homolog (homologue), similarity  Molecular modeling homology modeling 

human sequence: See draft sequence, finished sequence, published sequence, working draft

library; library, genomic: Cell biology glossary

protein sequence: Proteins glossary

published working drafts - human genome: International Human Genome Sequencing Consortium special issue: Nature 409 (6822) 15 Feb 2001  http://www.nature.com/nature/journal/v409/n6822/  http://www.nature.com/genomics/human/papers/analysis.html

Human Genome [Celera Genomics sequence] special issue: Science 291 (5507) Feb. 16, 2001 http://www.sciencemag.org/content/vol291/issue5507/index.shtml

random sequencing: See under shotgun sequencing

resequencing: Previously sequenced site is resequenced for SNP discovery or other purposes. [CHI SNPs]

Eric Lander, director of the Whitehead Institute's Center for Genome Research and professor of biology at MIT, notes: "The human genome will need to be sequenced only once, but it will be resequenced thousands of times, in order, for example, to unravel the polygenic factors underlying human susceptibilities and predispositions … Re-sequencing will also provide the ultimate tool for genotyping studies." [E. Lander "The New Genomics" Science 274: 536, 25 Oct. 1996]

rough drafts - human genome: Related terms finished sequence, finishing standards, published working drafts, working drafts

scaffolds: Ordered set of contigs placed on the chromosome. [NCBI, Human Genome Home "Contig Assembly Process" Glossary, Feb. 2001] http://www.ncbi.nlm.nih.gov/genome/guide/build.html#glossary.

A series of contigs that are in the right order but are not necessarily connected in one continuous stretch of sequence. ["A Genome Glossary," History of the Human Genome Project, Science 291: pullout chart, Feb. 16, 2001]

The definition of a scaffold appears to be quite different in the Science and Nature draft published sequences. [David Galas "Making sense of sequence" Science 291: 1257-  Feb. 16, 2001]  This is also different from the scaffold defined in Drug discovery and development glossary.  

The result of connecting contigs by linking information, such as paired-end reads from plasmids, paired-end reads from BACs, known mRNAs, or other sources. The contigs in a scaffold are ordered and oriented with respect to one another. [Univ. of California Santa Cruz Human Genome Project Working Draft terminology]  http://genome.ucsc.edu/goldenPath/term.html

Narrower terms In-depth sequence- contig scaffold, sequenced- clone- contig scaffold Related term In-depth contig assembly.

scoring methods: There are many choices, and the best choice is often problem dependent. For a good review, see ["Sequence Analysis: Which scoring method should I use?" Pittsburgh Supercomputing Center, Carnegie Mellon Univ., 1999]  http://www.psc.edu/research/biomed/homologous/scoring_primer.html

Related terms  In-depth filtering, masking, SNP scoring.  Molecular modeling glossary  homology modeling

sequence alignments: See alignment.

sequence analysis: Sequence analysis is a robust field, and mining sequence data using bioinformatics is one of the main activities of genomics- based drug discovery. Using sequence analysis to understand whole genomes may provide an important advantage for groups looking for new drug targets among genes, or trying to pick the best among targets they already have.

Sequence analysis is one of the most widely used techniques in genomics. A great deal of sequence work will continue to be done, as researchers fill in the gaps left in the genome maps of humans and other important organisms. Studies to confirm sequence, and to identify SNPs, will also need to continue. [CHI Bioinformatics].

sequence homology: <molecular biology> Strictly, refers to the situation where nucleic acid or protein sequences are similar because they have a common evolutionary origin. Often used loosely to indicate that sequences are very similar. Sequence similarity is observable, homology is an hypothesis based on observation. (18 Nov. 1997)  [OMD]  Broader term Functional genomics glossary homology; Narrower term sequence homology- nucleic acid; Related terms Functional genomics glossary evolutionary homology; Proteomics glossary regulatory homology;  Molecular modeling glossary homology modeling; Structural genomics glossary  structural homology

sequence homology - nucleic acid: The sequential correspondence of nucleotide triplets in a nucleic acid molecule which permits nucleic acid hybridization. Sequence homology is important in the study of mechanisms of oncogenesis and also as an indication of the evolutionary relatedness of different organisms. The concept includes viral homology. [MeSH] Broader term sequence homology

sequencing: (proteins, nucleic acids) Analytical procedures for the determination of the order of amino acids in a polypeptide chain or of nucleotides in a DNA or RNA molecule. [IUPAC Compendium] Largely automated.  

Sequencing of biomolecules began with the insulin B-chain, a thirty-residue peptide, which Sanger and Tuppy deduced through a combination of limited proteolysis and chemical analysis in 1951. A full 14 years passed before Holley et al. determined the sequence of alanine tRNA from yeast, and another 12 years before "real" DNA sequencing was developed by Maxam & Gilbert and Sanger et al. in 1977.  [Introduction to bioinformatics, Univ. of Munich Gene Center, Germany, Summer 2000] http://www.lmb.uni-muenchen.de/groups/bioinformatics/01/ch_01_1.html

Narrower terms resequencing, sequencing - algorithms, sequencing - cost of, sequencing - high- throughput, sequencing - throughput, shotgun sequence, single DNA molecule sequencing,  whole genome shotgun sequencing In-depth chain termination sequencing, chemical cleavage sequencing, chemical degradation sequencing, de novo sequencing, dideoxy sequencing, microsequencing, minisequencing, multiplex sequencing, Sanger sequencing

sequencing algorithms: See In-depth BLAST, FASTA, Needleman - Wunsch, Smith - Waterman

sequencing - cost of: The cost of sequencing a single DNA base was about $10 [when the Human Genome Project was initiated]; today, sequencing costs have fallen about 100-fold to $0.10 to $0.20 a base and are still dropping rapidly. [Human Genome News 11 (1-2) Nov. 2000]  http://www.ornl.gov/hgmis/publicat/hgn/v11n1/01giants.html 

sequencing - high- throughput: Uses robotics, automated DNA- sequencing machines and computers.

sequencing - throughput: Production of genome sequence has skyrocketed over the past year, with more than 90 percent of the sequence having been produced in the past 15 months alone. Because of this increased capacity, the next phase is expected to move much more rapidly than previously expected. [NHGRI, "International Human Genome Sequencing Consortium Publishes Sequence and Analysis of the Human Genome" Washington, D.C., February 12, 2001] http://www.nhgri.nih.gov/NEWS/initial_sequencePR.html

shotgun sequencing  method: Sequencing method which involves randomly sequencing tiny cloned pieces of the genome, with no foreknowledge of where on a chromosome the piece originally came from. This can be contrasted with "directed" [sequencing] strategies, in which pieces of DNA from adjacent stretches of a chromosome are sequenced. Directed strategies eliminate the need for complex reassembly techniques. Because there are advantages to both strategies, researchers expect to use both random (or shotgun) and directed strategies in combination to sequence the human genome. [DOE] 

Uses dynamic programming methods. Narrower term whole genome shotgun sequencing.

similarity: Functional genomics glossary  

similarity search: BLAST, FASTA and Smith- Waterman (see In-depth) are examples of similarity search algorithms.

single DNA molecule sequencing:  The evolution of technology for single DNA molecule sequencing will ultimately permit whole genome analysis of populations of cells at high resolution and will obviate current PCR- based approaches, particularly important for sequencing diploid or polyploid cells. This is the ultimate in sensitivity, and perhaps difficulty. Further in the future, it might be possible to utilize the protein synthesis machinery of the cell as a "sequencing engine."   [National Center for Research Resources "Integrated Genomics Technologies Workshop Report" Jan 1999]  http://www.ncrr.nih.gov/newspub/genomic.pdf 

viral homology: See under sequence homology- nucleic acid

whole genome shotgun sequencing: Celera’s whole genome shotgun sequencing technique involves sequencing from both ends of the double stranded cloned DNA. Celera’s accurately paired clone end sequences are a key tool for assembling the genome much more completely than single stranded sequencing methods allow at comparable levels of sequence coverage. Celera’s paired end- sequencing strategy, as part of the whole genome shotgun sequencing technique, has now produced sequence pairs from clones that cover the human genome 11 times. The company believes that 99% of the human genome is represented in the cloned DNA. [press release "Celera Genomics completes sequencing phase of the genome from one human being" Rockville, MD,  April 6, 2000]  http://www.pecorporation.com/press/prccorp040600.html  Broader term shotgun sequencing method.

"working draft, human genome sequence": This milestone was announced at the White House (Washington DC, US) on June 26, 2000.  President Bill Clinton was joined by Francis Collins (National Human Genome Research Institute) and Craig Venter (Celera Genomics) and heads of the major US genome sequencing centers. Work continues to be done on annotating the sequence, but further celebration ensued with publication of two versions of the sequence in Feb. 2001.  Related terms draft sequence, finished sequence - human, published working drafts. Genomics glossary  Human Genome Project 

Bibliography

NCBI (US) BLAST Glossary, 2000. 40+ definitions http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html


IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

In-depth Sequencing glossary

BEAUTY BLAST Enhanced Alignment Utility:  An enhanced version of the NCBI's BLAST database search tool. BEAUTY, when used to search three new custom sequence databases that we have developed, incorporates information on sequence family membership, the location of the conserved domains, and the locations of any annotated domains and sites directly into BLAST search results. These enhancements make it much easier to detect weak, but functionally significant, matches in BLAST database searches.
http://searchlauncher.bcm.tmc.edu:9331/seq-search/Help/beauty.html

BLAST (Basic Local Alignment Search Tool): Software program from NCBI for searching public databases for homologous sequences or proteins. Designed to explore all available sequence databases regardless of whether query is protein or DNA. http://www.ncbi.nlm.nih.gov/BLAST/

Faster but less rigorous than FASTA or Smith-Waterman.

BLAST2: A newer release of BLAST that allows insertions or deletions in the aligned sequences. Gapped alignments may be more biologically significant. Synonymous with gapped BLAST  [labvelocity.com]

chain termination sequencing method: See Sanger sequencing (under Maxam- Gilbert & Sanger).

chemical cleavage sequencing: See Maxam-Gilbert sequencing.

chemical degradation sequencing: See Maxam-Gilbert sequencing.

consensus sequence: A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one that occurs most frequently at that site in the different forms that occur in nature. The phrase also refers to an actual sequence that approximates the theoretical consensus. [MeSH]

A sequence of DNA, RNA, protein or carbohydrate derived from a number of similar molecules, which comprises the essential features for a particular function. [IUPAC Bioinorganic]
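The "most frequent residue at each site" reading of consensus can be sketched in a few lines. The example assumes the input sequences are already aligned to equal length. The function name is hypothetical, and ties are broken in favor of the residue seen first in the column, which is a choice of this sketch rather than part of either definition.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most frequent residue in each column of equal-length aligned sequences."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    # zip(*aligned) yields one tuple of residues per alignment column.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned))


print(consensus(["ACGT", "ACGA", "TCGT"]))
```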

conserved sequence: A base sequence in a DNA molecule (or an amino acid sequence in a protein) that has remained essentially unchanged throughout evolution. [DOE]

A "highly conserved sequence" is a DNA sequence that is very similar in several different kinds of organisms. Scientists regard these cross species similarities as evidence that a specific gene performs some basic function essential to many forms of life and that evolution has therefore conserved its structure by permitting few mutations to accumulate in it. [NHGRI]

de novo sequencing: Determination of sequences (of genes or amino acids) whose sequence is not yet known. Can be done with LC/MS/MS or nanoelectrospray MS/MS. [CHI Proteomics] From the Latin "de novo," meaning "from the beginning." See also Mass spectrometry glossary.

dideoxy sequencing: See Sanger sequencing under Maxam-Gilbert & Sanger.

FACS: Fluorescence activated cell sorter. Related terms flow cytometry, flow sorting.

FASTA: The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup" variable which specifies the size of a "word". (Pearson and Lipman)  [NCBI Bioinformatics]  More rigorous and slower than BLAST.  http://fasta.bioch.virginia.edu/
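The word-matching step can be pictured as follows: index every word of length k-tup in one sequence, scan the other, and tally the hits by diagonal (the offset between the two positions). Diagonals with many hits mark candidate regions for the later scoring stages. The sketch below covers only this first step. It is not the FASTA program's code, and the function name and return format are assumptions.

```python
from collections import defaultdict


def word_hits_by_diagonal(query: str, subject: str, ktup: int = 2) -> dict[int, int]:
    """Count exact word matches of length ktup on each diagonal (i - j)."""
    # Index every word of length ktup in the query by its start position.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    # Scan the subject; each shared word is one hit on diagonal i - j.
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            hits[i - j] += 1
    return dict(hits)


print(word_hits_by_diagonal("ACGTACGT", "TTACGT", ktup=3))
```

A larger k-tup yields fewer words and fewer chance hits, so the scan is faster but can miss weak similarities, which is the speed and sensitivity trade-off described in the entry.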

filtering: Also known as masking. The process of hiding regions of (nucleic acid or amino acid) sequence having characteristics that frequently lead to spurious high scores. [NCBI Bioinformatics] 

flow cytometry: Technique for characterizing or separating particles such as beads or cells, usually on the basis of their relative fluorescence. [IUPAC Combinatorial Chemistry]

Technique using an instrument system for making, processing, and displaying one or more measurements on individual cells obtained from a cell suspension. Cells are usually stained with one or more fluorescent dyes specific to cell components of interest, e.g., DNA, and fluorescence of each cell is measured as it rapidly traverses the excitation beam (laser or mercury arc lamp). Fluorescence provides a quantitative measure of various biochemical and biophysical properties of the cell, as well as a basis for cell sorting. Other measurable optical parameters include light absorption and light scattering, the latter being applicable to the measurement of cell size, shape, density, granularity, and stain uptake. [MeSH]

Related terms FACS, flow sorting

flow sorting: Employs flow cytometry to separate, according to size, chromosomes isolated  from cells during cell division when they are condensed and stable. As the chromosomes flow singly past a laser beam, they are differentiated by analyzing the amount of DNA present, and individual chromosomes are directed to specific collection tubes. [Primer on Molecular Genetics, ORNL, US] http://www.ornl.gov/hgmis/publicat/primer/intro.html

GRAIL: Gene Recognition and Assembly Internet Link software http://compbio.ornl.gov/Grail-1.3/  A suite of tools designed to provide analysis and putative annotation of  DNA sequences both interactively and through the use of automated computation. [Grail overview, Oak Ridge National Lab, US] http://compbio.ornl.gov/manuals/grail1.3-genquest.9605.shtml#GrailOverview

Does this name refer in some way to Walter Gilbert's description of the Human Genome Project as the "Holy Grail" of molecular biology?

gap penalties: An important problem is the treatment of gaps, i.e., residues inserted (or deleted) to optimise the objective function. Usually, gap penalties (cost of inserting and extending gaps) are chosen to be length dependent. Typically, the cost of extending a gap (gap elongation) is 5-10 times lower than is the cost for introducing a gap (gap open). The optimal choice of gap penalties depends on the particular method and, in detail, on the particular sequence family. ["Pedestrian guide to analysing sequence databases" Burkhard Rost, Reinhard Schneider, 1999]   http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html  Related terms alignment, dynamic programming methods. Broader term gap
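A length-dependent penalty of this kind is often written as an open cost plus a smaller per-residue extension cost. The sketch below uses an open cost of 10 and an extension cost of 1, which are assumed values chosen only to sit in the 5-10 fold range quoted above. It also assumes the convention that the first gapped position pays the open cost and each further position pays the extension cost; some programs charge both for the first position.

```python
def affine_gap_cost(length: int, gap_open: float = 10.0,
                    gap_extend: float = 1.0) -> float:
    """Penalty for one gap of `length` positions under an open + extend scheme."""
    if length <= 0:
        return 0.0
    # The first position pays gap_open; each further position pays gap_extend.
    return gap_open + (length - 1) * gap_extend


# One gap of four positions is far cheaper than four separate one-position gaps.
print(affine_gap_cost(4))
print(4 * affine_gap_cost(1))
```

The effect is to favor a few longer gaps over many scattered ones, which tends to match how insertions and deletions actually occur.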

gapped BLAST:  A version of the BLAST algorithm that allows gaps (deletions and insertions) to be introduced into aligned sequences. The scoring of these gapped alignments tends to reflect biological relationships more closely. Synonymous with BLAST2.  [labvelocity.com]

initial sequence contigs: Derived from sequenced clones [David Galas "Making sense of the sequence" Science 291: 1257-1260, 16 Feb. 2001]

local alignment: The alignment of some portion of two nucleic acid or protein sequences. [NCBI Bioinformatics]

Best alignment method for sequences for which no evolutionary relatedness is known. See Smith-Waterman alignment.  Compare global alignment.

masking: Also known as filtering. The removal of repeated or low complexity regions from a sequence in order to improve the sensitivity of sequence similarity searches performed with that sequence. [NCBI Bioinformatics] 
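As a deliberately simplified picture of masking, the sketch below replaces long runs of a single repeated residue with a mask character so they cannot contribute to a match. Real filters use more elaborate complexity measures; the homopolymer rule, the function name, the default run length of 6 and the "N" mask character are all assumptions made for this example.

```python
import re


def mask_homopolymer_runs(sequence: str, min_run: int = 6,
                          mask_char: str = "N") -> str:
    """Replace every run of min_run or more identical residues with mask_char."""
    # (.) captures one residue; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), sequence)


print(mask_homopolymer_runs("ACGT" + "A" * 8 + "CGT"))
```

The masked sequence keeps its original length, so coordinates reported by a later search still refer to the unmasked sequence.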

Maxam-Gilbert sequencing & Sanger sequencing: The two basic sequencing approaches, Maxam- Gilbert and Sanger, differ primarily in the way the nested DNA fragments are produced. Both methods work because gel electrophoresis produces very high resolution separations of DNA molecules; even fragments that differ in size by only a single nucleotide can be resolved. Almost all steps in these sequencing methods are now automated. Maxam- Gilbert sequencing (also called the chemical degradation method) uses chemicals to cleave DNA at specific bases, resulting in fragments of different lengths. A refinement to the Maxam- Gilbert method known as multiplex sequencing enables investigators to analyze about 40 clones on a single DNA sequencing gel.  Sanger sequencing (also called the chain termination or dideoxy method) involves using an enzymatic procedure to synthesize DNA chains of varying length in four different reactions, stopping the DNA replication at positions occupied by one of the four bases, and then determining the resulting fragment lengths. [Primer on Molecular Genetics,  Oak Ridge National Lab, US]  http://www.ornl.gov/hgmis/publicat/primer/intro.html

merged sequence contigs: Derived by merging sequence contigs from overlapping sequenced clones. [David Galas "Making sense of the sequence" Science 291: 1257-1260, 16 Feb. 2001]

microsequencing: Sequencing of proteins or peptides in very small amounts (sub microgram), sometimes for use as probes.

minisequencing: A solid-phase method for the detection of any known point mutation or allelic variation of DNA. In the method, amplified, biotinylated DNA sequences containing the mutation site are immobilized onto a streptavidin-coated microplate, and primer extension reactions are carried out using labeled nucleotides. Incorporation of the labeled nucleotide is dependent on the genotype and is analyzed using an ELISA technique. The assay method allows automation. [Labsystems Oy, Finland] http://www.labsystems.fi/applications/photometry/an104.htm

Single base sequencing. Related terms Genetic variations glossary  

multiple sequence alignment: An alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions and/or ancestral residues are aligned in the same column. ClustalW is one of the most widely used multiple sequence alignment programs. [NCBI Bioinformatics] 

The concept of dynamic programming cannot be extended to align more than three sequences optimally (Murata 1990). A way around this problem is to first find optimal pairwise alignments and to then merge the pairs. ["Pedestrian guide to analysing sequence databases" Burkhard Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html

Related term Hidden Markov Models HMM

multiplex sequencing: See under Maxam- Gilbert sequencing.

Needleman-Wunsch: Global sequence alignment algorithm. [Needleman, S. B., Wunsch, C. D., "A general method applicable to the search for similarities in the amino acid sequence of two proteins" J. Mol. Biol. 48: 443-453, Mar. 1970] Related terms dynamic programming; Algorithms & data management glossary, Molecular modeling glossary

optimal alignment: An alignment of two sequences with the highest possible score. [NCBI Bioinformatics] 

Alignments are intended to unravel evolutionary pathways and/or structural homology between two proteins. These two objectives (functional/structural) may be mutually contradictory, i.e., the 'optimal' alignment may differ according to the objective. Yet another perspective is the 'mathematical' optimal alignment. This is the alignment that optimises a given objective function, e.g., to find the alignment with the highest number of pairwise identical residues. FASTA and BLAST are not guaranteed to find such a mathematically optimal alignment. ["Pedestrian guide to analysing sequence databases" Burkhard Rost, Reinhard Schneider, 1999]  http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html 

pairwise alignment: 

protein datasets: Available from Ensembl and NCBI.

SNP discovery: Involves finding new SNPs. ... tools are just beginning to emerge and many more robust technologies are needed. [NIH, Methods for Discovering and Scoring Single Nucleotide Polymorphisms, Request for Applications, Jan. 9, 1998] http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-98-001.html

SNP scoring: Involves methods to determine the genotypes of many individuals for particular SNPs that have already been discovered. ... tools are just beginning to emerge and many more robust technologies are needed. http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-98-001.html

 

Sanger sequencing: See under Maxam-Gilbert sequencing.

sequence tags: Sequence bits 2-4 contiguous residues in length.  Used to determine the mass of a particular sequence. [CHI Proteomics] Can be used to search protein and EST databases with high specificity. [Blackstock & Weir "Proteomics" Trends in Biotechnology 17: 121, Mar. 1999]

sequence-contig scaffold: Scaffold produced by connecting a maximal set of sequence contigs joined by bridged gaps. [Univ. of California Santa Cruz Human Genome Project Working Draft terminology]  http://genome.ucsc.edu/goldenPath/term.html

sequenced-clone-contig scaffold: Scaffold produced by joining sequenced clone contigs by bridged SCC gaps. [Univ. of California Santa Cruz Human Genome Project Working Draft terminology]  http://genome.ucsc.edu/goldenPath/term.html

Smith-Waterman alignment: An amino acid sequence alignment that illustrates sequence similarity. The alignment is generated using the Smith-Waterman algorithm (Temple Smith and MS Waterman, J Mol Biol. 147: 195-197, 1981; WR Pearson, Genomics 11: 635-650, 1991) [SGD Saccharomyces Genome Database glossary, Stanford Univ.] http://genome-www.stanford.edu/Saccharomyces/help/glossary.htm Related terms dynamic programming; Algorithms & data management glossary, Molecular modeling glossary
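The Smith-Waterman recurrence differs from global alignment in one respect: a cell's score is never allowed to fall below zero, so an alignment can start and stop anywhere and the best-scoring local region is read off from the highest cell. The sketch below illustrates this with assumed scores (match +2, mismatch -1, flat gap cost -2). Practical protein searches would use a substitution matrix and length-dependent gap penalties instead.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart here instead of carrying a deficit.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest cell until the score drops to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTTGATTACATTTT", "CCCGATTACACCC"))
```

In this toy input the shared core is surrounded by unrelated flanking sequence, which a global alignment would be forced to pair up but a local alignment simply leaves out.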

