Molecular modeling glossary
Evolving terminology for emerging technologies
Suggestions? Comments? Questions? mchitty@healthtech.com
Last revised December 26, 2001 
Biomedical modeling is still an art form that cannot be applied well outside of specialized research groups. Yet, the successes, in particular in the case of molecular simulation and structure prediction, have dramatically increased, as demonstrated by the routine use of modeling programs for the interpretation of many types of experiments in crystallography, and by the advance in the accuracy of predicted structures. In fact, the technology developed by the computational biology community is being used by experimental biomedical researchers. [Opportunities in Molecular Biomedicine in the Era of  Teraflop Computing, March 3 & 4, 1999, Rockville, MD] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

Related glossaries include: Applications: Drug discovery & development, Sequencing, Structural genomics. Informatics: Algorithms & data management, Bioinformatics, Chemoinformatics, Computers & computing, Databases & software directory. Biology: Protein Structure. Additional definitions appear in the In-depth glossary, after the Bibliography.

ab initio: From the Latin: from the beginning. In modeling refers to models devised without experimental data?

ab initio calculations: Quantum chemical calculations using exact equations with no  approximations which involve the whole electronic population of the molecule. [IUPAC Computational]

ab initio gene prediction: Traditionally, gene prediction programs that rely only on the statistical qualities of exons have been referred to as performing ab initio predictions. Ab initio prediction of coding sequences is an undeniable success by the standards of the machine-learning algorithm field, and most of the widely used gene prediction programs belong to this class of algorithms. It is impressive that the statistical analysis of raw genomic sequence can detect around 77-98% of the genes present ... This is, however, little consolation to the bench biologist, who wants the complete sequences of all genes present, with some certainty about the accuracy of the predictions involved. As Ewan Birney (European Bioinformatics Institute, UK) put it, what looks impressive to the computer scientist is often simply wrong to the biologist. [Meeting report "Gene prediction: the end of the beginning" Colin Semple, Genome Biology 2000 1(2): reports 4012.1-4012.3]  Broader term gene prediction.  http://www.genomebiology.com/2000/1/2/reports/4012/

All ab initio gene prediction programs have to balance sensitivity against accuracy.
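
As a rough illustration of this trade-off, the sketch below computes sensitivity alongside the fraction of predictions that are correct (called precision here; the gene-prediction literature often reports it as specificity). The function names and the nucleotide counts are hypothetical, chosen only to show how a permissive setting gains sensitivity at the cost of more false positives.

```python
def sensitivity(tp, fn):
    """Fraction of truly coding nucleotides that were predicted as coding."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predicted coding nucleotides that are truly coding."""
    return tp / (tp + fp)


# Hypothetical nucleotide-level counts for two settings of the same predictor.
settings = {
    "permissive": {"tp": 900, "fp": 300, "fn": 100},
    "strict": {"tp": 700, "fp": 50, "fn": 300},
}

for name, c in settings.items():
    sn = sensitivity(c["tp"], c["fn"])
    pr = precision(c["tp"], c["fp"])
    print(f"{name}: sensitivity={sn:.2f} precision={pr:.2f}")
```

With these made-up counts the permissive setting finds more of the real coding sequence but a larger share of its predictions are wrong, while the strict setting shows the opposite pattern.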

ab initio protein structure prediction: See Structural genomics glossary

ab initio quantum chemistry: Involves the calculation of chemical properties directly from the molecular Schrodinger equation. The only empirical data used is the mass and charge of the nuclear particles. In some sense ab initio quantum chemistry can be viewed as a form of alchemy, in which computer cycles are transformed into chemical properties. [Michael Colvin "What is ab initio quantum chemistry?" Lawrence Livermore National Lab, US] http://gutenberg.llnl.gov/~colvin/

alignment: Sequencing glossary

binding site: Drug discovery & development glossary

CADD: See Computer Assisted Drug Design

CAMD: See Computer Aided Molecular Design, Computer Assisted Molecular Design

CAMM: See Computer Assisted Molecular Modeling

computational chemistry: Chemoinformatics glossary  Related terms binding site, molecular graphics, Van der Waals

computational gene recognition: Interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure and functional class of protein-coding genes. [JW Fickett 1996]  Related terms gene recognition, molecular recognition.

Gene recognition is much more difficult in higher eukaryotes than in prokaryotes, as coding regions (exons) are often interrupted by non-coding regions (introns) and genes are highly variable in size. This is particularly so for human genes. As someone remarked recently, people have non-coding regions occasionally interrupted by genes.

computational genomics: Computers & computing glossary

computational modeling: See ab initio modeling, homology modeling.

Computer Aided Molecular Design (CAMD): Involves all computer-assisted techniques used to discover, design and optimize compounds with desired structure and properties.  [IUPAC Combinatorial]

Also known as molecular modeling or computational chemistry, uses computers to analyze and model the physicochemical properties of a molecule. CAMD programs allow integrated molecular design to take drug discovery to a new level by using a more cross-functional team approach to drug research and development.  [Oxford Molecular]

Computer-Assisted Drug Design (CADD): Involves all computer-assisted techniques used to discover, design and optimize biologically active compounds with a putative use as drugs. Broader term drug design. Drug discovery & development glossary   [IUPAC Computational]

Computer-Assisted Molecular Design (CAMD): Involves all computer-assisted techniques used to discover, design and optimize compounds with desired structure and properties.  [IUPAC Computational]

Computer-Assisted Molecular Modeling (CAMM): The investigation of molecular structures and properties using computational chemistry and graphical visualization techniques.  [IUPAC Computational]

docking: Three-dimensional molecular structure is one of the foundations of structure-based drug design. Often, data are available for the shape of a protein and a drug separately, but not for the two together. The program AutoDock was originally written in FORTRAN-77 in 1990 by David S. Goodsell here in Arthur J. Olson's laboratory. It performs automated docking of ligands (small molecules like a candidate drug) to their macromolecular targets (usually proteins, sometimes DNA). [Garrett B. Morris, “Molecular docking web”, Scripps, Dec. 2000] http://www.scripps.edu/pub/olson-web/people/gmm/index.html

docking programs: Programs for evaluating lead compounds against target proteins; these programs are “informed” by structure data. [CHI Structural genomics]

Traditional ligand-docking programs - such as DOCK, developed by Irwin Kuntz at the University of California at San Francisco; MacroModel, developed by Clark Still at Columbia University; and GOLD from MSI (now part of Pharmacopeia) - give information about potential ligands for a known protein structure. These programs select molecules predicted to be highly complementary to the receptor structure and can screen many of these ligands against the protein. This type of virtual screening technology has already been incorporated into many major pharmaceutical companies’ discovery programs and offers the ability to screen many more compounds at once than the traditional laboratory-based method.  [CHI Structural genomics]

docking studies: Computational techniques for the exploration of the possible binding modes of a substrate to a given receptor, enzyme or other binding site. [IUPAC Computational] Related terms drug design, QSAR Pharmaceutical biology glossary.

drug design: See structure-based drug design Drug discovery & development glossary    Related terms 3D QSAR, QSAR Algorithms and data management glossary.

dynamic programming methods:  Sequencing glossary

exon parsing: Identifying precisely the 5' and 3' boundaries of genes (the transcription unit) in metazoan genomes, as well as the correct sequences of the resulting mRNA ("exon parsing") has been a major challenge of bioinformatics for years. Yet, the current program performances are still totally insufficient for a reliable automated annotation (Claverie 1997; Ashburner 2000). It is interesting to recapitulate quickly the research in this area to illustrate the essential limitation plaguing modern bioinformatics. Encoding a protein imposes a variety of constraints on nucleotide sequences, which do not apply to noncoding regions of the genome. These constraints induce statistical biases of various kinds, the most discriminant of which was soon recognized to be the distribution of six nucleotide-long "words" or hexamers (Claverie and Bougueleret 1986; Fickett and Tung 1992).  [JM Claverie "From Bioinformatics to Computational Biology" Genome Res 10: (9) 1277-1279 Sept. 2000]  http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html
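
The hexamer statistic mentioned in this entry can be sketched in a few lines. The code below is only an illustration, not the method of any cited program: it tabulates overlapping six-letter words in a sequence and scores a query by the log-ratio of word frequencies in a coding versus a noncoding training set. The function names, the tiny training strings and the `floor` value for unseen words are all assumptions made for the example.

```python
import math
from collections import Counter


def hexamer_frequencies(seq, k=6):
    """Relative frequency of every overlapping k-letter word in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def coding_log_odds(seq, coding, noncoding, k=6, floor=1e-6):
    """Sum of log(coding freq / noncoding freq) over the words of seq.

    Words missing from a table get the small frequency `floor`
    (a hypothetical smoothing choice for this sketch).
    """
    seq = seq.upper()
    score = 0.0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        score += math.log(coding.get(word, floor) / noncoding.get(word, floor))
    return score


# Toy training data; real tables would come from annotated genomic sequence.
coding_table = hexamer_frequencies("ATGATGATG")
noncoding_table = hexamer_frequencies("TTTTTTTTT")
print(coding_log_odds("ATGATG", coding_table, noncoding_table))
```

A positive score means the query's words are more typical of the coding training set; a negative score points the other way.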

exon prediction:  Since prokaryotes don't have introns, exon prediction implies working with eukaryotes. Is exon prediction equivalent to gene prediction in prokaryotes?  Related terms ab initio gene prediction; GRAIL Sequencing glossary

gene identification: Using marker SNPs to home in on otherwise hard-to-find genes. [CHI SNPs]

The effectiveness of finding genes by similarity to a given sequence segment is determined by a much simpler statistic, the total coverage of the genome by the collective set of sequence contigs. As the overall coverage of the genome is virtually complete (> 90%), there is a strong likelihood that every gene is represented, at least in part, in the data. Thus, finding any gene by sequence similarity searches using sufficient sequence to ensure significance is almost always possible using the data published this week. Caution must be exercised, however, as the identification of the gene may still be ambiguous. This is because a highly similar sequence from a receptor gene from Drosophila, for example, could be found in several different, homologous genes, which may have similar or entirely different functions or are nonfunctioning pseudogenes. In other words, common domains or motifs can be present in many different genes. The use of the approximate similarity search tool BLAST is probably still the best way to find similar sequences. [David Galas "Making Sense of the Sequence" Science 291: 1257-1260 Feb. 16, 2001]

Genes (and their corresponding mRNAs and proteins) are identified by aligning reference sequences (RefSeq), GenBank, mRNAs, and ESTs to the genome sequence using a program called Acembly. Acembly takes advantage of paired EST reads, measured clone lengths, and polyA tails. Transcript models are reconstructed by attempting to settle disagreements between individual sequence alignments without using an a priori model (such as codon usage, initiation, or polyA signals). In practice, there is an initial low stringency analysis followed by a clean up procedure which keeps the best hits.  ... An obvious challenge in using alignments to annotate genes is the treatment of sequence differences between the mRNA and genomic sequence. These differences could represent sequencing errors, assembly errors, naturally occurring polymorphisms, or paralogs. It is difficult to resolve these differences automatically; therefore the default treatment is to provide the mRNA and protein sequence that corresponds to the genomic sequence. The only exception is where a sequence difference changes the reading frame relative to the supporting mRNA and EST data; then the genomic sequence is frameshifted to provide the protein product that corresponds to the mRNA data. [NCBI Contig Assembly and Annotation Process, 2001]  http://www.ncbi.nlm.nih.gov/genome/guide/build.html#contig

There are two basic approaches to gene identification: by homology and ab initio approaches.

gene parsing: Initial gene parsing methods were then simply based on word frequency computation, eventually combined with the detection of splicing consensus motifs. The next generation of software implemented the same basic principles into a simulated neural network architecture (Uberbacher and Mural 1991). Finally, the last generation of software, based on Hidden Markov Models, added an additional refinement by computing the likelihood of the predicted gene architectures (e.g., favoring human genes with an average of seven coding exons, each 150 nucleotides long) (Kulp et al. 1996; Burge and Karlin 1997). These ab initio methods are used in conjunction with a search for sequence similarity with previously characterized genes or expressed sequence tags (EST). [JM Claverie "From Bioinformatics to Computational Biology" Genome Res 10: (9) 1277-1279 Sept. 2000]  http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html

gene prediction: Many methods for predicting genes are based on compositional signals that are found in the DNA sequence. These methods detect characteristics that are expected to be associated with genes, such as splice sites and coding regions, and then piece this information together to determine the complete or partial sequence of a gene. Unfortunately, these ab initio methods tend to produce false positives, leading to overestimates of gene numbers, which means that we cannot confidently use them for annotation. They also do not work well with unfinished sequence that has gaps and errors, which may give rise to frameshifts, when the reading frame of the gene is disrupted by the addition or removal of bases. ... The most effective algorithms integrate gene-prediction methods with similarity comparisons.... The most powerful tool for finding genes may be other vertebrate genomes. Comparing conserved sequence regions between two closely related organisms will enable us to find genes and other important regions in both genomes with no previous knowledge of the gene content of either.  [Ewan Birney et al. "Mining the draft human genome" Nature 409: 827-828 15 Feb. 2001]  Narrower term ab initio gene prediction.

Sadly, it is often claimed that matching back cDNA to genomic sequences is the best gene identification protocol; hence, admitting that the best way to find genes is to look them up in a previously established catalog! Thus, the two main principles behind state-of-the-art gene prediction software are (1) common statistical regularities and (2) plain sequence similarity. From an epistemological point of view, those concepts are quite primitive. [JM Claverie "From Bioinformatics to Computational Biology" Genome Res 10: (9) 1277-1279 Sept. 2000]  http://igs-server.cnrs-mrs.fr/igs/abstract/an2000/abstract13.html

Algorithms have been developed and are combined to recognise gene structural components.

gene recognition: Principally used for finding open reading frames, tools of this type also recognize a number of features of genes, such as regulatory regions, splice junctions, transcription and translation stops and starts, GC islands, and polyadenylation sites. [Laura De Francesco "Some things considered" Scientist 12[20]:18, Oct. 12, 1998] http://www.the-scientist.com/yr1998/oct/profile1_981012.html
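
To make the open reading frame idea concrete, here is a minimal sketch of a forward-strand ORF scan. It is not taken from any of the tools discussed in the cited article: `find_orfs` and its `min_codons` threshold are hypothetical names, the reverse strand is ignored, and none of the other gene features listed above (splice junctions, regulatory regions and so on) are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop run on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons before the stop, counting the ATG itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


example = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])
```

In eukaryotic genomic DNA such a scan finds only uninterrupted reading frames, which is one reason the entries above stress that exons and introns make gene recognition harder than in prokaryotes.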

granularity: Computers & computing glossary

Hidden Markov Models (HMM): Searching a protein sequence database for homologues is a powerful tool for discovering the structure and function of a sequence. Amongst the algorithms and tools available for this task, Hidden Markov model (HMM)-based search methods improve both the sensitivity and selectivity of database searches by employing position-dependent scores to characterize and build a model for an entire family of sequences.

HMMs have been used to analyze proteins using two complementary strategies. In the first, a sequence is used to search a collection of protein families, such as Pfam, to find which of the families it matches. In the second approach an HMM for a family is used to search a primary sequence database to identify additional members of the family. The latter approach has yielded insights into proteins involved in both normal and abnormal human pathology. [Lawrence Berkeley Lab, US "Advanced Computational Structural Genomics"]  http://cbcg.lbl.gov/ssi-csb/Meso.html
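
For readers unfamiliar with how an HMM assigns hidden states to a sequence, the following sketch runs the Viterbi algorithm on a toy two-state model. It is deliberately much simpler than the profile HMMs used for protein families such as those in Pfam: the states ("M" for a family-like region, "B" for background), the two-letter alphabet ("h" hydrophobic, "p" polar) and every probability are invented for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its log-probability."""
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this position.
            origin = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            best[s] = (prev[origin] + math.log(trans_p[origin][s])
                       + math.log(emit_p[s][symbol]))
            pointers[s] = origin
        backpointers.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical parameters, chosen only to make the example easy to follow.
states = ("M", "B")
start_p = {"M": 0.5, "B": 0.5}
trans_p = {"M": {"M": 0.9, "B": 0.1}, "B": {"M": 0.1, "B": 0.9}}
emit_p = {"M": {"h": 0.8, "p": 0.2}, "B": {"h": 0.3, "p": 0.7}}

path, log_prob = viterbi("hhhhpppp", states, start_p, trans_p, emit_p)
print("".join(path), log_prob)
```

The sketch assumes all probabilities are non-zero; a fuller implementation would handle zero entries and would use position-specific match, insert and delete states as profile HMMs do.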

homology model, homology modeling: Structural genomics glossary

in silico: Literally "in the computer".  Can be used to screen out compounds which are not druggable. Narrower terms: in silico biology, in silico modeling, in silico proteomics, in silico screening; Cell biology virtual cells in silico;  Related terms rules of five Chemoinformatics glossary

in silico biology: Advances in genomics and proteomics have greatly improved our knowledge of the components of biological systems at the molecular level. The next logical step is to try to understand how these components interact well enough to model those biological systems in silico. This conference will showcase examples and applications of computational modeling of cells, tissues, and disease. Faced with an overabundance of potential targets such models offer the promise of improved target prioritization compared with relying on empirical research alone. While such models are far from being a complete representation of a biological system, examples are already emerging where this method has aided in a greater understanding of a disease state as well as target prioritization and ultimately drug development. Anyone interested in utilizing in silico methods as a valuable tool for development of therapeutic strategies should attend this event. [In Silico Biology: Modeling Systems Biology for Research and Target Prioritization, June 2-3, 2002, San Diego, CA]

The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from being complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpretation of data means to abstract them as much as allowed to provide a systematic, integrative view of biology.

Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches, or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious. However, since both communities do not interact much with each other, neither the experimental nor the computational biologists really apply the theoretical tools to that extent which would be possible and desirable to achieve that progress of research which is already feasible. ["Aims and Scope" In Silico Biology: An international journal of computational biology] http://www.bioinfo.de/isb/aims.html

Related term: virtual cells in silico

in silico modeling: Modeling of biological pathways and other biological processes for drug discovery and development. Given the enormous increase in genetic and molecular data, such models will continue to improve and are predicted to become an essential tool for evaluating hypotheses, with only the more promising ones being subjected to empirical testing. [CHI Breaking Bottlenecks]

in silico proteomics: Prediction of protein structure and function. [Gareth W. Roberts and Jonathan Swinton "In Silico Proteomics: Playing by the rules" Current Drug Discovery 5: Aug. 1, 2001] http://www.current-drugs.com/CDD/CDD/CDDPDF/issue%205/Roberts.pdf

in silico screening: See virtual screening Chemoinformatics glossary

ligand docking: See under docking.

molecular graphics: A technique for the visualization and manipulation of molecules on a graphical display device. [IUPAC Computational]

molecular mimicry: Drug discovery & development glossary

molecular modeling, molecular modelling: A technique for the investigation of molecular structures and properties using computational chemistry and graphical visualization techniques in order to provide a plausible three-dimensional representation under a given set of circumstances. [IUPAC Medicinal Chemistry, IUPAC Computational]

The scope note for the Journal of Molecular Modeling includes the following subjects: computer-aided molecular design, rational drug design, de novo ligand design and receptor modeling, application of computational and modeling methods in the field of medical chemistry, protein and peptide modeling, quantum chemistry, application of semi empirical, DFT and ab initio calculations, prediction of biological activities (QSAR) and physico-chemical properties (QSPR), molecular mechanics/dynamics simulation of polymers and biopolymers, genetic algorithms and neural nets, modeling of catalysts, advanced materials, and stationary phases in separation science, enhanced desktop computational tools for the life sciences, visualisation, classification and handling of chemical data. http://link.springer.de/link/service/journals/00894/aims.htm

Molecular modeling cannot be better than the forces underlying simulations. A modeler seeking to describe, for example, a protein for the first time often needs to complement the force fields provided in existing programs through quantum-chemical calculations. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

Molecular modeling applications use falls into two broad categories: interactive visualization and computational analyses. ... Three of the most prominent uses of modern molecular modeling applications are structure analysis, homology modeling, and docking ... in essence, objective modeling revolves around three different approaches (each based on different underlying physical and chemical theories): molecular dynamics, molecular mechanics, and quantum mechanics (In-depth). All of these are concerned with developing a unique solution to what is referred to as the "protein folding" problem - designing and testing algorithms and applications that will reliably predict 3-D structure from primary sequence. [Christopher Smith "Molecular Modeling - Seeing the Whole Picture with Modeling Software Packages" Scientist 12[17]:0, Aug. 31, 1998] http://www.the-scientist.com/yr1998/august/profile2_980831.html

Related terms computational chemistry, Computer Assisted Drug Design; molecular graphics, In-depth molecular dynamics, molecular mechanics.

Molecular modeling software includes AMBER, DOCK, MODELLER, RasMol and many other programs.

molecular models: Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer generated graphics, and mechanical structures. [MeSH]

molecular recognition: Drug discovery and development glossary

ORF prediction: Related terms exon prediction, gene prediction, gene recognition.

peptidomimetic: Drug discovery & development glossary

phenomics: Omes & omics glossary

protein structure prediction:  Structural genomics glossary

quantum mechanics:

receptor mapping: The technique used to describe the geometric and/or electronic features of a binding site when insufficient structural data for this receptor or enzyme are available. Generally the active site cavity is defined by comparing the superposition of active to that of inactive molecules. [IUPAC Medicinal Chemistry, IUPAC Compendium]

 Over the past ten to fifteen years [before 1987], receptor mapping has expanded from a very minor technique, besieged by problems and limited in its approach, to one that is widespread, extended beyond receptors and applied to clinical problems and populations with modern imaging and scanning techniques. [MJ Kuhar "Imaging receptors for drugs in neural tissue"  Neuropharmacology 1987 Jul. 26 (7B): 911-6]

recognition site: Drug discovery and development glossary

scoring methods: Related term Sequencing glossary

simulated annealing (SA): A procedure used in molecular dynamics simulations, in which the system is allowed to equilibrate at high temperatures, and then cooled down slowly to remove kinetic energy and to permit trajectories to settle into local minimum energy conformations.  [IUPAC Computational]
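
The heat-then-cool idea can be illustrated with a small sketch. Note the substitution: the definition above refers to molecular dynamics, whereas this example uses Metropolis Monte Carlo moves on a one-dimensional coordinate, which is a common and simpler way to demonstrate annealing. The double-well `energy` function, the cooling schedule and all parameter values are hypothetical.

```python
import math
import random


def energy(x):
    """Hypothetical 1-D double-well energy surface (arbitrary units)."""
    return x ** 4 - 2 * x ** 2 + 0.3 * x


def simulated_annealing(x0, t_start=2.0, t_end=0.01, cooling=0.995,
                        step=0.5, seed=0):
    """Metropolis moves under a geometric cooling schedule.

    Returns the lowest-energy coordinate visited and its energy.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        candidate = x + rng.uniform(-step, step)
        e_candidate = energy(candidate)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / T).
        if e_candidate <= e or rng.random() < math.exp(-(e_candidate - e) / t):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # slow cooling
    return best_x, best_e


best_x, best_e = simulated_annealing(x0=2.0)
print(best_x, best_e)
```

At high temperature the walker crosses the barrier between the two wells freely; as the temperature drops, uphill moves become rare and the coordinate settles into a minimum.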

simulations: Up until now, biomolecular simulations in drug design have been of limited use because of the short time scales, long turnaround times (implying poor sampling), the limited accuracy of simulations alluded to above, and the relatively small size of systems simulated when one wishes to account for proper inclusion of the physiological environment like membranes and solvent. Developing a new drug goes beyond finding binding compounds and must rely on good properties from the outset: activity, absorption, distribution, metabolism, excretion. Pharmacological researchers would like to predict these properties first, before one optimizes activity as conventionally done, and before analogs are made. ... When sufficient resources are available, simulations can determine the relative free energy values of drugs passing through membranes. These values are required to estimate the bioavailability of drugs. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing, March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

structural homology: Structural genomics glossary

structure analysis: The integration of gene identification and promoter recognition programs will be very important point for a complete gene structure analysis. [HGMP training course notes: "Gene Structure Prediction" Luciano Milanesi, I.T.B.A-CNR,  Italy, 1998] http://www.hgmp.mrc.ac.uk/Courses/GeneProteinID/milanesi/milanesi.htm

structure prediction problem: Structural genomics

VRML (Virtual Reality Modeling Language): An open language under development [Web3D Consortium] http://www.web3d.org/vrml/vrml.htm

VRML was supposed to be the standard language for V[irtual] R[eality], but VRML browsers and plug-ins tend to be large. XML (Extensible Markup Language) is emerging as the most likely alternative to or fix for VRML.  [Mike Hurwicz "Virtual Reality in VRML or XML?" Web Developer's Journal June 21, 2000]  http://www.webdevelopersjournal.com/articles/virtual_reality.html

van der Waals forces:  The attractive or repulsive forces between molecular entities (or between groups within the same molecular entity) other than those due to bond formation or to the electrostatic interaction of ions or of ionic groups with one another or with neutral molecules. ... The term is sometimes used loosely for the totality of nonspecific attractive or repulsive forces. [IUPAC Compendium]
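
The IUPAC definition does not prescribe a functional form. As an assumption for illustration, the sketch below uses the Lennard-Jones 12-6 potential, a model that many molecular mechanics programs use for the van der Waals term; `epsilon` (well depth) and `sigma` (distance at which the energy is zero) are in arbitrary reduced units.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Repulsive at short range, attractive near the minimum, vanishing far away.
for r in (0.9, 1.0, 2 ** (1 / 6), 1.5, 3.0):
    print(f"r = {r:.3f}  E = {lennard_jones(r):+.4f}")
```

The minimum lies at r = 2^(1/6) * sigma, where the energy equals -epsilon.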

virtual cells in silico: Rapid accumulation of biological data from genome, proteome, transcriptome and metabolome projects can bring us to the point where it is no longer purely speculative to discuss how to construct virtual cells in silico. This article describes attempts to construct whole cell models. The E-CELL project has completed a couple of virtual cell models, and computer simulations have revealed some biological surprises. [M. Tomita, "Whole-cell simulation: a grand challenge of the 21st century" Trends in Biotechnology 19 (6): 205-210, June 2001] Related terms Omes & omics glossary metabolome, transcriptome

Virtual Cell, Dept of Plant Biology, Univ. of Illinois at Urbana-Champaign, US http://www.life.uiuc.edu/plantbio/cell/

virtual library: Chemoinformatics glossary

virtual proteomics: See in silico proteomics

virtual screening: Selection of compounds by evaluating their desirability in a computational model. Also termed in silico screening. [IUPAC Combinatorial Chemistry]

visualization: Algorithms & data management glossary

Bibliography

[Tollenaere] JP, EE Moret, Hyperglossary of [Molecular Modelling in Drug Design] Terminology, Utrecht University, 1996. 150+ definitions. http://wwwcmc.pharm.uu.nl/webcmc/glossary.html

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.

In-depth Molecular Modeling glossary

ab initio quantum mechanical methods (synonymous with nonempirical quantum mechanical methods): Methods of quantum mechanical calculations independent of any experiment other than the determination of fundamental observables. The methods are based on the use of the full Schrödinger equation to treat all the electrons of a chemical system. In practice, approximations are necessary to restrict the complexity of  the electronic wavefunction and to make its calculation possible. In this way methods of density functional theory are usually considered as ab initio quantum mechanical methods. [IUPAC Theoretical]

Methods of quantum mechanical calculations independent of any experiment other than the determination of  fundamental constants. The methods are based on the use of the full Schrödinger equation to treat all the electrons of a chemical system. In practice, approximations are necessary to restrict the complexity of the electronic wave function and to make its calculation possible. (Synonymous with non-empirical quantum mechanical methods.) [IUPAC Computational]

ab initio quantum mechanical modeling: The application of ab initio modelling across diverse fields such as condensed matter physics, materials science and chemistry has been demonstrated over the past 10 years. It has become clear that high quality simulations require a proper quantum mechanical treatment of the bonding and other interatomic forces, and techniques for achieving this are well established [1]. However, it is only recently that computational techniques have provided the means to directly solve the quantum mechanical equations for systems of sufficient complexity to provide useful information in a biological context.

The recent completion of the Human Genome Project will offer an unprecedented number of protein receptors and enzymes as targets for pharmacological intervention in disease processes. However, before this wealth of information can be used to develop pharmaceuticals, an understanding of the biochemistry of the newly identified proteins and their interactions must be obtained. First principles quantum mechanical modelling will play an important role in this process. It is important to achieve a mutual understanding, between scientists applying ab initio modelling and those working in the biological sciences, of the capabilities of ab initio modelling and the important biological problems to which they may be applied.  [Matthew Segall, Ursula Röthlisberger, Paolo Carloni, CECAM/Psi-k Workshop: Ab Initio Modelling in the Biological Sciences Lyon, France 11-13 June 2001] http://www.tcm.phy.cam.ac.uk/~mds21/Workshop2001/Scientific/node1.html#SECTION00010000000000000000

conformational analysis: Consists of the exploration of energetically favorable spatial arrangements (shapes) of a molecule (conformations) using molecular mechanics, molecular dynamics, quantum chemical calculations or analysis of experimentally-determined structural data, e.g., NMR or crystal structures.

Molecular mechanics and quantum chemical methods are employed to compute conformational energies, whereas systematic and random searches, Monte Carlo, molecular dynamics, and distance geometry are methods (often combined with energy minimization procedures) used to explore the conformational space. [IUPAC Computational]
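
A systematic search, the simplest of the methods listed, can be sketched for a single rotatable bond. The torsional energy function below is a hypothetical butane-like profile (the `v1` and `v3` terms and their values are assumptions, in arbitrary energy units); the search evaluates it on a regular grid of dihedral angles and reports the grid points that are lower than both neighbours.

```python
import math


def torsion_energy(phi_deg, v1=2.0, v3=3.0):
    """Hypothetical torsional profile with one-fold and three-fold terms."""
    phi = math.radians(phi_deg)
    return 0.5 * v1 * (1 + math.cos(phi)) + 0.5 * v3 * (1 + math.cos(3 * phi))


def systematic_search(step=10):
    """Scan the dihedral on a grid and return local minima, lowest first."""
    angles = list(range(0, 360, step))
    energies = [torsion_energy(a) for a in angles]
    n = len(angles)
    minima = [
        (angles[i], energies[i])
        for i in range(n)
        # The angle is periodic, so the first and last grid points are neighbours.
        if energies[i] < energies[i - 1] and energies[i] < energies[(i + 1) % n]
    ]
    return sorted(minima, key=lambda m: m[1])


for angle, e in systematic_search():
    print(f"{angle:3d} deg  E = {e:.3f}")
```

With real molecules the number of grid points grows exponentially with the number of rotatable bonds, which is why the entry also lists random, Monte Carlo and molecular dynamics searches.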

decoys: Potential energy functions to fold proteins are usually designed by a learning approach. A learning algorithm is presented with a large set of wrong shapes [decoys] and a few native sequences. The energy function is trained on the set to recognize the few correct folds and is used and tested on other proteins that were not included in the training set.

Clearly the quality of the design will be improved significantly if more decoys will be presented to the learning algorithm. The mechanism of the learning also makes a difference. It is useful to make the native fold the lowest energy state exactly. An exact solution makes it possible to increase the number of decoy structures without limit, constantly improving the energy function. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
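
The source does not say which learning algorithm is used. As one possible illustration, the sketch below trains a linear energy function with perceptron-style updates until the native structure has strictly lower energy than every decoy. Each structure is reduced to a short feature vector (for instance counts of two kinds of residue contacts); the feature values, function names and learning rate are all hypothetical.

```python
def energy(weights, features):
    """Linear energy: weighted sum of structural features."""
    return sum(w * f for w, f in zip(weights, features))


def train_energy_function(native, decoys, learning_rate=1.0, max_epochs=100):
    """Adjust weights until the native fold is the lowest-energy structure.

    Whenever a decoy scores at or below the native, shift the weights so
    that the decoy's energy rises relative to the native's.
    """
    weights = [0.0] * len(native)
    for _ in range(max_epochs):
        violations = 0
        for decoy in decoys:
            if energy(weights, decoy) <= energy(weights, native):
                weights = [w + learning_rate * (d - n)
                           for w, d, n in zip(weights, decoy, native)]
                violations += 1
        if violations == 0:
            break  # native is strictly lowest for every decoy
    return weights


# Toy feature vectors: (favourable contacts, unfavourable contacts).
native = (5, 1)
decoys = [(3, 2), (4, 3), (2, 1)]
weights = train_energy_function(native, decoys)
print(weights, energy(weights, native), [energy(weights, d) for d in decoys])
```

Adding more decoys only adds more constraints of the same kind, which matches the point made above that a larger decoy set can keep improving the energy function. If no set of weights can separate the native from the decoys, the loop simply stops after `max_epochs`.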

energy function: Computationally, a shape is assigned to a protein sequence based on an empirical energy function. The lower the energy of a given structure, the more likely it is to be the correct fold. The structure prediction challenge is therefore divided into two: (1) The first challenge is the creation of many plausible folds or a set of structures that will include the native shape. The creation of the appropriate set depends on existing databases (such as the Protein Data Bank) or on the design of automated algorithms (using physical or statistical information) to generate plausible folds. Once the set is available, a selection procedure is used to "fish" out the correct fold. (2) The "fishing" of the plausible native shapes critically depends on the quality of the energy function. The value of the energy function must be the lowest for the native structure. Otherwise, a wrong structure is predicted. Therefore, the design of appropriate empirical energy functions has attracted considerable attention and much research, using a variety of techniques and algorithms. Both challenges were addressed extensively in the last few years and while significant progress has been made we still do not have satisfactory solutions. The search of plausible structures is far from complete and native folds are missed. Moreover, current scoring (energy) functions assign energy values that are too high for many native shapes. Here we discuss the design of new folding energies and why this task requires significant enhancement in computer power. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html
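The two challenges can be stated compactly. The notation here is an assumption introduced for illustration: C is the set of candidate folds, E the energy function, and X_native the native structure.

```latex
\hat{X} = \arg\min_{X \in \mathcal{C}} E(X),
\qquad
\hat{X} = X_{\text{native}}
\;\Longleftrightarrow\;
X_{\text{native}} \in \mathcal{C}
\ \text{ and } \
E(X_{\text{native}}) < E(X) \quad \forall\, X \in \mathcal{C},\ X \neq X_{\text{native}}
```

The first condition is the search problem (the native shape must be among the candidates); the second is the scoring problem (the energy function must rank it lowest).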

force field: A set of functions and parametrization used in molecular mechanics calculations. [IUPAC Computational]

Long-time simulations will pose a challenging benchmark for the force fields employed in molecular modeling. One question is, how will proteins and DNA that were described by the available force fields (and remained stable over nanosecond periods) behave in microsecond simulations? The high cost of long-time simulations will require that the issue is addressed in a systematic way by providing standard cases against which simulations can be tested ... Much effort has been spent over the past two decades to establish more and more faithful force fields. The primary concept behind most present day force fields is to cast them into analytical functions that are convenient and fast to evaluate. The price is that force fields are largely based on simple heuristics that stem from knowledge of the forces of atoms in various constellations of chemical bonds that have been used in physical chemistry for a long time, e.g., the Lennard-Jones potential. The difficulty in formulating force fields arises from accuracy as well as from variety since all constellations of chemical bonds in biopolymers need to be covered, requiring simplicity to keep the huge complexity in check and requiring consistency when force fields are amended. The latter is often achieved not solely on the basis of experimental information, but increasingly by use of quantum-chemical calculations. Presently, computational biologists hope to account also for the fact that the partial charges in biomolecules themselves can be altered through the presence of electric fields. The resulting polarization of biomolecules requires a new generation of empirical force fields that is under development. ...

The force fields do not cover the description of chemical reactions in which bonds are altered, nor do they apply to the behavior of biomolecules in electronically excited states or to novel constituents. To repair this deficiency systematically requires a combination of quantum-chemical calculations for the electronic degrees of freedom and classical simulations for the motion of the atomic nuclei ... Teraflop computing speeds promise a dramatic improvement in this regard, permitting modelers to make more routine use of quantum-chemically improved force fields and ultimately computing [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

Related term van der Waals.
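The Lennard-Jones potential mentioned above is a typical example of an analytical force field term that is cheap to evaluate. A minimal sketch of the common 12-6 form follows; the default epsilon and sigma are roughly argon-like values (kcal/mol and angstrom) used here only as illustrative assumptions, not parameters from any particular force field.

```python
def lennard_jones(r, epsilon=0.238, sigma=3.4):
    """12-6 Lennard-Jones energy for two nonbonded atoms at separation r.

    epsilon: well depth; sigma: separation at which the energy is zero.
    Defaults are illustrative, roughly argon-like values.
    """
    sr6 = (sigma / r) ** 6
    # Repulsive r^-12 term minus attractive r^-6 (van der Waals) term.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

if __name__ == "__main__":
    r_min = 2 ** (1 / 6) * 3.4  # separation of the energy minimum
    for r in (3.0, 3.4, r_min, 5.0, 8.0):
        print(f"r = {r:5.2f}  E = {lennard_jones(r):8.4f}")
```

The energy is zero at r = sigma, reaches its minimum of -epsilon at r = 2^(1/6) sigma, and rises steeply at shorter distances.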

molecular dynamics: A simulation procedure consisting of the computation of the motion of atoms in a molecule or of individual atoms or molecules in solids, liquids and gases, according to Newton's laws of motion. The forces acting on the atoms, required to simulate their motions, are generally calculated using molecular mechanics force fields. [IUPAC Computational]

The Parrinello group has applied ab initio Molecular Dynamics (MD) in which all forces were computed quantum-chemically to chemical reactions in general and to biological systems in particular, with results that compared favorably with experiment and older force field methods. The ab initio method was found to be of "useful accuracy" for simulations of biomolecules ... With a 1000 times faster computer (relative to 32 processors on a Cray T3E) the dynamics of a quantum-chemical system consisting of up to 10 atoms could be simulated for 10 μs. For 100 quantum-chemically treated atoms 10 ns would be possible. Assuming linear scaling, 1000 quantum-chemically treated atoms could be followed for 1 ns. New possibilities which open up by using a teraflop computer and ab initio MD methods are the study of enzymatic reactions at the heart of many important biomolecules and the complete quantum-chemical description of small peptides. Problems such as RNA and DNA drug interactions, the determination of complex enzymatic reaction pathways, and the realistic study of electron transfer can be tackled. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: Report on a Meeting Held March 3 & 4, 1999 in Rockville, MD, Organized by the NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

molecular mechanics: The calculation of molecular conformational geometries and energies using a combination of empirical force fields (Burkert and Allinger, 1982).

Method of calculation of geometrical and energy characteristics of molecular entities on the basis of empirical potential functions (see force field) the form of which is taken from classical mechanics. The method implies transferability of the potential functions within a network of similar molecules. An assumption is made on "natural" bond lengths and angles, deviations from which result in bond and angle strain respectively. Repulsive or attractive van der Waals and electrostatic forces between nonbonded atoms are also taken into account. Synonymous with force field method. [IUPAC Computational]
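The idea of strain relative to "natural" bond lengths and angles can be sketched with harmonic terms. The harmonic form is a common choice but an assumption here, since the definition above does not fix the functional form; the water-like reference geometry and the force constants are illustrative values only.

```python
import math

def bond_strain(r, r0, k_bond):
    """Harmonic bond-stretch energy for a bond of length r with natural length r0."""
    return 0.5 * k_bond * (r - r0) ** 2

def angle_strain(theta_deg, theta0_deg, k_angle):
    """Harmonic angle-bend energy; angles in degrees, k_angle per radian squared."""
    delta = math.radians(theta_deg - theta0_deg)
    return 0.5 * k_angle * delta * delta

def strain_energy(bonds, angles, r0=0.96, theta0=104.5, k_bond=1000.0, k_angle=100.0):
    """Total bonded strain for lists of bond lengths (angstrom) and angles (degrees).

    r0, theta0, k_bond and k_angle are hypothetical, water-like parameters.
    """
    e_bonds = sum(bond_strain(r, r0, k_bond) for r in bonds)
    e_angles = sum(angle_strain(t, theta0, k_angle) for t in angles)
    return e_bonds + e_angles

if __name__ == "__main__":
    print("natural geometry:  ", strain_energy([0.96, 0.96], [104.5]))
    print("distorted geometry:", strain_energy([1.00, 0.94], [110.0]))
```

A full molecular mechanics energy would add torsional terms and the nonbonded van der Waals and electrostatic contributions mentioned in the definition.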

Monte Carlo technique: A simulation procedure consisting of randomly sampling the conformational space of a molecule. [IUPAC Computational] Broader term simulation.
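A widely used form of random sampling is the Metropolis scheme, in which trial moves are accepted with a probability that depends on the energy change. The sketch below applies it to a single torsion angle; the threefold cosine energy, the temperature factor kT, the move size, and the fixed random seed are all illustrative assumptions.

```python
import math
import random

def torsion_energy(phi_deg, v3=3.0):
    """Toy threefold torsional energy with minima at 60, 180 and 300 degrees."""
    return 0.5 * v3 * (1 + math.cos(math.radians(3 * phi_deg)))

def metropolis(n_steps=20000, kT=0.6, max_move=30.0, seed=0):
    """Sample torsion angles with the Metropolis acceptance rule."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    phi = 0.0                   # start on top of a barrier
    e = torsion_energy(phi)
    samples = []
    for _ in range(n_steps):
        trial = (phi + rng.uniform(-max_move, max_move)) % 360.0
        e_trial = torsion_energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            phi, e = trial, e_trial
        samples.append(phi)     # a rejected move counts the old state again
    return samples

if __name__ == "__main__":
    samples = metropolis()
    mean_e = sum(torsion_energy(p) for p in samples) / len(samples)
    print("mean sampled energy:", round(mean_e, 3))
```

Because low-energy conformations are visited far more often than high-energy ones, the samples concentrate around the three minima rather than covering the angle uniformly.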

NIH Guide to Molecular Modeling Gateway: http://cmm.info.nih.gov/modeling/gateway.html

quantum chemical calculations: Molecular property calculations based on the Schrödinger equation, which take into account the interactions between electrons in the molecule. [IUPAC Computational]

semi-empirical methods: Molecular orbital calculations using various degrees of approximation and using only valence electrons. [IUPAC Computational]

semi-empirical quantum mechanical methods: Use parameters derived from experimental data to simplify computations. The simplification may occur at various levels: simplification of the Hamiltonian (e.g. as in the Extended Hückel method), approximate evaluation of certain molecular integrals (see, for example, zero differential overlap), simplification of the wave function (for example, use of π electron approximation as in Pariser-Parr-Pople). [IUPAC Computational]
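The flavor of these simplifications can be seen in the simple Hückel model, which is named here as an illustration and is not one of the methods listed above (it is far cruder than Extended Hückel or Pariser-Parr-Pople). It treats only π electrons and replaces all integrals by two empirical parameters, conventionally written alpha and beta. The sketch uses the known closed-form orbital energies for a linear chain of n carbon p orbitals, E_k = alpha + 2 beta cos(k pi / (n + 1)); the function names are hypothetical.

```python
import math

def huckel_levels(n_atoms):
    """Coefficients x_k in E_k = alpha + x_k * beta for a linear chain of n p orbitals."""
    return [2.0 * math.cos(k * math.pi / (n_atoms + 1)) for k in range(1, n_atoms + 1)]

def pi_energy_coefficient(n_atoms, n_electrons):
    """Return c such that the total pi energy is n_electrons * alpha + c * beta.

    Orbitals are filled from the lowest energy upward, two electrons each.
    Because beta is negative, the largest x_k is the lowest-energy orbital.
    """
    coeff = 0.0
    remaining = n_electrons
    for x in sorted(huckel_levels(n_atoms), reverse=True):
        occupancy = min(2, remaining)
        coeff += occupancy * x
        remaining -= occupancy
        if remaining == 0:
            break
    return coeff

if __name__ == "__main__":
    ethylene = pi_energy_coefficient(2, 2)
    butadiene = pi_energy_coefficient(4, 4)
    # Extra stabilization of butadiene relative to two isolated double bonds, in units of beta.
    print("delocalization energy coefficient:", butadiene - 2 * ethylene)
```

In a semi-empirical treatment, alpha and beta are not computed from first principles but are fitted to experimental data, which is the sense of "parameters derived from experimental data" in the definition.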

 


Cambridge Healthtech Institute
1037 Chestnut Street
Newton Upper Falls, MA 02464
Phone: 617-630-1300
Fax: 617-630-1325
Email: chi@healthtech.com