Informatics Overview
Emerging terminology for emerging technologies
Suggestions? Comments? Questions? mchitty@healthtech.com
Last revised December 26, 2001

Related glossaries include Research.

3D-QSAR Three-dimensional quantitative structure-activity relationships:  Involves the analysis of  the quantitative relationship between the biological activity of a set of compounds and their three-dimensional properties using statistical correlation methods. [IUPAC Computational]  Algorithms & data management glossary

bioinformatics: The storage, management, retrieval, analysis and integration of the vast amounts of genomic data being produced globally. Today it also embraces protein structure analysis, gene and protein functional information, data from patients, preclinical and clinical trials, and the metabolic pathways of numerous species. [CHI Bioinformatics] http://www.chireports.com/content/reports/struc_gen.asp

Bioinformatics: Beyond genome  June 4-5, 2002, San Diego, CA  Bioinformatics Glossary

CORBA Common Object Request Broker Architecture: OMG's showcase specification for application interoperability, independent of platform, operating system, programming language - even of network and protocol  ...  integrates Enterprise Java Beans, and a new specification will provide the most robust support in the industry for application interoperability using XML. [OMG Specifications and  Process, June 2000] http://sisyphus.omg.org/gettingstarted/overview.htm  Computers & computing glossary

chemoinformatics: There are many sources of chemical data: registered chemical structures with stereochemistry, synthesis records, spectral data including NMR, and purity determinations, not to mention the volume of data generated by HTS, SAR studies and the calculation of physicochemical properties. While gathering, storage and registration of data transform it into information, it is the accessibility, manipulation, and data mining of chemical information that translate it into knowledge for smarter drug development. This conference will showcase chemoinformatic tools for storage, design and mining of chemical databases and information, and present case examples of their success in lead identification and optimization. Chemoinformatics is about presenting and integrating a vast and complex array of information so that people who make the decisions in drug discovery can make the correct choices quickly and easily. Chemoinformatics May 6-8, 2002 Philadelphia PA  Chemoinformatics Glossary

Chemiinformatics and cheminformatics are alternate spellings. Chemoinformatics originally predominated, but cheminformatics now seems to be the most prevalent spelling. See FAQ question #2.

clinical informatics: The application of informatics approaches to the clinical-evaluation phase of drug development. These approaches can include clinical-trial simulations to improve trial design and patient selection, as well as electronic capturing and storing of clinical data and protocols. The goal is to reduce expenses and time to market. [CHI Bioinformatics]  Clinical genomics glossary

complexity: Currently there are more than 30 different mathematical descriptions of complexity. However, we have yet to understand the mathematical dependency relating the number of genes with organism complexity. [JC Venter et al. "The Sequence of the Human Genome" Science 291 (5507): 1347, Feb. 16, 2001]

An ill-defined term that means many things to many people. Complex things are neither random nor regular, but hover somewhere in between. Intuitively, complexity is a measure of how interesting something is. Other types of complexity may be well defined. [Gary William Flake, Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems, and Adaptation, MIT Press, 1998] http://mitpress.mit.edu/books/FLAOH/cbnhtml/glossary-C.html#complexity Genomics glossary

data mining: Nontrivial extraction of implicit, previously unknown and potentially useful information from data, or the search for relationships and global patterns that exist in databases. [Bob Klevecz "The Whole EST Catalog" Scientist 12 (2): 22 Jan 18 1999]

Narrower terms affinity-based data mining, comparative data mining, influence-based data mining, predictive data mining, time delay data mining, trends analysis data mining. Increasingly people are talking about text mining (including of the life sciences literature, as well as of sequence and structure databases). Algorithms & data management glossary

data pipelining:

databases: Collections of data in machine-readable form, which can be manipulated by software to appear in varying arrangements and subsets. Databases & software directory: Describes and provides links to around 200 databases and about 30 software tools. Related terms annotated databases, curated databases, federated databases, integrated databases, non-redundant databases, proprietary databases, redundant databases.  Bioinformatics glossary

determinism (genetic): Science's review of "The sequence of the human genome" (J. Craig Venter et al. 291: 1304-1352 Feb. 16, 2001) concludes that a "paramount challenge awaits: public discussion of this information and its potential for improvement of personal health ... There are two fallacies to be avoided: determinism, the idea that all characteristics of the person are 'hard-wired' by the genome; and reductionism, the view that with complete knowledge of the human genome sequence, it is only a matter of time before our understanding of gene functions and interactions will provide a complete causal description of human variability."  Clinical genomics glossary  Related terms Genetic variations glossary

Gene Ontology™ (GO): The goal of the Gene Ontology™ Consortium is to produce a dynamic controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.  http://www.geneontology.org/  Participating groups include Arabidopsis, C. elegans, Drosophila, Saccharomyces and mouse.  Functional genomics glossary

genome informatics: The Twelfth International Conference on Genome Informatics (GIW 2001) focuses on Genome Informatics, including, but not limited to, the following areas: genomic database, knowledge extraction from literature, knowledge discovery and data mining from databases, structural genomics, protein structure and function prediction, proteome analysis, pathway analysis, functional genomics, gene expression analysis, gene network analysis, gene structure and function prediction, sequence analysis, motif extraction and search, multiple alignment, phylogenetic tree, linkage analysis program, systems for supporting experimental works (mapping, sequencing, primer design, etc.), high performance computing, simulation of biological system, DNA computing, artificial life, etc. [GIW 2001 homepage, Dec. 17-19, 2001, Tokyo, Japan] http://giw.ims.u-tokyo.ac.jp/giw2001/

in silico: Literally "in the computer" (as contrasted with "in vitro" (in glass) or "in vivo" (in life)). Can be used to screen out compounds which are not druggable. Molecular modeling glossary  Related term rules of five  Chemoinformatics glossary
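
As an illustration of in silico screening, the sketch below applies the commonly cited form of Lipinski's rule of five (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) to precomputed descriptors. The class name, function names and the one-violation tolerance are assumptions made for this example, not something this glossary specifies, and a real pipeline would compute the descriptors with a chemistry toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one compound (hypothetical container)."""
    name: str
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria the compound breaks."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_filter(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed tolerance: keep compounds with at most one violation."""
    return rule_of_five_violations(d) <= max_violations


def screen(compounds):
    """Return the names of compounds that survive the filter."""
    return [c.name for c in compounds if passes_filter(c)]
```

A filter like this only removes unlikely candidates early; it does not establish that a surviving compound is a good drug.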

in silico biology: Advances in genomics and proteomics have greatly improved our knowledge of the components of biological systems at the molecular level. The next logical step is to try to understand how these components interact well enough to model those biological systems in silico. This conference will showcase examples and applications of computational modeling of cells, tissues, and disease. Faced with an overabundance of potential targets, researchers can look to such models for improved target prioritization compared with relying on empirical research alone. While such models are far from being a complete representation of a biological system, examples are already emerging where this method has aided in a greater understanding of a disease state as well as target prioritization and ultimately drug development. Anyone interested in utilizing in silico methods as a valuable tool for development of therapeutic strategies should attend this event. In Silico Biology: Modeling Systems Biology for Research and Target Prioritization  June 2-3, 2002  San Diego, CA  Molecular modeling glossary

informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyse DNA sequence information, and to predict protein sequence and structure from DNA sequence data. [ORD] Narrower terms bioinformatics, chemoinformatics, clinical informatics, molecular informatics, protein informatics, pharmacoinformatics, research informatics. Computers & computing glossary

information overload: Biomedicine is in the middle of revolutionary advances. Genome projects, microassay methods like DNA chips, advanced radiation sources for crystallography and other instrumentation, as well as new imaging methods, have exceeded all expectations, and in the process have generated a dramatic information overload that requires new resources for handling, analyzing and interpreting data. Delays in the exploitation of the discoveries will be costly in terms of health benefits for individuals and will adversely affect the economic edge of the country. [Opportunities in Molecular Biomedicine in the Era of Teraflop Computing: March 3 & 4, 1999, Rockville, MD, NIH Resource for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign] http://www.ks.uiuc.edu/Publications/Reports/teraflop/node4.html

"Information overload" is not an overstatement these days.  One of the biggest challenges is to deal with the tidal wave of data, filter out extraneous noise, and assimilate and integrate information on a previously unimagined scale. Related terms federated databases,  integrated databases. Bioinformatics glossary

interoperability: Proposed working definition. Interoperability can be defined as: the ability of different types of computers, networks, operating systems, and applications to work together effectively, without prior communication, in order to exchange information in a useful and meaningful manner. [William E. Moen, University of North Texas School of Library and Information Sciences, Z3950 project, 2000]  http://www.unt.edu/wmoen/Z3950/GIZMO/section4.htm  

Enabling heterogeneous databases to function in an integrated way; sometimes refers to cross-platform functionality and operability across relational, object-oriented, and non-standard types of databases. Computers & computing glossary

metadata: Anyone who has used a web search service like AltaVista or HotBot knows that typing in a few keywords and receiving a couple of thousand "hits" is not necessarily very useful. A lot of manual "weeding" of information has to happen after that; it may also happen that the keywords for which you are searching are not prominent in the relevant document itself.

A possible solution for the search problem - and for the general issue of letting automated "agents" roam the web performing useful tasks - is to provide a mechanism which allows a more precise description of things on the web. This, in turn, could elevate the status of the web from machine-readable to something we might call machine-understandable.

Metadata is "data about data" or specifically in our current context "data describing web resources." The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application ("one application's metadata is another application's data"). [W3C, "Introduction to RDF Metadata" 1997] Related term RDF http://www.w3.org/TR/NOTE-rdf-simple-intro

Information about data that enables intelligent, efficient access and management of data. … metadata is always less than the data.  [Robyne M. Sumpter  “White paper on Data Management” Lawrence Livermore National Laboratory, February 10, 1994] http://www.llnl.gov/liv_comp/metadata/papers/whitepaper-draft.html  Bioinformatics glossary
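
The idea of "data describing web resources" can be sketched as a small set of (resource, property, value) statements, loosely in the spirit of the RDF model cited above. The URLs, property names and helper functions below are hypothetical and chosen only for illustration; this is not RDF syntax or any particular library's API.

```python
# Hypothetical resources and property names, for illustration only.
TRIPLES = [
    ("http://example.org/doc1", "creator", "A. Author"),
    ("http://example.org/doc1", "subject", "metadata"),
    ("http://example.org/doc2", "creator", "B. Writer"),
    ("http://example.org/doc2", "subject", "taxonomy"),
]


def describe(resource, triples=TRIPLES):
    """Collect every property recorded about one resource."""
    return {prop: value for subj, prop, value in triples if subj == resource}


def find(prop, value, triples=TRIPLES):
    """Return resources whose metadata carries the given property value."""
    return [subj for subj, p, v in triples if p == prop and v == value]
```

Searching on a declared property such as `subject` avoids the keyword problem described above, where the term being searched for is not prominent in the document text itself.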

molecular informatics: Deals with representation, storage, retrieval, processing, and exchange of information about molecules, including biological macromolecules. Currently a significant portion of molecular information is accessible via the World Wide Web. However, the lack of standards for representation and exchange, the dilemma of centralized versus local storage, and the different modes of access to commercial and public databases hinder the creation of universal digital libraries for molecular information. [Iosif Vaisman, Lab for Molecular Modeling, School of Pharmacy, Univ. of North Carolina - Chapel Hill, US "Molecular informatics and World Wide Web", 1995]  http://www.ibiblio.org/pharmacy/conf/molinf.html Computers & computing glossary

molecular modeling: A technique for the investigation of molecular structures and properties using computational chemistry and graphical visualization techniques in order to provide a plausible three-dimensional representation under a given set of circumstances. [IUPAC Medicinal Chemistry] Molecular Modeling glossary

nonlinear: Advances in genomic technologies are a mix of incremental improvements to existing technologies (linear) and occasionally, a truly new paradigm or breakthrough.  See also disruptive technologies, emerging technologies and complex. Technologies & instrumentation overview

normalization: In creating a database, normalization is the process of organizing it into tables in such a way that the results of using the database are always unambiguous and as intended. Normalization may have the effect of duplicating data within the database and often results in the creation of additional tables. (While normalization tends to increase the duplication of data, it does not introduce redundancy, which is unnecessary duplication.) Normalization is typically a refinement process after the initial exercise of identifying the data objects that should be in the database, identifying their relationships, and defining the tables required and the columns within each table. [whatis.com]  Algorithms & data management glossary
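
A minimal sketch of the idea, using Python's standard `sqlite3` module: instead of repeating a supplier's city on every compound row, the supplier is stored once in its own table and referenced by key. The table names, column names and sample rows are invented for this example.

```python
import sqlite3


def build_normalized_db():
    """Create an in-memory database with supplier data split into its own table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE supplier (
            supplier_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city TEXT NOT NULL
        );
        CREATE TABLE compound (
            compound_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
        );
        INSERT INTO supplier VALUES (1, 'Acme Chemicals', 'Boston');
        INSERT INTO compound VALUES (10, 'compound A', 1);
        INSERT INTO compound VALUES (11, 'compound B', 1);
        """
    )
    return conn


def compounds_with_city(conn):
    """Join the two tables back together to answer a question about both."""
    rows = conn.execute(
        """
        SELECT c.name, s.city
        FROM compound AS c
        JOIN supplier AS s ON s.supplier_id = c.supplier_id
        ORDER BY c.compound_id
        """
    )
    return rows.fetchall()
```

The only value repeated across rows is the `supplier_id` key, which is the kind of necessary duplication the definition distinguishes from redundancy. Changing the supplier's city is then a single update, and every joined result stays consistent.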

ontology: The main purpose of an ontology is to enable communication between computer systems in a way that is independent of the individual system technologies, information architectures and application domain.

The key ingredients that make up an ontology are a vocabulary of basic terms and a precise specification of what those terms mean.

The term 'ontology' has been used in this way for a number of years by the artificial intelligence and knowledge representation community, but is now becoming part of the standard terminology of a much wider community including object modelling and XML. [Ontology.org "What is an ontology?", 2000]   http://www.ontology.org/main/papers/faq.html

From the Greek onto "on being". Metaphysics, nature and essence of existence. [OED] Narrower terms bio- ontology, Gene Ontology TM, molecular biology ontology   Computers & computing glossary

pharmainformatics: The multidisciplinary informatics needs of the pharmaceutical industry: HTS (High Throughput Screening) data, combinatorial chemistry, ADME informatics, cheminformatics, toxicology, etc., and information access and communication between various departments like the development and discovery teams. [CCL [Computational Chemistry List] call for papers, Spring ACS [American Chemical Society] meeting in San Diego (April 1-5, 2001) Sponsored by the Biotechnology Secretariat (BTEC) Co-sponsored by Chemical Information Division (CINF)]  Drug discovery & development glossary http://www.quimica.urv.es/~bo/llistes/CCL/100/10/msg00081.html

protein informatics: Includes bioinformatics technology to cross reference protein informatics with genomic databases, sequence data of protein fragments by mass spectrometry and identification of these fragments using more remote relationships; construction and management of international protein structural databases; protein profiling and characterization data handling; data that elucidates the relationship between structures and functions of biological macromolecules by X-ray crystallography, large scale molecular simulation and structural bioinformatics, protein structure data handling and storage, structural bioinformatics covering molecular modeling and design; protein array and chip data handling; development of new algorithms and software for large scale simulation calculations by parallel computers; protein-protein interaction data and libraries; protein structure data determination by X-ray crystallography and development of automatic analysis systems; protein expression databases; automated technology for high-throughput protein function assignment and annotation. Protein informatics November 12-13, 2001, San Diego, CA  Proteomics glossary

research informatics: The explosion of genomic information, from sequences and gene expression to SNPs and protein structures, is of limited value for pharmaceutical researchers without powerful software capable of interpretation and comparisons. Case studies and experiences that companies have with both the problems and solutions in the areas of data mining, multiple location data sharing, and computational enhancements of biological and chemistry projects, as well as integration of these efforts, served as the focus of this meeting. Different approaches for overcoming the problems of legacy information systems, the very different language and perspectives of chemists and biologists, and the organizational issues of compartmentalization were among the key topics discussed.  Research Informatics  Nov. 29-30, 2000 Research glossary

semantics: Related terms controlled vocabularies, ontologies, taxonomies

standards: See Bio-ontology Standards Group, Data Model Standards Group Bioinformatics glossary; data analysis, standards Microarrays glossary

taxonomies: Taxonomy (from Greek taxis meaning arrangement or division and nomos meaning law) is the science of classification according to a predetermined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis, or information retrieval. In theory, the development of a good taxonomy takes into account the importance of separating elements of a group (taxon) into subgroups (taxa) that are mutually exclusive, unambiguous, and taken together, include all possibilities. In practice, a good taxonomy should be simple, easy to remember, and easy to use.
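
The two theoretical requirements (subgroups that are mutually exclusive and that together include all possibilities) amount to saying the subgroups partition the group. The sketch below checks that property with plain Python sets; the function names and the shape of the report are assumptions made for this example.

```python
def check_partition(group, subgroups):
    """Report overlaps and gaps when `subgroups` is meant to partition `group`.

    `group` is a set of elements; `subgroups` maps a subgroup name to a set.
    """
    seen = set()
    overlaps = set()
    for members in subgroups.values():
        overlaps |= seen & members      # elements already placed elsewhere
        seen |= members
    missing = set(group) - seen         # elements no subgroup covers
    extra = seen - set(group)           # elements that are not in the group
    return {"overlaps": overlaps, "missing": missing, "extra": extra}


def is_good_partition(group, subgroups):
    """True when subgroups are mutually exclusive and jointly exhaustive."""
    report = check_partition(group, subgroups)
    return not any(report.values())
```

A non-empty `overlaps` set signals an ambiguous classification, and a non-empty `missing` set signals that the scheme does not cover every case.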

One of the best known taxonomies is the one devised by the Swedish scientist, Carl Linnaeus, whose classification for biology is still widely used (with modifications). In Web portal design, taxonomies are often created to describe categories and subcategories of topics found on the Web site. The categorization of words on whatis.com is similar to any Web portal taxonomy [whatis.com]

Frustrations with search engines and information retrieval have led to increased interest in specialized taxonomies. Taxonomies are a form of controlled vocabulary: their hierarchical relationships (broader terms, narrower terms) provide additional suggestions for browsing, as do lateral relationships (related terms) and preferred terms.  While there is theoretical interest in natural language processing, a very small percentage of web search engine searching actually uses natural language processing successfully. See also FAQ question #3.  Computers & computing glossary
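
A controlled vocabulary of this kind can be sketched as a small data structure in which each preferred term carries its broader, narrower and related terms plus any non-preferred spellings. The entries below reuse terms from this glossary (informatics, bioinformatics, chemoinformatics and the alternate spelling cheminformatics); the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (hypothetical structure)."""
    preferred: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)
    variants: list = field(default_factory=list)   # non-preferred spellings


VOCAB = {
    "informatics": Term(
        "informatics", narrower=["bioinformatics", "chemoinformatics"]
    ),
    "bioinformatics": Term("bioinformatics", broader=["informatics"]),
    "chemoinformatics": Term(
        "chemoinformatics", broader=["informatics"], variants=["cheminformatics"]
    ),
}


def resolve(query, vocab=VOCAB):
    """Map a query, including a variant spelling, to its preferred term."""
    q = query.lower()
    for term in vocab.values():
        if q == term.preferred or q in term.variants:
            return term
    return None


def suggestions(query, vocab=VOCAB):
    """Offer broader, narrower and related terms for browsing."""
    term = resolve(query, vocab)
    if term is None:
        return []
    return term.broader + term.narrower + term.related
```

Resolving variants to a preferred term first means a search for either spelling reaches the same entry, and the relationship lists supply the browsing suggestions described above.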

Bibliography


IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.


Cambridge Healthtech Institute
1037 Chestnut Street
Newton Upper Falls, MA 02464
Phone: 617-630-1300
Fax: 617-630-1325
Email: chi@healthtech.com