
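Early bioinformatics: the birth of a discipline

Bioinformatics has grown rapidly in recent years, yet this "new" discipline has a long history. From their own viewpoint, the authors review the algorithmic questions and answers that arose at the start of the discipline and during its later development, along with its theoretical foundations. The review covers the development of bioinformatics in five parts organized around three points in time: the 1970s, the 1980s and ten years ago.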
Early bioinformatics: the birth of a discipline—a personal view

Christos A. Ouzounis1 and Alfonso Valencia2
1Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK, 2Protein Design Group, National Center for Biotechnology, CNB-CSIC Campus U. Autonoma Cantoblanco, Madrid 28049, Spain.
Received on December 13, 2002; revised on May 25, 2003; accepted on March 28, 2003

ABSTRACT

Motivation: The field of bioinformatics has experienced an explosive growth in the last decade, yet this ‘new’ field has a long history. Some historical perspectives have been previously provided by the founders of this field. Here, we take the opportunity to review the early stages and follow developments of this discipline from a personal perspective.
Results: We review the early days of algorithmic questions and answers in biology, the theoretical foundations of bioinformatics, the development of algorithms and database resources and finally provide a realistic picture of what the field looked like from a practitioner’s viewpoint 10 years ago, with a perspective for future developments.
Contact: ouzounis@ebi.ac.uk

PRELUDE

The recent revolution in genomics and bioinformatics has taken the world by storm. From company boardrooms to political summits, the issues surrounding the human genome, including the analysis of genetic variation, access to genetic information and the privacy of the individual, have fueled public debate and extended way beyond the scientific and technical literature. During the past few years, bioinformatics, defined as the computational handling and processing of genetic information, has become one of the most highly visible fields of modern science. Yet, this ‘new’ field has a long, even humble, history, along with the triumphs of molecular genetics and cell biology of the last century.

Taking a historical perspective, we will examine the birth of this discipline, and some of the factors that shaped it into one of the hottest areas of frantic scientific research and technical development. First, we will attempt to describe briefly some key developments for computational biology, from the very early days to the close of the century. Second, we will compare some ‘early’ bioinformatics activities of just ten years ago with today’s field, hoping that we provide a perspective for the future. Clearly, our account is a personal perspective and by no means an objective treatise on the history of bioinformatics. Yet, we hope that this will provide a basis for further discussion and debate, enriched by personal interviews, a detailed citation analysis and a wider coverage of the different areas within the field. For instance, we have not sufficiently covered entire areas of biological computation, such as structural bioinformatics (X-ray crystallography, electron microscopy and nuclear magnetic resonance), modelling and dynamics, including image and signal analysis (regulatory and gene networks, physiological simulations, metabolic control theory, tissue visualization via tomography and nuclear magnetic imaging) or neurobiology and neuroinformatics (neural networks, control theory). These fields are outside the scope of our review and at the borders of biological computing with other important areas of research. We would like to make clear that we focus on our own area of expertise and discuss the milestones of the field of protein sequence and structure analysis while attempting to provide a general overview of the major achievements in bioinformatics. We list a number of institutions and key papers (Tables 1 and 2) that were influential in our own intellectual development and thus should not be considered as an objectively derived ‘hall of fame’ in this field. We hope that this treatise will inspire other scientists to take an opportunity and provide their own perspectives for the history of computational biology.

THE PRE-70’S: PIONEERING COMPUTATIONAL STUDIES

It could be argued that some of the most fundamental questions in the early days of molecular biology presented formidable algorithmic problems. In that sense, the structure of DNA (Watson and Crick, 1953), the encoding of genetic information for proteins (Gamow et al., 1956), the factors governing protein structure (Anfinsen, 1973; Pauling et al., 1951), the structural properties of protein molecules (Anfinsen and Scheraga, 1975; Crick, 1953; Pauling and Corey, 1953; Szent-Györgyi and Cohen, 1957), the evolution of biochemical pathways (Horowitz, 1945) and gene regulation (Britten and Davidson, 1969), and the chemical basis for development (Turing, 1952) all contain seeds of some of the problems that it became possible to address by computation in the following decades. In parallel, much of fundamental computer science, including the theory of computation (Chaitin, 1966) and information theory (Shannon and Weaver, 1962), the definition of grammars (Chomsky, 1959) and random strings (Martin-Löf, 1966), the theory of games (Neumann and Morgenstern, 1953) and cellular automata (Neumann, 1966) emerged during the 1950s and 1960s.

These early approaches had already been combining computational and experimental information to better understand biological macromolecules, and insights were gained on the evolution of genes and proteins (Ingram, 1961; Margoliash, 1963; Zuckerkandl and Pauling, 1965b), the issues of molecular homology (Florkin, 1962; Zuckerkandl and Pauling, 1965a), the analysis of molecules to unveil evolutionary patterns (Zuckerkandl and Pauling, 1965b), the structural constraints of polypeptide chains (Ramachandran et al., 1963), the informational properties of DNA (Gatlin, 1966) and protein sequences (Nolan and Margoliash, 1968), the origins of the genetic code (Crick, 1968; Woese, 1970), its coding capacity (Alff-Steinberger, 1969) and the accuracy of the translation process (Crick, 1966), the construction of phylogenetic trees (Fitch and Margoliash, 1967), the use of molecular graphics (Katz and Levinthal, 1966), properties of protein sequence alignment (Cantor, 1968) and the processes of molecular evolution (Kimura, 1968; Nei, 1969).
 
This era can be considered as the birth of computational biology, with a number of key developments appearing: the first sequence alignment algorithms (Gibbs and McIntyre, 1970; Needleman and Wunsch, 1970), models for selection-free molecular evolution (King and Jukes, 1969), the preferential substitution of amino acid residues in protein sequences (Clarke, 1970; Epstein, 1967), formal studies of protein primary structure (Krzywicki and Slonimski, 1967), derivation of preferences for amino acid residues in secondary structures (Pain and Robson, 1970; Ptitsyn, 1969), the invention of the helical wheel representation for protein sequences (Dunnill, 1968; Schiffer and Edmundson, 1967), the widespread use of molecular data in evolutionary studies (Fitch and Margoliash, 1970; Jukes, 1969), the origins of life (West and Ponnamperuma, 1970) and the theory of evolution by gene duplication (Ohno, 1970). In 1970, the central dogma had also been conceived (Crick, 1970), after the seminal discoveries of the processes of RNA transcription and translation.
 
THE 70’S: THE THEORETICAL FOUNDATIONS

As a consequence of the above, an agenda for computational problems in molecular biology had already been formulated. Studies of substitution mutation rates (Koch, 1971), the calculation of solvent accessibility on protein structures (Lee and Richards, 1971), the parsimonious determination of tree topology (Fitch, 1971), RNA structure prediction (Tinoco et al., 1971) and more methods for sequence alignment (Beyer et al., 1974; Gibbs et al., 1971; Grantham, 1974; Sackin, 1971; Sellers, 1974a; Wagner and Fischer, 1974) had appeared. One of the most prominent theoretical advancements of this time was the merging of classical population genetics with molecular evolution (Kimura, 1969; Ohta and Kimura, 1971), to produce the theory of neutral evolution (Kimura, 1983) and the constancy of the evolutionary rate of proteins (Jukes and Holmquist, 1972), also known as the molecular clock hypothesis (Kimura and Ohta, 1974). Another area of intensifying research was the string comparison problem in computer science (Levin, 1973; Sankoff and Sellers, 1973; Wagner and Fischer, 1974) (or ‘sequence alignment’ in biology), developed hand-in-hand with applications to biological macromolecules (Beyer et al., 1974; Gordon, 1973; Kimura and Ohta, 1972; Sankoff, 1972; Sankoff and Cedergren, 1973; Sellers, 1974b). At the same time, the first phylogenetic analyses of macromolecular families (Wu et al., 1974), including immunoglobulins (Novotny, 1973) and transfer RNA (Holmquist et al., 1973), were emerging. Moreover, refined attempts to define sequence patterns that influence protein structure continued to propagate (Kabat and Wu, 1973; Liljas and Rossmann, 1974; Richards, 1974; Robson, 1974; Schulz et al., 1974; Wetlaufer, 1973).
 
By the mid-1970s, a fairly clear picture had emerged for the theory and practice of sequence alignment, the process of molecular evolution, the quantification of nucleotide and amino acid substitution rates, the construction of evolutionary trees, and secondary/tertiary protein structure analysis. In certain ways, many of the problems that would occupy the computational biologists of the future had been defined during those early years. What was missing were central reference data and software resources and the means to access them, a significant trend that would emerge very prominently during the next decade.

In the last years of that decade, a flurry of activity occurred in the development of string and sequence alignment theory (Aho et al., 1976; Chvátal and Sankoff, 1975; Delcoigne and Hansen, 1975; Hirschberg, 1975; Lowrance and Wagner, 1975; Okuda et al., 1976; Waterman et al., 1976) and evolutionary tree analysis and construction (Felsenstein, 1978; Klotz et al., 1979; Sattath and Tversky, 1977; Waterman and Smith, 1978a; Waterman et al., 1977), as well as the description, visualization, analysis and prediction of protein structure, in an attempt to crack the ‘second genetic code’, the protein folding problem (Chothia, 1975; Chothia et al., 1977; Chou and Fasman, 1978; Crippen, 1978; Garnier et al., 1978; Hagler and Honig, 1978; Jones, 1978; Kabsch, 1976; Karplus and Weaver, 1976; Kuntz, 1975; Levitt, 1976, 1978; Levitt and Chothia, 1976; Levitt and Warshel, 1975; Lifson and Sander, 1979; Matthews, 1975; Nagano and Hasegawa, 1975; Richards, 1977; Richardson, 1977; Rose, 1979; Rossmann and Argos, 1976; Schulz, 1977; Schulz and Schirmer, 1979; Sternberg and Thornton, 1978; Tanaka and Scheraga, 1975; Ycas et al., 1978), including the first algorithms for secondary structure prediction (Chou and Fasman, 1974; Lim, 1974), the invention of distance geometry for the calculation of structure from distance constraints (Crippen, 1977) and further use of specialized systems for molecular graphics and modelling (Feldmann, 1976). Interesting by-products in this area were the evolutionary ‘stories’ for specific protein families, such as the selection-dependent evolution of haemoglobins (Goodman et al., 1975), the dehydrogenases and kinases (Eventoff and Rossmann, 1975), cytochrome c (Fitch, 1976) and the first analyses of metabolism, such as the loss of metabolic capacities (Jukes and King, 1975), the evolution of catalytic efficiency (Albery and Knowles, 1976), the evolution of energy metabolism (Dickerson et al., 1976) and the simulation of metabolic regulation (Heinrich and Rapoport, 1977). Other emerging problems were the exon–intron question (Gilbert, 1978), the evolution of the bacterial genome (Riley and Anilionis, 1978), RNA structure prediction (Waterman and Smith, 1978b), deep phylogeny (Schwartz and Dayhoff, 1978) and the complex control of morphogenesis (Savageau, 1979a,b).
 
One key development towards the end of that decade regarding public resources was the compilation of computer archives for the storage, curation and distribution of protein sequence (Dayhoff, 1978) and structure (Bernstein et al., 1977) information, a trend that would be amplified enormously in the immediate future.
 
THE 80’S: MORE ALGORITHMS AND RESOURCES

The following decade was in effect the time when the field of computational biology took shape as an independent discipline, with its own problems and achievements. For the first time, efficient algorithms were developed to cope with an increasing volume of information, and their computer implementations were made available for the wider scientific community. Some commercial activity around software development had already been observed (Devereux et al., 1984). Due to the vast volume of literature, we will only cite a limited number of significant papers that represent key developments in computational biology. We will also break down the field into four subfields: (i) sequence analysis, (ii) molecular databases, (iii) protein structure prediction and (iv) molecular evolution.
 
By 1980, it had already become clear that computer analysis of nucleotide sequences was essential for the better understanding of biology (Gingeras and Roberts, 1980). Sequence comparison continued to benefit from parallel developments in computer science (Hall and Dowling, 1980). The dot-matrix model of sequence comparison was well developed at that time (Maizel and Lenk, 1981). The genome hypothesis for preferential codon usage was formulated on the basis of computer analysis (Grantham et al., 1980). Progress in DNA (Trifonov and Sussman, 1980) and RNA (Nussinov and Jacobson, 1980) structure analysis and prediction was also reported. Other theoretical work at the turn of that decade included key analyses of the evolution of prokaryotes with the identification of the Archaea as a separate domain of life (Fox et al., 1980), the notion of selfish DNA (Doolittle and Sapienza, 1980) and variable modes of molecular evolution (Dover and Doolittle, 1980). Other fields with influence on computational biology were neural networks (Hopfield, 1982), molecular computing (Conrad, 1985), nanotechnology (Drexler, 1981), complexity and cellular automata (Burks and Farmer, 1984; Reggia et al., 1993; Wolfram, 1984) and the theory of clustering (Shepard, 1980), all of which had a direct impact on protein structure prediction and design as well as sequence database searching and clustering.
 
(i) Theoretical developments in sequence analysis, for example the computation of evolutionary distances (Sellers, 1980) or approximate string matching (Ukkonen, 1985), were followed by the development of key algorithms, such as the Smith–Waterman dynamic programming sequence alignment algorithm (Smith and Waterman, 1981a,b) and the FASTA family of algorithms for database searching (Lipman and Pearson, 1985; Wilbur and Lipman, 1983). Similarly, analysis of repeats in theoretical computer science (Guibas and Odlyzko, 1980; Steele, 1982) was followed by parallel analyses for biological sequences (De Wachter, 1981; Martinez, 1983; Nussinov, 1983). Matrix-based models of sequence comparison continued to be developed (Fristensky, 1986; Novotny, 1982), as well as the first integrated sequence analysis systems (Brutlag et al., 1982; Lyall et al., 1984; Pustell and Kafatos, 1984; Staden, 1982). Two major developments were the automation and wide use of multiple sequence alignment (Carrillo and Lipman, 1988; Feng et al., 1985; Hogeweg and Hesper, 1984; Murata et al., 1985; Sankoff and Cedergren, 1983), especially the tree-based alignment method (Feng and Doolittle, 1987; Higgins and Sharp, 1988), and sequence profile analysis (Gribskov et al., 1987, 1988). Among the first applications of sequence analysis to the discovery of important protein motifs were the identification of the ATP-binding motif in various functionally unrelated proteins (Walker et al., 1982), the zinc-finger motif (Klug and Rhodes, 1987), the leucine-zipper motif (Landschulz et al., 1988), the homology of bacterial sigma factors (Gribskov and Burgess, 1986) and the nature of signal sequences (Heijne, 1981, 1985). Other studies included optimality in sequence alignment (Altschul and Erickson, 1986; Fickett, 1984; Fitch and Smith, 1983; Waterman, 1983), rigorous statistical approaches in sequence analysis (Arratia et al., 1986; Arratia and Waterman, 1985a,b; Karlin et al., 1983; Tavaré, 1986; Wilbur and Lipman, 1984), pattern recognition in several sequences and consensus generation (Abarbanel et al., 1984; Sellers, 1984; Waterman et al., 1984), random sequences (Fitch, 1983), sequence logos (Schneider et al., 1986), and syntactic analysis (Ebeling and Jiménez-Montaño, 1980; Jiménez-Montaño, 1984). One issue was the performance of these computation-intensive programs on small computer systems (Gotoh, 1987; Korn and Queen, 1984). Algorithms for the prediction of antigenic determinants (Hopp and Woods, 1981), the detection of open reading frames (Fickett, 1982; Shepherd, 1981; Staden and McLachlan, 1982) and translation initiation sites (Stormo et al., 1982), the computation of RNA folding (Dumas and Ninio, 1982; Turner et al., 1988) and the calculation of evolutionary trees (Felsenstein, 1982) were also invented. The first reviews (Goad, 1986; Hodgman, 1986; Jungck and Friedman, 1984; Kruskal, 1983; Kruskal and Sankoff, 1983) and books (Doolittle, 1986; Heijne, 1987; Rawlings, 1986) on sequence analysis and comparison also appeared at this time.
 
(ii) The initial phase of database development for data quality control and collection rapidly progressed (Kelly and Meyer, 1983; Orcutt et al., 1983), with the appearance of at least two major resources for nucleotide data submission (Philipson, 1988), GenBank (Bilofsky et al., 1986) and the EMBL Data Library (Hamm and Cameron, 1986). Proposals for computer networks that ensured availability and facilitated distribution (Lesk, 1985; Lewin, 1984) materialized, with initiatives such as EMBNET (Lesk, 1988) and BIONET (Kristofferson, 1987; Smith et al., 1986). Archives of molecular biology software also appeared, for example the LiMB software catalog (Burks et al., 1988; Lawton et al., 1989). Various reviews summarizing strategies for sequence database searching were published (Cannon, 1987; Davison, 1985; Henikoff and Wallace, 1988; Lawrence et al., 1986; Orcutt and Barker, 1984; Thornton and Gardner, 1989), indicating that distributed computing for the wider community was coming of age (Heijne, 1988). Entire programs in various institutes such as EMBL formed the very first departments exclusively devoted to computational biology (Lesk, 1987). Finally, experimentation with various dedicated hardware platforms for more efficient analysis of biological sequences emerged (Collins and Coulson, 1984; Core et al., 1989; Edmiston et al., 1988; Gotoh and Tagashira, 1986; Huang, 1989; Lopresti, 1987) along with relational database technology that facilitated querying (Islam and Sternberg, 1989; Rawlings, 1988), as databases continued to grow at an exponential rate (DeLisi, 1988).
 
(iii) The field of protein structure analysis and prediction experienced significant growth in that decade. Various approaches to protein structure representation and visualization were explored, including the derivation of coordinates from stereo diagrams (Rossmann and Argos, 1980), domain definitions (Rashin, 1981), hydrophobicity plots (Kyte and Doolittle, 1982; Sweet and Eisenberg, 1983) and moments (Eisenberg et al., 1984), automatic structure drawing (Lesk and Hardman, 1982), fractal surfaces (Brooks and Karplus, 1983), signed distance maps (Braun, 1983), solvent accessible surfaces (Connolly, 1983), vector representations of protein sequences (Swanson, 1984) and structures (Yamamoto and Yoshikura, 1986), substructure dictionaries (Jones and Thirup, 1986), amino acid conservation patterns (Taylor, 1986), differential geometry (Rackovsky and Goldstein, 1988), sequence motifs (Rooman and Wodak, 1988) and building blocks (Unger et al., 1989). Interactive computer graphics were introduced as well, with programs such as FRODO (Jones, 1985) and RIBBON (Priestle, 1988). Structure comparison was further developed, with new analyses and algorithms (Cohen and Sternberg, 1980a; McLachlan, 1982; Sippl, 1980; Taylor and Orengo, 1989). Class prediction as a filtering step in protein structure prediction was also invented at that time (Klein, 1986; Klein and DeLisi, 1986; Nishikawa et al., 1983a,b). Molecular modelling was developed (Greer, 1981), further validated with dictionaries of peptides (Kabsch and Sander, 1984) [and ultimately fully automated (Holm and Sander, 1992; Levitt, 1992) in the 1990s]. The problem of threading sequences to structures was also introduced (Ponder and Richards, 1987). Descriptive studies deriving architectural principles of protein structure (Chothia, 1984; Richardson, 1981b) from statistical analysis of specific families and folds continued to increase in quantity and sophistication (Brändén, 1980; Janin and Chothia, 1980; Lifson and Sander, 1980; Ptitsyn and Finkelstein, 1980; Weber and Salemme, 1980); examples include analyses of disulfide bridges (Thornton, 1981), beta-sheet sandwiches (Cohen et al., 1981), helix packing patterns (Chothia et al., 1981) and beta-sheets (Chothia and Janin, 1981), beta-hairpins (Sibanda and Thornton, 1985), beta-barrels (Lasters et al., 1988), loops (Leszczynski and Rose, 1986) and coiled-coils (Cohen and Parry, 1986). The then-recent discovery of exons led to their mapping on known protein structures (Craik et al., 1982, 1983; Gô, 1981, 1983, 1985). The development of NMR allowed the solution of protein structures (Wüthrich, 1989), and presented a new problem (Braun, 1987), the calculation of 3D coordinates from distance data; distance geometry (Gower, 1982, 1985) and molecular dynamics (Brünger et al., 1986) came to the rescue. These methods were previously used to approach the protein folding problem as prediction methods, with the use of distance constraints (Cariani and Goel, 1985; Cohen and Sternberg, 1980b; Galaktionov and Rodionov, 1981; Goel et al., 1982; Goel and Ycas, 1979; Kuntz et al., 1976; Wako and Scheraga, 1981, 1982) and the prediction of residue contacts (Miyazawa and Jernigan, 1985; Warme and Morgan, 1978) as well as restrained energy minimization and molecular dynamics (Levitt, 1983). Development of distance geometry continued (Braun, 1987; Braun and Gô, 1985; Crippen, 1987; Easthope and Havel, 1989; Hadwiger and Fox, 1989; Havel et al., 1983a,b; Havel and Wüthrich, 1984; Metzler et al., 1989; Sippl and Scheraga, 1985).
 
(iv) Protein evolution had also become a key area of research (Bajaj and Blundell, 1984; Dayhoff et al., 1983; Doolittle, 1981), with a number of interesting discoveries such as the coordinated changes of key residues (Altschuh et al., 1988), the relationship between the divergence of sequence and structure (Chothia and Lesk, 1986), the properties of similarity matrices (Wilbur, 1985), the influence of amino acid composition (Graur, 1985), the definition of homology (Reeck et al., 1987), the detection of protein fold determinants (Bashford et al., 1987) and the identification of sequence similarities due to convergence (Doolittle, 1988; Fitch, 1988). Key analyses of individual protein families with wider implications for protein sequence/structure relationships included the analysis of the globins (Lesk and Chothia, 1980), the blue-copper proteins (Chothia and Lesk, 1982), the immunoglobulins (Lesk and Chothia, 1982), the proteases (Neurath, 1984), the cytochromes (Mathews, 1985), the bacterial ferredoxins (George et al., 1985), the superoxide dismutases (Getzoff et al., 1989; Lee et al., 1985), the phosphorylases (Hwang and Fletterick, 1986), the ribonucleases (Beintema et al., 1988), the crystallins (Lubsen et al., 1988; Piatigorsky and Wistow, 1989) and various other case studies (Brenner, 1988; Doolittle, 1985; Goldfarb, 1988). Correspondingly, the analysis of phylogenetic markers such as rRNA (Rothschild et al., 1986; Sogin et al., 1986), exons and introns (Gilbert, 1985) and various genome segments (Brutlag, 1980) resulted in significant discoveries for genome evolution, such as the relationships of life forms (Cedergren et al., 1988; Iwabe et al., 1989; Pace et al., 1986; Woese, 1987), the dynamics of DNA (Breslauer et al., 1986) and genome structure (Blake and Earley, 1986; Loomis and Gilpin, 1986; Ohta, 1987; Reanney, 1986; Sankoff and Goldstein, 1989), the evolution of splicing (Sharp, 1985), exons (Bulmer, 1987; Naora and Deacon, 1982), introns (Gilbert et al., 1986; Senapathy, 1986), intron-encoded proteins (Perlman and Butow, 1989) and non-coding sequences (Naora et al., 1987), the origins of retroviruses (Doolittle et al., 1986), the salient features of substitution rates (Britten, 1986; Ochman and Wilson, 1987) and the effect of codon usage on gene expression (Grantham et al., 1981). Finally, the theory and practice of evolutionary tree computation came to maturity (Felsenstein, 1981, 1985, 1988b), culminating in the widely used program PHYLIP (Felsenstein, 1988a).
 
TEN YEARS AGO, WITH HINDSIGHT

Here is a fairly realistic picture of a computational biologist working back in 1992. In terms of generic computing tools, there was access to the Internet, mostly through services like (bitnet) e-mail, gopher/ftp and the first web browser, Mosaic (http protocol), allowing access to a little more than 100 or so(!) web sites. Computer systems were quite heterogeneous, including VAX/VMS machines and Unix workstations (and a dozen or so less widely known operating systems). In addition, in academic environments Apple Macintosh systems were abundant, thanks to their groundbreaking icon-based user interface and word-processing or desktop publishing capabilities. There were distributed databases, such as GenBank and MedLine, but their availability was limited, mostly through CD-ROMs. CD drives were just being made available and the first version of X-windows was launched (graphical user interfaces were still in their infancy). About that time the first interpreted languages appeared, inspired by the Unix utility awk and quickly followed by perl and python.
 
In terms of scientific toolkits, BLAST had just been made available (Altschul et al., 1990), including sequence masking procedures, such as XNU (Claverie and States, 1993). RasMol (Sayle and Milner-White, 1995) and Kinemage (Richardson and Richardson, 1992) were making headlines in terms of protein structure visualization. The Genetics Computer Group (GCG) software was available on VMS and in wide use, along with many other popular sequence analysis packages for the Macintosh. The first sophisticated gene prediction programs were also appearing (Brunak et al., 1990; Fickett and Tung, 1992; Guigo et al., 1992; Mural and Uberbacher, 1991; States and Botstein, 1991). In protein structure prediction, the second-generation secondary structure prediction algorithms based on multiple sequence alignment (Rost and Sander, 1993), by then also widely available, indicated significant progress in the field. Excitement was in the air (Thornton et al., 1992) because of the first successful results in protein docking (Walls and Sternberg, 1992) and protein sequence threading (Bowie et al., 1991; Jones et al., 1992; Ouzounis et al., 1993) (problems still remaining unsolved today). High-throughput sequence similarity runs were being explored, with the clustering of the full protein sequence database (Gonnet et al., 1992). This activity denoted the beginning of the genome informatics era, celebrated by the computational re-annotation of the first ever entire chromosome sequence, yeast chromosome III (Bork et al., 1992). The rest, as they say, is history.
 
TODAY AND THE FUTURE

Given this short and rather subjective account of the development of bioinformatics, it is fair to ask what the value of this kind of historical perspective is. Two good reasons come to mind: first, it is important to both appreciate and understand the first steps into the unknown taken by a number of pioneers to open up a field that would later become a discipline with far-reaching implications for biological sciences; second, through this discursive history, it is evident that this field has grown into an independent discipline that offers solutions to biological problems but also has its own problems, solutions and further directions. Bioinformatics has become an independent scientific discipline, as old as computer science itself. Despite common perceptions, it is not ‘just’ a technology platform for genomics and systems biology, although its impact on those disciplines should not be underestimated. These data-driven fields, however, provide novel types of data which result in new kinds of problems and expanded horizons both for genomics and bioinformatics, in a healthy and fascinating interplay. Despite the fact that the actual origin of the term ‘bioinformatics’ still eludes us, it is clear that this discipline will continue to evolve rapidly into the 21st century, perhaps to a point beyond recognition. Merging with nanotechnology, computing with biological matter is expected to transform our own lives, in particular, and life on earth, in general. One day we may look back and understand how computation and experimentation with biological systems blurred the divide and allowed the ‘great crossing’ between the inanimate and the animate worlds.
 
ACKNOWLEDGEMENTS

Sincere apologies for omitting many citations due to space limitations. Thanks to Antoine Danchin, Arthur Lesk, Chris Sander, Janet Thornton, Anna Tramontano and referees for comments.
 
REFERENCES (omitted)