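生物谷 report: Cellular signal transduction is not linear. In other words, a signal does not simply cascade down a single pathway, as the linear picture MAPKKK--MAPKK--MAPK suggests. In reality there is extensive crosstalk between signaling pathways, to the point that they form a network. Traditional methods for studying signaling pathways are of little help here, which may be why a single cellular activity so often has many competing explanations and mechanisms. Bioinformatics approaches instead reconstruct signaling from the network perspective using mathematical models, which simplifies the research, reveals the weight of each protein within the signaling network, and leads to a theory that is consistent with the underlying nature of signaling.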
Introduction
The sequencing of genomes provides a framework for the investigation of cells as complex systems. Whereas the 'parts list' of the cell provides the molecular basis for understanding the cellular function, the integration of the parts list with data from diverse scales of measurement and subsequent analysis will provide a systems perspective on cellular behavior. Many of the early analyses of the genome have been based on computational strategies that include gene finding based on sequence signals and comparative sequence analysis with previously annotated genes. These approaches are only partially useful for finding gene regulatory regions and fail to provide any insights into the regulation of transcription. At the next level of organization, the interactions between components in the cell form complex networks, and the determination of the network topology represents a formidable challenge. Intermediate measurements of various cellular components or their properties provide insights into possible cellular network architectures; however, general methods that can probe the spatiotemporal responses of cellular components to input are, as yet, unavailable. Although the topology of many cellular subnetworks has been determined through painstaking 'trial and error' experimentation, there is, as yet, no systematic methodology for the incorporation of this information into computational models with predictive capabilities at the cellular level. The third level of organization involves the determination of the dynamical processes that occur within a given cellular network. At this level, the implications of the underlying logic of cellular and genetic networks are difficult to deduce through experimental techniques alone, and successful approaches will involve the union of novel experiments with computational tools.
Finally, although the vocabulary of genomes, genes and proteins is arguably well-defined, a structured description of cellular processes and networks does not yet exist. The difficulty in creating such an infrastructure stems from the intrinsic complexity, the strong domain and context dependence of components of the network and the networks themselves, and from the linguistically rich descriptions of cellular pathways that defy structuring. Recent advances in describing the components of cellular signaling systems and their interrelationships are discussed. The development of well-curated databases of this data, which enable the reconstruction of biochemical pathways and their subsequent analysis, is presented.
The parts list problem
Cells respond to input by invoking a large number of molecular players. The spatiotemporal measurements of the components of the cell involved in the response can provide insights into the cellular signaling networks. The first step in building a parts list is the cataloging of all the genes and proteins that are expressed in a mammalian cell in a given tissue under a given condition. The parts list provides a broad road map from which one can construct in a context-dependent manner the involvement of genes and proteins in specific intracellular networks. Two major techniques have been employed to decipher the gene parts list. DNA microarrays have the capability to provide a parts list of genes that are expressed above a given threshold in cells [1.]. For instance, this method has been used to obtain a gene parts list for diauxic shift conditions in yeast [2.], and to identify genes implicated in yeast pheromone action as well as cellular response to specific molecular inputs in mammalian B cells (http://www.signaling-gateway.org/data/). Gene expression changes also provide molecular phenotypes to discriminate between normal and pathological states of a tissue (e.g. [3.]). Gene transcript changes are triggered by differential regulation of expression. The expression profile data provides an additional and important dimension. It provides insight into how information is encoded by the genome to direct tissue-specific and temporally specific gene expression during animal development and in response to environmental stimuli. Bioinformatics algorithms have only been marginally successful in predicting regulatory regions and binding sites for regulatory proteins. For most transcription factors many potential binding sites can be found within the genome, but only a small subset of these potential sites is actually occupied in vivo. 
The specificity of transcription factor binding in vivo not only depends on its biochemical properties, but can also be affected by the combinatorial and synergistic action of adjacent DNA-binding proteins and by local chromatin structures and modifications that control access to DNA. Furthermore, specificity in gene regulation is also generated by protein–protein interactions (catalyzed by clustered regulatory elements in the DNA) that determine whether transcription factor binding to the promoter results in gene activation. Young and coworkers [4.] have invented an exciting technology involving chromatin immunoprecipitation (ChIP on a chip) to obtain a list of transcription factors and their regulators on a genome-wide scale. Together, these methods can provide the biochemical players involved in gene transcription regulatory networks and point to upstream pathways that induce specific transcription factors.
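The observation that a transcription factor has many potential binding sites, of which only a few are occupied in vivo, can be illustrated with a naive consensus scan. The following is a minimal sketch rather than any published algorithm: the function name, the toy sequence and the choice of consensus string are assumptions made for illustration, and real predictors use position weight matrices and chromatin data instead of simple mismatch counts.

```python
def find_candidate_sites(sequence, consensus, max_mismatches=1):
    """Return (start, matched_text, mismatches) for every window of the
    sequence that differs from the consensus in at most max_mismatches
    positions."""
    hits = []
    width = len(consensus)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical toy sequence and consensus, chosen only for illustration.
toy_sequence = "ACGTGACTCATTTGAGTCAGGCTGACGCAAT"
toy_consensus = "TGACTCA"

# Relaxed scan: sites that merely resemble the consensus.
candidate_sites = find_candidate_sites(toy_sequence, toy_consensus)

# Strict scan: exact matches only.
exact_sites = find_candidate_sites(toy_sequence, toy_consensus, max_mismatches=0)

for start, text, mismatches in candidate_sites:
    print(start, text, mismatches)
```

Even in this short toy sequence the relaxed scan reports more candidates than the exact scan, which is the sequence-level reason why experimental occupancy data such as ChIP on a chip are needed to decide which sites matter.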
The challenge of organizing gene expression data along with the experimental conditions was tackled by the research community and this has resulted in the MIAME (minimum information about a microarray experiment) standards [5.]. Abstractly, gene expression data can be viewed as a matrix of rows and columns, rows representing the genes and columns representing the specific conditions of the array measurement. For gene expression data to be useful in a biological context, the large matrix needs to be analyzed and visualized. Several strategies have been developed for the analysis of gene expression data [6.], although biological model-driven methods have provided the most interesting results.
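The matrix view of expression data described above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the gene names, the condition labels, the numerical values and the fold-change cutoff are invented, and the filter shown is only one of the simplest possible analyses.

```python
import math

# Hypothetical toy data: rows are genes, columns are conditions.
genes = ["geneA", "geneB", "geneC"]
conditions = ["t0", "t1", "t2"]
expression = [
    [1.0, 2.0, 8.0],   # geneA: induced over time
    [5.0, 5.0, 5.0],   # geneB: unchanged
    [4.0, 2.0, 0.5],   # geneC: repressed over time
]


def log2_fold_changes(row, reference_index=0):
    """Express each measurement in a row relative to a reference column."""
    reference = row[reference_index]
    return [math.log2(value / reference) for value in row]


def changed_genes(gene_names, matrix, min_abs_log2=1.0):
    """Keep genes whose log2 fold change reaches the cutoff in at least
    one condition."""
    selected = {}
    for gene, row in zip(gene_names, matrix):
        fold_changes = log2_fold_changes(row)
        if max(abs(x) for x in fold_changes) >= min_abs_log2:
            selected[gene] = fold_changes
    return selected


responders = changed_genes(genes, expression)
for gene, fold_changes in responders.items():
    print(gene, dict(zip(conditions, fold_changes)))
```

Keeping the condition labels alongside the matrix, as MIAME requires of the experimental annotation, is what lets a selected row be interpreted in its biological context.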
Obtaining the proteome parts list of a cell is a considerably more daunting task. Besides being present in low concentrations, proteins are constantly undergoing changes in their state by forming complexes with other proteins, undergoing covalent modifications and binding to myriad substrates in a cell. Deciphering spatiotemporal variations of the proteins in a dynamic cell is beyond the scope of current technology and most current efforts focus on analyzing a cellular milieu for specific proteins using immunoprecipitation and tagging methods [7. and 8.]. Cellular networks involve interactions of myriad proteins in a given context and fluorescence and microscopy are the best methods currently available to study these interactions [9.]. The yeast two-hybrid method is considered the best available strategy for mapping protein–protein interactions on a large scale [10.], although there is a potential for false positive identifications. Further, this method can only provide the potential list of interactions and not the specific interactions involved in a given pathway. A large body of knowledge on protein interactions in networks, therefore, comes from detailed biochemical analysis.
The SWISSPROT database [11.] and GenPept (from GenBank) provide annotations of proteins vis-à-vis their function; however, these data repositories provide no information on the states and functions of proteins in a context-specific manner. The Molecule Pages Database from the Alliance for Cellular Signaling provides the first comprehensive expert-curated annotation database for signaling proteins and contains exhaustive information for mapping intracellular networks [12.]. This database contains, in addition to standard annotation, the list of all known functional states of the protein, transitions between protein states and any functional and quantitative data pertaining to the molecule in the given state.
Reconstruction of biochemical pathways
Reconstruction of biochemical pathways is a complex task. In metabolism, databases like KEGG [13.] and EcoCyc [14.] serve as valuable resources for metabolic networks. Such extensive and well-curated databases are not yet available for cellular signaling. The role of each protein in a signaling network is to communicate the signal from one node to the next, and to accomplish this the protein has to be in a defined signaling 'state'. The state of a signaling molecule is characterized by covalent modifications of the native polypeptide, the substrates and/or ligands bound to the protein, its state of association with other protein partners, and its location in the cell. A signaling molecule may be a receptor, a channel, an enzyme or several other functionally defined species, depending on its state. In the process of passing a signal, a molecule may undergo a transition from one functional state to another. Interactions within and between functional states of molecules, as well as transitions between functional states, provide the building blocks for the reconstruction of a signaling network.
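The notion of a signaling state and of transitions between states maps naturally onto a small data structure. The sketch below is one possible encoding, not the schema of any existing database: the class names, the field names and the molecule "KinaseX" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalingState:
    """One functional state of a molecule, described by the four
    attributes named in the text."""
    molecule: str
    modifications: frozenset = frozenset()   # covalent modifications
    bound_ligands: frozenset = frozenset()   # substrates and/or ligands
    partners: frozenset = frozenset()        # associated proteins
    location: str = "cytosol"                # cellular location


@dataclass(frozen=True)
class Transition:
    """A directed change from one functional state to another."""
    source: SignalingState
    target: SignalingState
    trigger: str


def reachable_states(start, transitions):
    """Collect every state that can be reached from start by following
    transitions, including start itself."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for transition in transitions:
            if transition.source == current and transition.target not in seen:
                seen.add(transition.target)
                frontier.append(transition.target)
    return seen


# Hypothetical molecule with three functional states.
inactive = SignalingState("KinaseX")
phosphorylated = SignalingState("KinaseX", modifications=frozenset({"pT202"}))
nuclear = SignalingState("KinaseX", modifications=frozenset({"pT202"}),
                         location="nucleus")

transitions = [
    Transition(inactive, phosphorylated, trigger="upstream kinase"),
    Transition(phosphorylated, nuclear, trigger="nuclear import"),
]

states_from_inactive = reachable_states(inactive, transitions)
print(len(states_from_inactive))
```

Making the states immutable and hashable lets them serve directly as nodes of a network graph, with the transitions as its edges.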
The process of construction of signaling pathway models requires the assembly of a network of interacting proteins in a given context of the cell. Much of our knowledge of the pathways comes from interrogation of cells by specific perturbations followed by assays and systematic biochemical analysis of protein complexes. Reconciliation of a large body of cellular data provides validation strategies for reconstructed networks [15.]. Even though the standard representations of biochemical pathways have been incomplete, they serve as useful models for constructing and testing specific biological hypotheses. Several efforts are underway currently to build databases of biochemical signaling pathways and networks of pathways [16.]. These databases are also combined into larger infrastructures containing graphical user interfaces and some rudimentary analysis tools. An example of an infrastructure model for a signaling database and analysis system is presented in Figure 1.
Figure 1. An example of an infrastructure model for a signaling database and analysis system.
Modeling biochemical networks
Quantitative mapping of input-response behavior in mammalian cells warrants the development of entirely novel computational methods and strategies. The biochemical model that is in the form of a network graph has to be mathematically modeled in terms of biochemical reactions, which in turn will be numerically solved using computational methods. Ultimately, the goal is to provide a quantitative measure of how cells map input to response. Unlike equation-driven biophysical approaches, the novel strategies for modeling will have to combine a large body of data, biological constraints in the form of rules, and numerical computations.
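As a minimal illustration of turning a network into biochemical reactions that are then solved numerically, consider a single phosphorylation cycle in which an input signal drives S to its phosphorylated form Sp and a constant process reverses it, so that dSp/dt = k_act * signal * S - k_deact * Sp. The reaction scheme, the rate constants and the use of forward Euler integration are all assumptions chosen for brevity; a realistic model would use measured parameters and a proper ODE solver.

```python
def simulate_phosphorylation_cycle(signal, k_act=2.0, k_deact=1.0,
                                   total=1.0, dt=0.01, steps=2000):
    """Integrate a two-state mass-action cycle with forward Euler.

    S  -> Sp at rate k_act * signal * [S]
    Sp -> S  at rate k_deact * [Sp]
    Returns the final ([S], [Sp]) after steps * dt time units.
    """
    s, sp = total, 0.0
    for _ in range(steps):
        forward = k_act * signal * s
        reverse = k_deact * sp
        net = (forward - reverse) * dt
        s -= net    # what leaves S ...
        sp += net   # ... arrives in Sp, so the total is conserved
    return s, sp


unphosphorylated, phosphorylated = simulate_phosphorylation_cycle(signal=1.0)

# Analytical steady state for comparison:
# [Sp] / total = k_act * signal / (k_act * signal + k_deact)
expected_fraction = 2.0 * 1.0 / (2.0 * 1.0 + 1.0)
print(phosphorylated, expected_fraction)
```

Running the function over a range of signal values gives the input-response curve of this one module, which is the quantity the text describes as the goal for whole networks.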
Arguably the earliest model of cellular behavior was the modeling of the membrane current-voltage relationships in a nerve using Kirchhoff's equations by Hodgkin and Huxley [17.]. Their model was remarkable given the complete ignorance of cellular components and intracellular networks at the time. Several biochemical reaction network-based approaches were developed following chemical reaction kinetics modeling. These led to explanations of focused biochemical modules, such as the modeling of the lac operon [18.] and oxygen binding to hemoglobin [19.].
Several whole-cell modeling methods have been developed in the past decade, including constraint-based modeling approaches [20.], kinetic and stochastic equation model analyses of biochemical circuits [21. and 22.], topological model analyses of networks [23.], M-Cell (http://www.mcell.cnl.salk.edu), and Virtual Cell [24.]. Although very innovative, these approaches have suffered from a paucity of consistently measured high-quality data that can be used to generate the parameters for mathematical modeling. However, some initial results have already demonstrated the utility of such modeling methods. For example, Palsson and coworkers have used a constraint-based model to predict endpoints of adaptive evolution in a strain of Escherichia coli [25.] and have predicted the effects of gene deletions and changes in metabolite inputs [26.]. In addition, a detailed model of the bacteriophage λ lysis–lysogeny decision circuit developed by McAdams and Shapiro [27.] demonstrated the mathematical logic underlying biochemical processes.
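The constraint-based idea can be illustrated on a toy network. The sketch below only tests whether a proposed flux distribution satisfies the steady-state mass balance (stoichiometric matrix times flux vector equals zero) and the flux bounds; it does not perform the optimization step that full constraint-based analyses add on top, usually with a linear programming solver. The four-reaction network, the bounds and the function name are assumptions made for illustration and are not taken from the cited studies.

```python
def is_feasible(stoichiometry, fluxes, bounds, tolerance=1e-9):
    """Check flux bounds and steady-state mass balance for each metabolite."""
    for (lower, upper), flux in zip(bounds, fluxes):
        if flux < lower - tolerance or flux > upper + tolerance:
            return False
    for row in stoichiometry:
        net_production = sum(c * v for c, v in zip(row, fluxes))
        if abs(net_production) > tolerance:
            return False
    return True


# Hypothetical toy network (columns are reactions):
#   v1: uptake -> A      v2: A -> B
#   v3: B -> biomass     v4: A -> byproduct
stoichiometry = [
    [1, -1, 0, -1],   # metabolite A
    [0, 1, -1, 0],    # metabolite B
]
wild_type_bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

# A gene deletion is modeled by forcing the flux of its reaction to zero.
knockout_bounds = list(wild_type_bounds)
knockout_bounds[1] = (0, 0)   # delete the gene for v2

growth_fluxes = [10, 7, 7, 3]      # routes most of A to biomass
overflow_fluxes = [10, 0, 0, 10]   # routes all of A to byproduct

print(is_feasible(stoichiometry, growth_fluxes, wild_type_bounds))
print(is_feasible(stoichiometry, growth_fluxes, knockout_bounds))
print(is_feasible(stoichiometry, overflow_fluxes, knockout_bounds))
```

In this toy case the deletion removes every flux distribution that produces biomass, which mirrors in miniature how constraints alone can predict the consequence of a gene deletion without kinetic parameters.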
Conclusions
The transition from genomic and proteomic 'parts lists' to fully reconstructed biochemical network models is of critical importance in understanding how cells respond to the milieu of environmental stimuli and developmental cues. Concerted research efforts including large-scale expression profiling of cells under varied conditions and the subsequent development of microarray data standards have generated a wealth of data crucial to the fulfillment of the goal to understand cellular network behavior. Bioinformatic databases have begun to serve as invaluable repositories for this data, attempting to maintain cellular context for the information. Although still in their infancy, biochemical network models have proven to be useful for integrating this information to generate holistic understanding of cellular behavior. Bioinformatics research will continue to bridge the gap between molecular biology and network understanding, facilitating the reconstruction of biochemical pathways and leading to the analysis of function of cellular signaling.
Original article:
Bioinformatics and cellular signaling
Jason Papin and Shankar Subramaniam
The understanding of cellular function requires an integrated analysis of context-specific, spatiotemporal data from diverse sources. Recent advances in describing the genomic and proteomic 'parts list' of the cell and...
Current Opinion in Biotechnology, 2004, 15:1:78-81
Related article:
N.D. Price, J.A. Papin, C.H. Schilling and B.O. Palsson, Genome-scale microbial in silico models: the constraints-based approach. Trends Biotechnol 21 (2003), pp. 162–169.
M.W. Covert and B.O. Palsson, Transcriptional regulation in constraints-based metabolic models of Escherichia coli. J Biol Chem 277 (2002), pp. 28058–28064.
M.J. Herrgard, M.W. Covert and B.O. Palsson, Reconciling gene expression data with known genome-scale regulatory network structures. Genome Res 13 (2003), pp. 2423–2434.
K. Truong and M. Ikura, The use of FRET imaging microscopy to detect protein-protein interactions and protein conformational changes in vivo. Curr Opin Struct Biol 11 (2001), pp. 573–578.


