Source
2003-10-16 0:19:00

Analyzing Signal Transduction with Bioinformatics

lected transcription factors control the amino acid transporters and how the cross-talks between the amino acid transport pathway and the other related pathways are achieved. Although the pathway model contains some local information known before, it is the first global pathway hypothesis for the amino acid transport regulation process. The constructed pathway model provides a good starting point for biologists to design new experiments. This work demonstrates the power of integrating analyses of high-throughput biological data with traditional biology in biological pathway inference and modeling in a systematic manner.

The algorithmic tools offered by the DSL include a variety of statistical, machine learning and data mining tools for: (a) classification, (b) variable selection and dimensionality reduction, (c) learning causal probabilistic network representations of gene and protein interactions, (d) learning local causal models (direct causes and effects) or predictive neighbourhoods (‘Markov Blankets’) around a ‘target’ gene or protein. These algorithms/tools are either standard off-the-shelf ones, or have been implemented in our lab from published reports from other groups, or are unique to DSL.
The novel DSL algorithms output either the Markov Blanket of some target variable T (i.e., a minimal set of optimal predictors), denoted by ‘MB(T)’, or the direct causes and effects of T, denoted by ‘DCE(T)’. We outline here the main properties of these algorithms, some theoretically proven and others suggested by experiments:
Property 1: The algorithms are correct given well-defined and practical assumptions. These assumptions are: that the data come from causal-network-faithful processes; that the data are independently sampled and identically distributed (i.e., the generating process is time invariant and sampling is unbiased); and that reliable statistical tests exist for testing independence and association. Some algorithms also require causal sufficiency, while others do not (by virtue of post-processing with the FCI algorithm).
Property 2: When the assumptions are violated, the methods “degrade gracefully”, still yielding useful results.
Property 3: The algorithms scale up to more than a hundred thousand variables with minimal computing equipment and even further with high-performance computing.
Property 4: The quality of the solution depends only on local properties; in contrast, global algorithms propagate errors to unrelated regions.
Property 5: The sample required is only a function of the local neighborhood. In global algorithms, a sample-intensive (e.g., densely connected) region adversely affects all regions, even those that can be learned locally with little sample.
Property 6: Similarly, the computational efficiency is insensitive to the structure beyond the target T, while in global algorithms the difficult regions affect the computational complexity of all regions (since output is produced for all regions at once).
Property 7: The algorithms can be combined with existing global discovery algorithms (either in an interleaved sense or as a pre-processing step) to improve the quality of discovery.
Property 8: The algorithms are directly parallelizable.
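
To make the two outputs concrete, here is a minimal sketch of what MB(T) and DCE(T) contain when the causal graph is already known. It is not one of the DSL algorithms, which have to learn these sets from data. It assumes a faithful network, in which the Markov Blanket of T is its parents, its children, and the other parents of its children. The toy network and the function names are hypothetical.

```python
from typing import Dict, Set


def direct_causes_and_effects(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """DCE(T): the direct causes (parents) and direct effects (children) of target."""
    children = {node for node, pa in parents.items() if target in pa}
    return set(parents.get(target, set())) | children


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """MB(T) in a known DAG: parents, children, and the children's other parents."""
    children = {node for node, pa in parents.items() if target in pa}
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    return (set(parents.get(target, set())) | children | spouses) - {target}


# Hypothetical toy network: each key maps to the set of its direct causes.
toy_network = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

dce = direct_causes_and_effects(toy_network, "T")
mb = markov_blanket(toy_network, "T")
print(sorted(dce), sorted(mb))
```

In this toy network B belongs to MB(T) but not to DCE(T): it is neither a cause nor an effect of T, yet it helps predict T once their common effect C is known. D lies outside both sets, which illustrates the locality claimed in Properties 4 to 6.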

Specific Aims:
(1) Use multiple sources of information for constructing signal transduction pathways. The sources include microarray gene expression data, genomic sequence data, protein-protein interaction data, lipidomics data, etc. We will use them to derive (a) which proteins are in a particular target pathway and (b) how these proteins interact in the pathway; supply the derived model for mathematical modeling; and suggest or design new experiments to validate proposed pathway models. This aim consists of the following tasks:

Inferring a pathway in the studied genome from a known pathway in a different genome. Given any known gene in a known pathway from the KEGG database or the other pathway databases described above, we can search for homologs in the studied genome using sequence-comparison tools such as BLAST/PSI-BLAST (Altschul et al., 1997). The homologous genes in the studied genome can then be mapped onto the known pathway of the other genome.

Using novel DSL algorithms to derive an initial pathway model for pathways that cannot be derived in Task 1. We will apply the algorithms to infer the causal relationships between genes in a signal transduction pathway using gene expression data, either retrieved from public databases or generated by the Microarray Core of the project. The initial pathway model will give us a list of candidate genes that are most likely related to a biological process and a directed graph (a Bayesian network) over these candidate genes, which describes the impact of one gene on another with a confidence level.
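
The homolog-mapping step of the first task can be sketched as follows. This is an illustration under stated assumptions, not the project's pipeline: it assumes the BLAST search has already been run with the standard 12-column tabular output (query ID, subject ID, ..., e-value, bit score), keeps the best-scoring hit per query under a hypothetical e-value cutoff, and transfers a known interaction only when both partners have a homolog. The gene IDs, hit values, and function names are made up for the example.

```python
import csv
import io


def best_homologs(blast_tabular: str, max_evalue: float = 1e-5) -> dict:
    """Return {query gene: best-scoring subject gene} from BLAST tabular text."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit too weak to call a homolog
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def map_pathway(edges, homologs):
    """Transfer each known edge whose two genes both have a homolog."""
    return [(homologs[a], homologs[b]) for a, b in edges
            if a in homologs and b in homologs]


# Hypothetical hits: reference-pathway genes (ref*) against the studied genome (study*).
hits = "\n".join([
    "refA\tstudy1\t62.0\t300\t100\t2\t1\t300\t5\t304\t1e-80\t250",
    "refA\tstudy9\t35.0\t120\t70\t4\t10\t130\t1\t118\t1e-8\t55",
    "refB\tstudy2\t48.5\t210\t95\t3\t1\t210\t1\t208\t2e-40\t150",
    "refC\tstudy3\t28.0\t80\t55\t2\t5\t85\t7\t84\t0.5\t30",
])
known_edges = [("refA", "refB"), ("refB", "refC")]

homologs = best_homologs(hits)
inferred_edges = map_pathway(known_edges, homologs)
print(homologs, inferred_edges)
```

Here refC has no acceptable hit, so the refB-refC interaction is not transferred. Gaps of this kind are what the second task is meant to fill.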

Applying high-throughput data and bioinformatics predictions to refine the pathway model. The pathway models derived from either Task 1 or Task 2 may have missing components, unrelated genes, and incorrect connections between genes. We will refine the initial pathway model to be consistent with gene expression data and, if available, protein interaction data. Other predicted information, including protein structure/function and subcellular location, will also be used to validate the pathway model.

Suggesting experiments to verify constructed pathways. This includes genetic and biochemical studies, such as mutating key genes in the constructed pathway to observe the phenotypes, and specific yeast two-hybrid experiments for particular pairs of interacting proteins.
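
One simple way to check a pathway model against expression data, sketched below, is to flag connections whose two genes show little co-expression. This stands in for the refinement task above and is not the method the proposal commits to: Pearson correlation is only one possible consistency measure, and the cutoff, gene names, and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def refine_edges(edges, expression, min_abs_r=0.8):
    """Split edges into those supported by co-expression and those flagged for review."""
    kept, flagged = [], []
    for a, b in edges:
        if a not in expression or b not in expression:
            flagged.append((a, b))  # no expression data for one of the genes
            continue
        r = pearson(expression[a], expression[b])
        (kept if abs(r) >= min_abs_r else flagged).append((a, b))
    return kept, flagged


# Hypothetical expression profiles over five conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
model_edges = [("g1", "g2"), ("g2", "g3")]

kept, flagged = refine_edges(model_edges, expression)
print(kept, flagged)
```

Flagged edges are candidates for review rather than automatic removal, since protein interaction data or the predicted structure, function, and localization may still support them.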

(2) Study evolutionary relationships between signal transduction pathways. From an evolutionary point of view, all signal transduction pathways are related, from microbes to humans. Understanding the evolutionary relationships between different pathways can shed light on the mechanisms of these pathways. For this purpose, two tasks will be performed:

Carrying out computational comparative genomics studies for signal transduction pathways. Sequence comparison between the proteins in the signal transduction pathways of a genome and the genomic sequences of microbial genomes will be performed. The complete genomes of more than 50 microbial species have been sequenced. Due to the simplicity of microbial genomes, additional information can be found for the signal transduction pathways, e.g., the operon structure (Stephanopoulos et al., 1998), where several genes related to the same biological pathway are arranged together in the genomic sequence. In addition, comparing regulatory regions between orthologs in different genomes can help us define the regulatory binding motifs.
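
As an illustration of how operon structure can be read from a microbial genome, the sketch below groups neighbouring genes that lie on the same strand and are separated by a short intergenic gap. This is a common heuristic, not a method stated in the proposal. The 50 bp gap threshold, the `Gene` record, and the coordinates are assumptions made for the example.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first base of the gene
    end: int     # last base of the gene
    strand: str  # "+" or "-"


def candidate_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes whose intergenic gap is at most max_gap."""
    groups: List[List[str]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        same_run = (
            current
            and gene.strand == current[-1].strand
            and gene.start - current[-1].end <= max_gap
        )
        if same_run:
            current.append(gene)
        else:
            if len(current) > 1:  # a single gene is not reported as an operon
                groups.append([g.name for g in current])
            current = [gene]
    if len(current) > 1:
        groups.append([g.name for g in current])
    return groups


# Hypothetical gene coordinates on one chromosome.
genome = [
    Gene("gA", 100, 400, "+"),
    Gene("gB", 420, 900, "+"),
    Gene("gC", 930, 1500, "+"),
    Gene("gD", 2000, 2600, "+"),
    Gene("gE", 2610, 3000, "-"),
]

operons = candidate_operons(genome)
print(operons)
```

Genes grouped this way are only candidates for sharing a pathway. The grouping complements the sequence comparison described above and does not replace it.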

Comparing the roles of proteins with similar structural folds in different signal transduction pathways. It is interesting to see that some structural folds occur frequently in diverse signal transduction pathways, for example the β-propeller fold. These proteins, whose sequences may not be similar, could be evolutionarily related. They provide some clues about the evolutionary relationships between seemingly unrelated signal transduction pathways.

(3) Provide general bioinformatics support for the proposed Center. Though many bioinformatics tools and databases are available, experimental biologists often do not take full advantage of these resources because they lack computational expertise. We will help them use the bioinformatics resources in the proposed research projects:

Providing training or performing computation for experimental biologists. This includes general bioinformatics tools for sequence comparison, gene expression clustering, regulatory region analysis, protein structure prediction, protein subcellular localization prediction, molecular dynamics simulations, etc.

Helping researchers use the computational resources at the DSL. This includes general mathematical tools, such as SPSS and Matlab, and our novel supervised and unsupervised algorithms for biological inference.

Model-Integrated Computing Contribution
We propose to integrate different computational components into a seamless computational environment and provide a user-friendly interface to experimental biologists. Modeling and simulation of signal transduction involve different computational methods and interactions between experimental biologists and theoreticians. Biologists draw diagrams of molecular interactions. Mathematicians speak in terms of Bayesian networks, PDEs, implicit/explicit solvers, etc. A significant gap prevents the fluid flow of information and complicates each group's use of the other's tools. To close this gap, we will use Model-Integrated Computing (MIC) to develop domain-specific languages that let biologists draw diagrams of complex intracellular systems in their own terms. These languages will evolve with feedback from the users. Given a model of the biologic
