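Bioon (生物谷) report: Even the simplest organisms, such as bacteria, have hundreds or even thousands of genes. How do these genes work and act, and how do they make up the bacterium's living network? These questions have long been unanswered, and effective research tools have been lacking. A new analysis strategy offers a useful way to uncover the mechanism. The latest issue of Nature reports this result, which will also provide a basis for understanding the nature of life processes in higher organisms.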
Integrating high-throughput and computational data elucidates bacterial networks
We first validated the model, or 'in silico strain' of E. coli (iMC1010v1; see ref. 4 for conventions for naming in silico strains), against a data set of 13,750 growth phenotypes5 obtained from the ASAP database6, and then used this genome-scale model to select transcription factors for prospective gene knockout studies. Comparison with the growth phenotypes showed that experimental and computational outcomes agreed in 10,828 (78.7%) of the cases examined, which is roughly the same success rate achieved in previous studies in E. coli and yeast that considered only a few hundred phenotypes7-9. In addition, 2,512 (18.3%) of the cases were predicted correctly only when regulatory effects were incorporated into the metabolic model (see Supplementary Information for details).
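The percentages quoted above follow directly from the reported counts. The short sketch below simply recomputes them; the variable and function names are illustrative and do not come from the paper.

```python
# Counts quoted in the paragraph above.
TOTAL_CASES = 13_750        # growth phenotypes compared
AGREED = 10_828             # model and experiment agree
NEEDED_REGULATION = 2_512   # correct only when regulatory effects are included


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


agreement_pct = percent(AGREED, TOTAL_CASES)
regulation_pct = percent(NEEDED_REGULATION, TOTAL_CASES)
print(agreement_pct, regulation_pct)
```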
The comparisons in this study identified several substrates and knockout strains whose growth behaviour did not match predictions (Fig. 1). Further investigation of these conditions and strains led to the identification of five environmental conditions in which dominant, as yet uncharacterized, regulatory interactions actively contribute to the observed growth phenotype, and five environmental conditions and eight knockout strains that highlight uncharacterized enzymes or non-canonical pathways that are predicted to be used by the organism (Fig. 1; a detailed analysis of the discrepancies is provided in the Supplementary Information).
Figure 1. Growth phenotype study.
We wanted to determine the utility of this model-driven approach in elucidating transcriptional regulatory networks. A previous study, which evaluated the consistency between existing gene expression data sets and the known transcriptional regulatory network of E. coli, identified the response to oxygen deprivation as a partially consistent module10, 11. We therefore targeted this part of the transcriptional regulatory network for further network characterization. Six strains with knockouts of key transcriptional regulators in the oxygen response (ΔarcA, ΔappY, Δfnr, ΔoxyR, ΔsoxS and the double knockout ΔarcA Δfnr) were constructed. The messenger RNA expression profiles of these strains, as well as the wild-type strain, were measured in aerobic and anaerobic glucose minimal medium conditions. The data were analysed12 in the context of iMC1010v1 predictions to identify new interactions in the regulatory network (Fig. 2).
Figure 2. Characterization of the regulatory network related to the aerobic–anaerobic shift.
Expression profiling of the wild-type strain identified 437 genes that experienced a significant change in transcription in response to oxygen deprivation (t-test, multiple testing corrected to give a false discovery rate (FDR) of less than 5%); of these, 151 genes were included in iMC1010v1. Computationally, 75 genes were predicted by iMC1010v1 to show differential expression in response to oxygen deprivation. These 75 genes could be divided into three categories: 23 agreed with measured expression changes; 24 had a predicted expression change that was either not found to be statistically significant in the experimental data (23/24) or in a direction opposite to that of the experimental data (1/24); and for 28 genes there were no expression data available (transcript abundance was determined to be 'absent' for two or more of the replicates). Thus, of the 47 (= 23 + 24) differentially expressed genes that could be compared between the model computation and experiment, 23 (or 49% accuracy) agreed. Considering the overall number of genes in the model for which there were experimental data, the overlap (23) between the sets of predicted (47) and experimentally detected (151) differentially expressed genes is significant in comparison to a model that would randomly predict expression changes (P < 0.005 on the basis of a cumulative binomial distribution). There were 151 genes that were differentially expressed and included in the model; however, with only 23 (or 15% coverage) correctly computed, there is much room for expanding the transcriptional regulatory network in iMC1010v1 on the basis of the experimental data (Fig. 3).
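The accuracy and coverage figures above, and the kind of cumulative binomial tail probability the text refers to, can be sketched as follows. The counts are taken from the paragraph. The per-gene success probability of the random model is not stated in this excerpt, so `hypothetical_p` is a placeholder chosen only to make the function call concrete, and the resulting tail value should not be read as the paper's P-value.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts quoted in the paragraph above.
PREDICTED_COMPARABLE = 47   # predicted changes that had expression data
CORRECT = 23                # predictions that agreed with experiment
DETECTED_IN_MODEL = 151     # measured changes for genes in the model

accuracy = CORRECT / PREDICTED_COMPARABLE   # fraction of predictions that agreed
coverage = CORRECT / DETECTED_IN_MODEL      # fraction of measured changes captured

# ASSUMPTION: placeholder chance of a random model getting one gene right.
hypothetical_p = 0.25
tail = binomial_tail(CORRECT, PREDICTED_COMPARABLE, hypothetical_p)
print(round(accuracy, 2), round(coverage, 2))
```

The tail probability shrinks quickly as the observed number of correct predictions moves above `n * p`, which is why an overlap of 23 out of 47 can be significant even though coverage is low.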
Figure 3. Biological network elucidation by a model-centric approach.
To understand which transcription factors are involved in regulating these differentially expressed genes after oxygen deprivation, we compared the gene expression data for the wild-type and each knockout strain separately. Using two-way analysis of variance (ANOVA), we could determine whether the differential expression was significantly altered in the knockout strain as compared with the wild type. A large portion of the expression changes observed for the wild-type strain were not significantly affected in any of the knockout strains (195/437 or 44.6% of genes overall, 63/151 or 41.7% of genes in the model, FDR < 5%), suggesting that none of the five transcription factors studied here regulates the expression of these genes or that combinatorial interactions between multiple transcription factors are involved in regulation. For the remainder of the genes, differential expression was abolished in one or more of the knockout strains (Fig. 2c).
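As a rough illustration of the two-way ANOVA comparison for a single gene, the sketch below computes the F statistic for the strain-by-condition interaction in a balanced design. The expression values are invented for illustration, and the function is a textbook calculation, not the authors' analysis code. Turning the F statistic into a P-value requires the F distribution with the returned degrees of freedom (for example `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
from statistics import mean


def interaction_f(cells):
    """F statistic for the strain x condition interaction (balanced design).

    cells maps (strain, condition) -> list of replicate expression values.
    Returns (F, df_interaction, df_error).
    """
    strains = sorted({s for s, _ in cells})
    conditions = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(strains) * len(conditions):
        raise ValueError("every strain/condition cell must be present")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("balanced design with replicates required")

    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    grand = mean(cell_mean.values())
    strain_mean = {s: mean(cell_mean[(s, c)] for c in conditions) for s in strains}
    cond_mean = {c: mean(cell_mean[(s, c)] for s in strains) for c in conditions}

    # Interaction: what is left of each cell mean after removing both main effects.
    ss_int = n * sum(
        (cell_mean[(s, c)] - strain_mean[s] - cond_mean[c] + grand) ** 2
        for s in strains
        for c in conditions
    )
    # Error: replicate scatter around each cell mean.
    ss_err = sum((x - cell_mean[key]) ** 2 for key, vals in cells.items() for x in vals)

    df_int = (len(strains) - 1) * (len(conditions) - 1)
    df_err = len(strains) * len(conditions) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Invented log-scale values: the wild type responds to anaerobiosis, the knockout does not.
example_cells = {
    ("wild_type", "aerobic"): [8.0, 8.2, 7.8],
    ("wild_type", "anaerobic"): [10.0, 10.2, 9.8],
    ("knockout", "aerobic"): [8.1, 7.9, 8.0],
    ("knockout", "anaerobic"): [8.0, 8.2, 7.8],
}
f_stat, df_int, df_err = interaction_f(example_cells)
print(f_stat, df_int, df_err)
```

A large interaction F means the aerobic-to-anaerobic change differs between the two strains, which is the pattern described above as differential expression being altered or abolished in a knockout.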
The ANOVA-based identification of transcription factors that influence differential expression of specific genes enabled us systematically to rewrite, relax or remove various regulatory rules in the model to resolve the discrepancies between iMC1010v1 and the experimentally determined wild-type differential gene expression. For many (81) of the genes, a regulatory rule already existed and had to be reconciled with our new data to accommodate the newly determined transcription factor dependencies. For genes where none of the knockouts abolished differential expression, we simply based a new regulatory rule on the presence of oxygen rather than a transcription factor (39 genes). By contrast, for genes where a change in expression was predicted but not observed, we removed oxygen dependency from the existing regulatory rule (23 genes). In addition, for 12 genes the predicted expression changes agreed with the observed expression in the wild type, but our knockout perturbation analysis indicated that the transcription factors involved in the regulation differed from previously reported data and the model needed to be changed (all new regulatory rules are detailed in the Supplementary Information).
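The three kinds of rule change described above can be pictured with a small boolean sketch. Everything here is hypothetical: the target gene is made up, the activity rules for Fnr and ArcA are simplified to "active only without oxygen", and none of these rules is taken from iMC1010v1 or iMC1010v2.

```python
from typing import Callable, Dict, FrozenSet

Env = Dict[str, bool]
State = Dict[str, bool]


def tf_activity(env: Env, knockouts: FrozenSet[str]) -> State:
    """Hypothetical activity rules: a factor is active only without oxygen
    and is never active when its gene is knocked out."""
    anaerobic = not env["oxygen"]
    return {
        "Fnr": anaerobic and "fnr" not in knockouts,
        "ArcA": anaerobic and "arcA" not in knockouts,
    }


# Three illustrative rule styles for a made-up target gene.
RULES: Dict[str, Callable[[Env, State], bool]] = {
    # Expression depends on a transcription factor.
    "tf_dependent": lambda env, tf: tf["Fnr"],
    # Expression depends directly on oxygen, not on any factor studied.
    "oxygen_only": lambda env, tf: not env["oxygen"],
    # Oxygen dependency removed: expression is the same in both conditions.
    "oxygen_independent": lambda env, tf: True,
}


def differentially_expressed(
    rule: Callable[[Env, State], bool],
    knockouts: FrozenSet[str] = frozenset(),
) -> bool:
    """True if the rule gives different expression with and without oxygen."""
    aerobic_env = {"oxygen": True}
    anaerobic_env = {"oxygen": False}
    aerobic = rule(aerobic_env, tf_activity(aerobic_env, knockouts))
    anaerobic = rule(anaerobic_env, tf_activity(anaerobic_env, knockouts))
    return aerobic != anaerobic


fnr_knockout = frozenset({"fnr"})
print(differentially_expressed(RULES["tf_dependent"]))
print(differentially_expressed(RULES["tf_dependent"], fnr_knockout))
```

In this toy setting, a factor-dependent rule loses its differential expression when the factor is knocked out, an oxygen-only rule keeps it in every knockout, and an oxygen-independent rule never shows it.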
The updated model (iMC1010v2) was used to recalculate all of the predictions for both the aerobic and anaerobic expression data and the high-throughput phenotyping arrays. Note that iMC1010v2 accounts for the same genes as iMC1010v1 but has different regulatory interactions among the gene products and oxygen as an environmental variable. We found agreement between model predictions and the gene expression data to be substantially higher using the iMC1010v2 model, as expected (Fig. 2c). Specifically, 100 of the 151 expression changes were correctly computed with iMC1010v2, and the number of false-positive predictions (Fig. 2, yellow boxes) was reduced to zero. In resolving many of the cases of unpredicted differential expression (Fig. 2, orange boxes), we found that implementation of the ANOVA-derived rule resulted in the inability of the wild-type or knockout in silico strain to grow aerobically or anaerobically on glucose, or under other conditions where growth had been previously established (for example, wild-type and knockout strain average growth rate under aerobic conditions, 0.68 ± 0.04 per hour; anaerobic, 0.43 ± 0.07 per hour). Such cases may be thought of as an 'overfit' of the microarray data. Accordingly, we relaxed the regulatory rule for these genes (42 in total) to allow for a correct phenotype prediction. Comparisons for the high-throughput phenotyping data revealed very little difference from Fig. 1 (only 11 out of 13,750 cases were affected; see Supplementary Information).
The iterative modification of the regulatory rules led to three main observations. First, some of the results of the knockout perturbation analysis are complex enough to make boolean rule formulation difficult. For example, the interplay of Fnr and ArcA can lead to complex behaviours where the expression change observed in wild type is abolished in the ΔarcA or the Δfnr strains, but not in the ΔarcA Δfnr strain. Such complex interplay among transcription factors can lead to specialized expression changes, as observed in the cydAB response to anaerobic, microaerobic and aerobic conditions13, 14.
Second, in revising regulatory rules for transcription factors we found that whereas in some cases, such as arcA, expression of a regulatory protein correlates positively with its activity, in several cases, including fnr, betI and fur among others, the mRNA level of a regulatory gene is reduced when the protein is in fact activated. For example, under anaerobic conditions when Fnr is known to be active11, its expression is significantly reduced. Such behaviour, underscored by similar observations of mRNA transcript levels and corresponding protein product abundance in yeast15, suggests that the identification of regulatory networks, and transcription factors involved in regulation in particular, will not be accomplished by the determination of co-regulated gene sets alone.
Third, many of these gene expression changes involve complex interactions and indirect effects. Transcription factors may be affected, for example, by the presence of fermentation by-products or the build up of internal metabolites. Such effects would be extremely difficult to identify or account for without a computational model.
In summary, we find that the reconciliation of high-throughput data sets with genome-scale computational model predictions enables systematic and effective identification of new components and interactions in microbial biological networks. Our study illustrates only the first round of an iterative model building strategy where an initial model based on literature-derived information (iMC1010v1) is used to design informative experiments and then updated on the basis of the new experimental data obtained (iMC1010v2). Another round of perturbation experiments will lead to iMC1010v3, and so on. We expect that after an effort of some years and many iterations of this process, regulatory network elucidation for E. coli will be essentially complete.
Methods
Computational model. We constructed the model of the E. coli metabolic and regulatory network by identifying network components, their functions and interactions from the primary literature4, 9, 16. Many approaches have been developed to analyse large-scale metabolic17-22 and transcriptional regulatory23-25 networks. Growth and gene expression simulations were done by regulated flux-balance analysis, which combines linear optimization to determine a growth-optimized metabolic flux distribution with logic statements to simulate the effects of regulatory processes over time. The whole model construction and simulation process has been described elsewhere in detail26.
Strains and culture. The parent strain for knockout strains in this study was K-12 MG1655 (ref. 27), and all deletion strains were generated as described28. Growth experiments for the gene expression study were done on M9 glucose medium (2 g l⁻¹) under aerobic and anaerobic conditions, as described17. The growth data contained in the ASAP database were obtained by using high-throughput phenotype arrays (Biolog)5. In some cases (where the viability of a particular environment was unclear from the phenotype array data), we cross-validated the ASAP phenotyping data by culturing the wild-type strain under the given conditions in our laboratory (see Supplementary Information).
Gene expression profiling and analysis. All gene expression measurements were done at least in triplicate. Samples were stabilized by using RNAProtect bacterial reagent (Qiagen), and total RNA was isolated from exponentially growing cells using an RNeasy mini kit (Qiagen) in accordance with the manufacturer's protocols (see http://www1.qiagen.com). The RNA (10 µg) was then used as the template for complementary DNA synthesis, the product of which was fragmented, labelled and hybridized to an E. coli Antisense Genome Array (Affymetrix), which was washed and scanned to obtain an image in accordance with the manufacturer's protocols (see http://www.affymetrix.com). The image files were processed and expression values were normalized using dChip software29. We used quantitative real-time polymerase chain reaction with reverse transcription (RT–PCR) to validate expression changes for selected genes. The statistical significance of expression changes for each gene and each strain between aerobic and anaerobic conditions was determined by a t-test (log-transformed data, equal variance).
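The per-gene test mentioned at the end of this paragraph is a two-sample t-test with equal variances on log-transformed data. A minimal sketch of the test statistic follows. The base of the logarithm is not stated in this excerpt, so base 2 is an assumption, and the expression values are invented. Converting the statistic into a P-value needs the t distribution with the returned degrees of freedom, which is omitted here.

```python
from math import log2, sqrt
from statistics import mean, variance


def pooled_t(aerobic, anaerobic):
    """Equal-variance t statistic on log2-transformed expression values.

    Returns (t, degrees_of_freedom). Log base 2 is an assumption.
    """
    a = [log2(x) for x in aerobic]
    b = [log2(x) for x in anaerobic]
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Invented triplicate intensities for one gene.
t_stat, dof = pooled_t([4.0, 8.0, 16.0], [1.0, 2.0, 4.0])
print(t_stat, dof)
```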
For each deletion strain, we used a two-way ANOVA (strain as the first factor and aerobic or anaerobic condition as the second factor) to determine whether the differential expression observed in the wild-type strain was significantly altered in the deletion strain by determining the statistical significance of the strain–condition interaction effect. For both the t-test and the ANOVA analysis, correction for multiple testing was done by using the Benjamini–Hochberg false discovery rate procedure30, which determines the P-value cut-off for each test separately by estimating the FDR resulting from using a particular P-value cut-off. The false discovery rate refers to the fraction of true null tests out of all the tests called significant and an FDR of 5% was used for all tests. All gene expression data and the relevant information (such as the MIAME checklist) are provided in the Supplementary Information.
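The Benjamini–Hochberg step-up procedure named above can be sketched in a few lines. This is the standard textbook form, not the authors' code, and the P-values in the example are invented.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where a test is called significant.

    Step-up rule: find the largest rank k (1-based, P-values sorted ascending)
    with p_(k) <= k / m * fdr, then call the k smallest P-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Invented P-values for eight tests.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
calls = benjamini_hochberg(example, fdr=0.05)
print(calls)
```

Because the rule takes the largest passing rank, a P-value that misses its own threshold is still called significant when a larger P-value further down the sorted list passes.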
Supplementary information accompanies this paper.
Original source:
Integrating high-throughput and computational data elucidates bacterial networks
MARKUS W. COVERT, ERIC M. KNIGHT, JENNIFER L. REED, MARKUS J. HERRGARD & BERNHARD O. PALSSON
doi:10.1038/nature02456
Nature 429, 92–96 (06 May 2004)





