Proteome mining is a functional proteomics approach used to extract protein information from the analysis of specific subproteomes. The strategy of proteome mining is shown in Fig. 15. The principles of proteome mining are based on the assumption that all drug-like molecules selectively compete with a natural cellular ligand for a binding site on a protein target. In a proteome mine, natural ligands are immobilized on beads at high density and in an orientation that sterically favors interaction with their protein targets. The immobilized ligand is then exposed to a whole-animal or tissue extract, and bound proteins are evaluated for specificity by protein sequencing. In the prototypic example from our laboratory, ATP is immobilized in the "protein kinase orientation" (via its gamma phosphate). Microsequencing of the proteins that were eluted with free ATP demonstrated that the nucleotide selectively recovered purine binding proteins, including protein kinases, dehydrogenases, various purine-dependent metabolic enzymes, DNA ligases, heat shock proteins, and a variety of miscellaneous ATP-utilizing enzymes (P. R. Graves, J. Kwiek, P. Fadden, R. Ray, K. Hardeman, and T. A. J. Haystead, submitted for publication). This immobilized proteome represents approximately 4% of the expressed eukaryotic genome.
We have utilized this captured proteome (the purine binding cassette proteome) to test the selectivity of purine analogs that inhibit protein kinases and stress-induced ATPases in vitro. Using a proteome-mining ATP affinity array apparatus constructed in our laboratory, sufficient biomass is applied to ensure the recovery, per column, of 1 fmol of any protein expressed at 100 copies/cell (10^7 cells). After washing, each column in the array is eluted in parallel with molecules from a purine-based iterative library, and fractions are collected. Eluates are screened for protein, and positive fractions generally contain a single protein, a small number of structurally related proteins, or a complex mixture. Only the first two categories are sequenced, since the third results from elution with a nonselective inhibitor. Once an eluted protein has been identified, one has the information needed to decide how to proceed. The first decision concerns biological relevance: do the eluted proteins in a given fraction have relevance to any human disease? If a protein has no obvious use as a drug target, it is ignored. If the protein is deemed relevant, one immediately has a lead molecule and a defined target. In cases where a single protein is eluted, the lead is likely to be selective because it had an equal opportunity to interact with the rest of the captured proteome (approximately 4% of the genome). Selectivity can be tested by increasing the concentration of the lead compound during elution from nanomolar to micromolar. Information concerning potential toxicity can be gained by sequencing other proteins that are eluted simultaneously or at higher concentrations. If some of these are undesirable targets, iterative substitutions can be made around the lead scaffold to improve selectivity. Proof of principle of this technology was obtained by using an iterative library derived from the heat shock protein 90 inhibitor geldanamycin, and a new physiological target, ADE2, was identified (P. Fadden, V. J. Davisson, L. Neckers, and T. A. J. Haystead, unpublished data). Screening combinatorial chemistry libraries through a proteome-mining approach exploits the serendipitous nature of drug discovery to its maximum, because it accelerates the hit rate over a conventional screen by a factor that scales with the size of the bound proteome. In the case of purine binding proteins, this may be several hundredfold. Protein microsequencing, the data contained within the various genome projects, and the ability to instantly search the literature for relevance enable one to interpret the outcomes in a rational way.
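The recovery figure quoted above (about 1 fmol of a protein expressed at 100 copies/cell from 10^7 cells) can be checked with a short calculation. The sketch below is only an illustration: the function name and the `recovery` parameter are assumptions introduced here, not part of the apparatus described in the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def recoverable_femtomoles(copies_per_cell: float, cells: float,
                           recovery: float = 1.0) -> float:
    """Estimate the amount of one protein (in fmol) obtainable from a sample.

    `recovery` is a hypothetical fractional yield (1.0 = every molecule captured).
    """
    molecules = copies_per_cell * cells * recovery
    return molecules / AVOGADRO * 1e15  # mol -> fmol


# 100 copies/cell in 10^7 cells, assuming complete capture
amount = recoverable_femtomoles(copies_per_cell=100, cells=1e7)
print(f"{amount:.2f} fmol")  # prints "1.66 fmol"
```

Under these assumptions the theoretical maximum is roughly 1.7 fmol, so a recovery of 1 fmol would correspond to a yield of about 60%.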
We are currently using proteome mining to discover new antimalarial drugs that target purine binding proteins in the blood stage of infection. Because of the essential roles of purine-utilizing enzymes in cellular function, it is our hypothesis that these proteins are attractive candidates for a new generation of antimalarial drugs. In our malaria project, the Plasmodium falciparum (blood stage) and human red blood cell purine binding proteomes are captured on ATP affinity arrays and simultaneously screened against purine-based combinatorial libraries. Combining both proteomes enables the selectivity and potential toxicity of a lead molecule to be measured early in the discovery process. Microsequencing enables human proteins to be readily discriminated from malarial ones. An additional benefit of mining the entire malarial purine binding cassette proteome is that multiple leads and their targets will be identified. Combined therapies that target multiple genes simultaneously are likely to exert such tremendous selective pressure on the targeted pathogen that it cannot develop resistance. We are currently expanding our immobilized natural-ligand library in order to apply proteome mining to other areas of biology.
The study of proteins, in contrast to that of DNA, presents a number of unique challenges. For example, there is no equivalent of PCR for proteins, so the analysis of low-abundance proteins remains a major challenge. In addition, in protein interaction studies, native conformations of proteins must be maintained to obtain meaningful results. Can proteins be studied on a large scale with speed, sensitivity, and reliability? In the last several years, recognition of the limitations of proteomics has begun to point the field in new directions.
Although the technology for the analysis of proteins is rapidly progressing, it is still not feasible to study proteins on a scale equivalent to that of the nucleic acids. Most of proteomics relies on methods, such as protein purification or PAGE, that are not high throughput. Even performing MS can require considerable time in either data acquisition or analysis. Although hundreds of proteins can be analyzed quickly and in an automated fashion by a MALDI-TOF mass spectrometer, the quality of data is sacrificed and many proteins cannot be identified. Much higher quality data can be obtained for protein identification by MS/MS, but this method requires considerable time in data interpretation. In our opinion, new computer algorithms are needed to allow more accurate interpretation of mass spectra without operator intervention. In addition, to access unannotated DNA databases across species, these algorithms should be error tolerant to allow for sequencing errors, polymorphisms, and conservative substitutions. New technologies will have to emerge before protein analysis on a large scale (such as mapping the human proteome) becomes a reality.
Another major challenge for proteomics is the study of low-abundance proteins. In some eukaryotic cells, the amounts of the most abundant proteins can be 10^6-fold greater than those of the low-abundance proteins. Many important classes of proteins that may be valuable drug targets, such as transcription factors, protein kinases, and regulatory proteins, are low-copy proteins. These low-copy proteins will not be observed in the analysis of crude cell lysates without some purification. Therefore, new methods must be devised for subproteome isolation. Despite these limitations, proteomics, when combined with complementary technologies such as molecular biology, has enormous potential to provide new insight into biology. The ability to study complex biological systems in their entirety will ultimately provide answers that cannot be obtained from the study of individual proteins or groups of proteins.
