
Use of proteomic patterns in serum to identify ovarian cancer
Emanuel F Petricoin III, Ali M Ardekani, Ben A Hitt, Peter J Levine, Vincent A Fusaro, Seth M Steinberg, Gordon B Mills,
Charles Simone, David A Fishman, Elise C Kohn, Lance A Liotta
Summary
Background: New technologies for the detection of early-stage ovarian cancer are urgently needed. Pathological changes within an organ might be reflected in proteomic patterns in serum. We developed a bioinformatics tool and used it to identify proteomic patterns in serum that distinguish neoplastic from non-neoplastic disease within the ovary.
Methods: Proteomic spectra were generated by mass spectroscopy (surface-enhanced laser desorption and ionisation). A preliminary “training” set of spectra derived from analysis of serum from 50 unaffected women and 50 patients with ovarian cancer was analysed by an iterative searching algorithm that identified a proteomic pattern that completely discriminated cancer from non-cancer. The discovered pattern was then used to classify an independent set of 116 masked serum samples: 50 from women with ovarian cancer, and 66 from unaffected women or those with non-malignant disorders.
Findings: The algorithm identified a cluster pattern that, in the training set, completely segregated cancer from non-cancer. The discriminatory pattern correctly identified all 50 ovarian cancer cases in the masked set, including all 18 stage I cases. Of the 66 cases of non-malignant disease, 63 were recognised as not cancer. This result yielded a sensitivity of 100% (95% CI 93–100), specificity of 95% (87–99), and positive predictive value of 94% (84–99).
Interpretation: These findings justify a prospective population-based assessment of proteomic pattern technology as a screening tool for all stages of ovarian cancer in high-risk and general populations.
Introduction
Application of new technologies for detection of ovarian cancer could have an important effect on public health,1 but to achieve this goal, specific and sensitive molecular markers are essential.1–5 This need is especially urgent in women who have a high risk of ovarian cancer due to family or personal history of cancer, and for women with a genetic predisposition to cancer due to abnormalities in predisposition genes such as BRCA1 and BRCA2. There are no effective screening options for this population.
Ovarian cancer presents at a late clinical stage in more than 80% of patients,1 and is associated with a 5-year survival of 35% in this population. By contrast, the 5-year survival for patients with stage I ovarian cancer exceeds 90%, and most patients are cured of their disease by surgery alone.1–6 Therefore, increasing the number of women diagnosed with stage I disease should have a direct effect on the mortality and economics of this cancer without the need to change surgical or chemotherapeutic approaches.
Cancer antigen 125 (CA125) is the most widely used biomarker for ovarian cancer.1–6 Although concentrations of CA125 are abnormal in about 80% of patients with advanced-stage disease, they are increased in only 50–60% of patients with stage I ovarian cancer.1–6 CA125 has a positive predictive value of less than 10% as a single marker, but the addition of ultrasound screening to CA125 measurement has improved the positive predictive value to about 20%.6
Low-molecular-weight serum protein profiling might reflect the pathological state of organs and aid in the early detection of cancer. Matrix-assisted laser desorption and ionisation time-of-flight (MALDI-TOF) and surface-enhanced laser desorption and ionisation time-of-flight (SELDI-TOF) mass spectroscopy can profile proteins in this range.6–9 These profiles can contain thousands of data points, necessitating sophisticated analytical tools. Bioinformatics has been used to study physiological outcomes and cluster gene microarrays,10–13 but to uncover changes in complex mass spectrum patterns of serum proteins, higher order analysis is required. We aimed to link SELDI-TOF spectral analysis with a high-order analytical approach using samples from women with a known diagnosis to define an optimum discriminatory proteomic pattern. We then aimed to use this pattern to predict the identity of masked samples from unaffected women, women with early-stage and late-stage ovarian cancer, and women with benign disorders.
Participants and methods
Study population
100 control samples (50 for the preliminary analysis and 50 for the masked analysis) were provided from the National Ovarian Cancer Early Detection Program (NOCEDP) clinic at Northwestern University Hospital (Chicago, IL, USA). 17 other control samples from anonymous women unaffected by cancer were provided by the Simone Protective Cancer Institute (Lawrenceville, NJ, USA). These 17 women had endometriosis (seven), uterine fibroids (three), sinusitis (four), rheumatoid arthritis (two), and ulcerative colitis (one) and were included in the masked validation set. Cases from the NOCEDP were self-referred under at least one of the following eligibility criteria: at least one affected first-degree relative; familial breast or ovarian cancer syndrome; positivity for BRCA1 or BRCA2 mutations; or personal history of breast cancer. BRCA1/2 status was not made available to this analysis under the conditions of anonymisation. The high-risk population was chosen because availability of a viable management option is particularly important for women who are at increased risk of development of ovarian cancer.

All women received a yearly three-dimensional colour Doppler flow ultrasound examination and measurement of CA125 concentration.6 Cases were defined as unaffected if they had had a minimum of 5 yearly follow-up examinations without diagnosis of ovarian cancer. Cases with ovarian cancer were eligible if they had had a serum sample banked before pathological staging by a gynaecological oncologist. Simple ovarian cysts were detected by ultrasonography in 38% of the unaffected women (table 1). All major epithelial subtypes of ovarian cancer were represented, and six of the cancer samples were from women with stage I disease, which mirrors the distribution of stage I ovarian cancer in the community. Reported oral contraceptive use and parity did not differ between the groups. The median age in the healthy symptom-free control population was 49 years (range 21–75) in the preliminary set and 48 years (25–73) in the masked validation set. These ages were not substantially different from those for the cancer patients in the preliminary set (median 58 years [range 29–82]) and in the masked validation set (59 [30–80]), including only those with stage I cancers (57 [35–75]). On the basis of the age distribution, premenopausal and postmenopausal women were equally represented in both groups, thus menopausal status should not have been a discriminator in the detection algorithm.

Serum samples were obtained before examination, diagnosis, or treatment and were immediately frozen in liquid nitrogen. Once at the laboratory, samples were thawed, separated into 10 μL portions, and refrozen. The pathological diagnosis was concealed from the operators before all analyses. Samples were obtained via protocols approved by the Institutional Review Board and reviewed by the National Institutes of Health Office of Human Subjects Research.
Proteomic analysis
Serum samples were thawed, added to a C16 hydrophobic interaction protein chip, and analysed on the Protein Biology System 2 SELDI-TOF mass spectrometer (Ciphergen Biosystems, Fremont, CA, USA).9 Mass resolution (defined as m/Δm) is routinely achieved below 400. Mass accuracy is assessed daily through the use of angiotensin peptide calibrations. We achieve a mass accuracy of 0·1% with this system. Peptides and proteins below the 20 000 mass/charge (M/Z) range were ionised with α-cyano-4-hydroxycinnamic acid as a matrix, which is most effective for the detection of proteins and peptides in this mass range. The chips were analysed manually under the following settings: laser intensity 240, detector sensitivity 10, mass focus 6000, position 50, molecular mass range 0–20 000 Da, and a 50-shot average per sample. Data were collected without filters and were later used for analyses. Positives and controls were run concurrently, intermingled on the same chip and on multiple chips; the operators were unaware of which was which. None of the samples in the preliminary set were subsequently used in the masked validation set.
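To make the instrument figures above concrete, the following minimal sketch converts the stated relative mass accuracy (0·1%) and the definition of resolution (m/Δm) into absolute M/Z units. The function names are hypothetical, and the resolution of 400 is used only as the bound quoted in the text, not as a measured value for any particular peak.

```python
def mass_tolerance(mz, relative_accuracy=0.001):
    """Absolute M/Z uncertainty implied by a relative mass accuracy (0.1% in the text)."""
    return mz * relative_accuracy


def peak_width(mz, resolution):
    """Peak width (delta m) implied by a resolution defined as m / delta m."""
    return mz / resolution


# Illustrative M/Z values: the mass focus setting and the upper limit of the range.
for mz in (6000, 20000):
    print(mz, mass_tolerance(mz), peak_width(mz, resolution=400))
```

Under these assumptions, a peak at the mass focus of 6000 would be located to within about 6 M/Z units, and a resolution of 400 would correspond to a peak width of about 15 units at that position.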


Analytical procedure
We developed an analytical tool that combines elements from genetic algorithms first described by Holland14 and cluster analysis methods from Kohonen.15,16 Genetic algorithms function in a manner similar to natural selection. The input data for analysis are ASCII files of proteomic spectra generated by SELDI-TOF. Each spectrum is composed of 15 200 M/Z values on the x axis, with a corresponding amplitude on the y axis. The output of the algorithm is the most fit subset of amplitudes at defined M/Z values that best segregates the preliminary data. Analysis was divided into two phases: a preliminary phase with knowns, and a testing phase with masked serum samples. In phase I (figure 1), mass spectra from the two preliminary sets (ie, the 50 patients with biopsy-proven cancer and the 50 unaffected patients and controls) were compared. The algorithm identified a small subset of key values along the spectrum x axis using an iterative searching process. This subset was judged as important because the pattern of amplitudes at these M/Z values completely segregated the serum from patients with ovarian cancer from the unaffected populations.
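The general idea of phase I, an evolutionary search for a small subset of M/Z positions scored by how well it separates two known groups, can be sketched as follows. This is not the Proteome Quest implementation: the data are synthetic, the fitness function is a simple nearest-centroid rule swapped in for the self-organising cluster analysis, and all names and parameter values (population size, mutation rate, number of generations) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_synthetic_spectra(n_per_class=30, n_points=60, informative=(5, 22, 47)):
    """Hypothetical stand-in for spectra: rows are samples, columns are M/Z bins."""
    X = rng.normal(0.0, 1.0, size=(2 * n_per_class, n_points))
    y = np.array([0] * n_per_class + [1] * n_per_class)
    X[n_per_class:, list(informative)] += 4.0  # class 1 differs at a few bins only
    return X, y


def fitness(subset, X, y):
    """Fraction of samples nearer to their own class centroid in the chosen subspace."""
    Z = X[:, list(subset)]
    c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    d0 = np.linalg.norm(Z - c0, axis=1)
    d1 = np.linalg.norm(Z - c1, axis=1)
    return float(((d1 < d0).astype(int) == y).mean())


def evolve(X, y, subset_size=5, pop_size=40, generations=30, mutation_rate=0.3):
    """Keep the fitter half each generation; refill by crossover and mutation."""
    n_points = X.shape[1]
    pop = [tuple(sorted(int(i) for i in rng.choice(n_points, subset_size, replace=False)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, X, y), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.choice(len(parents), 2, replace=False)
            # Crossover: draw the child's positions from the union of both parents.
            pool = np.array(sorted(set(parents[i]) | set(parents[j])))
            child = list(rng.choice(pool, subset_size, replace=False))
            # Mutation: occasionally swap one position for an unused one.
            if rng.random() < mutation_rate:
                unused = np.setdiff1d(np.arange(n_points), child)
                child[rng.integers(subset_size)] = rng.choice(unused)
            children.append(tuple(sorted(int(c) for c in child)))
        pop = parents + children
    best = max(pop, key=lambda s: fitness(s, X, y))
    return best, fitness(best, X, y)


X, y = make_synthetic_spectra()
best_subset, best_score = evolve(X, y)
print(best_subset, best_score)
```

Because the fitter half of the population is carried over unchanged, the best score cannot decrease from one generation to the next. A fitness measured on the same samples used for the search is optimistic, which is why an independent masked set is needed to assess the selected pattern.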
was outside the defined likeness boundaries of the cancer and unaffected clusters. The bioinformatics software developed and described herein is Proteome Quest beta version 1.0 (Correlogic Systems Inc, Bethesda, MD, USA). A detailed description of the methods, presentation of the raw spectra (n=216), and analytical results can be found at http://clinicalproteomics.steem.com (accessed Jan 23, 2002).
50 cases had 96% power at the α=0·05 level to reject an 80% sensitivity or specificity in favour of a true value of 95%, using an exact test for single proportions, with cut-off points for rejection based on the cumulative binomial distribution. The Cochran-Armitage trend test17,18 and the Jonckheere test for trend19 were used to test the significance of the classification of cancer versus new cluster versus unaffected, according to whether the true disease state was presented in two levels or multiple ordered categories. All p values were two-tailed.
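The power statement above can be checked with a short calculation from the cumulative binomial distribution. The sketch below assumes a one-sided rejection region (reject 80% when the number of correct classifications is at or above a cut-off), which is the assumption under which the quoted 96% is reproduced; the function names are hypothetical.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_power(n, p0, p1, alpha):
    """Smallest cut-off with tail probability <= alpha under p0, and the power under p1."""
    cutoff = next(k for k in range(n + 2) if binom_tail(n, k, p0) <= alpha)
    return cutoff, binom_tail(n, cutoff, p0), binom_tail(n, cutoff, p1)


cutoff, size, power = exact_power(n=50, p0=0.80, p1=0.95, alpha=0.05)
print(cutoff, round(size, 3), round(power, 3))
```

With 50 cases, the cut-off is 45 correct classifications, the attained significance level is about 0·048, and the power against a true value of 95% is about 0·96.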


Reproducibility and precision
An example of nine independently obtained spectra from the between-run analysis of the serum from the unaffected woman used to determine reproducibility of the mass spectra is shown in figure 2. The coefficient of variation (CV) for eight selected M/Z peaks with the highest amplitude was less than 10%. There was little variation with day-to-day sampling and instrumentation or chip variations. We calculated that mass spectrum patterns remained consistent (CV <10%) if serum samples were not frozen and thawed more than twice, and once thawed, kept at 4°C for less than 24 h. The ability of the higher order bioinformatics tool to classify the same spectral data reliably was tested by importing independently generated serum spectra from two individuals: one unaffected and one with stage III ovarian cancer. The algorithm reliably identified 100% of the profiles in the course of 100 independent applications to the C16 chip surfaces (data not shown).
Detection of ovarian cancer
Examples of SELDI-TOF spectra from four patients in the preliminary set (two healthy and two with cancer) are shown in figure 3. The optimum discriminatory pattern in N-space for ovarian cancer was defined by the amplitudes at the key M/Z values 534, 989, 2111, 2251, and 2465.
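As an illustration of what a point in N-space means here, the sketch below reads the amplitudes at the five key M/Z values from one spectrum and returns them as a five-dimensional vector. The spectrum is hypothetical: an evenly spaced M/Z axis and random amplitudes are assumptions made so the example is self-contained, and any amplitude normalisation applied by the actual software is not represented.

```python
import numpy as np

KEY_MZ = (534, 989, 2111, 2251, 2465)  # key M/Z values reported in the text


def pattern_vector(mz_axis, amplitudes, key_mz=KEY_MZ):
    """Return the amplitudes at the sampled M/Z points nearest to each key value."""
    mz_axis = np.asarray(mz_axis, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    idx = [int(np.argmin(np.abs(mz_axis - k))) for k in key_mz]
    return amplitudes[idx]


# Hypothetical spectrum: 15 200 points spread evenly over 0 to 20 000 (an assumption).
mz_axis = np.linspace(0.0, 20000.0, 15200)
amplitudes = np.random.default_rng(1).random(15200)

vec = pattern_vector(mz_axis, amplitudes)
print(vec.shape)
```

Each serum sample is thereby reduced to one five-component vector, and samples can be compared by their distance from one another in that five-dimensional space.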
Complex serum proteomic patterns might reflect the underlying pathological state of an organ such as the ovary. This hypothesis is supported by the results of our masked analysis (table 2). Non-cancer control samples representing benign disease, gynaecological disorders, and inflammatory conditions were derived from patients in a high-risk clinic and from the general population (table 1). 63 of 66 samples were accurately classified as non-cancer, including all those from the general population. All ovarian cancers were correctly classified and distinguished from all non-malignant disorders, as were all stage I cancers confined to the ovary. The cancer sets were derived from a population potentially enriched for ovarian cancer. The high-risk population was chosen as a control set because: (1) early diagnosis is a viable management option for women who are at an increased risk of development of ovarian cancer; (2) this is the population for whom a screening programme would first be used; and (3) serum samples, ultrasonography, and clinical follow-up information could be obtained for 5 years. This population allowed us to test the specificity of our method for classifying benign symptomless disorders such as ovarian cysts—a source of potential false positives—and to verify the 5-year disease-free status.
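The performance figures for the masked set follow directly from the reported counts (50 of 50 cancers and 63 of 66 non-cancers correctly classified). The sketch below recomputes them with exact binomial (Clopper-Pearson) intervals; the choice of that interval method is an assumption, and the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion k/n, found by bisection."""

    def solve(f, target):
        # f must be increasing in p on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Counts from the masked validation set reported in the text.
tp, fn, tn, fp = 50, 0, 63, 3
results = {
    "sensitivity": (tp, tp + fn),
    "specificity": (tn, tn + fp),
    "positive predictive value": (tp, tp + fp),
}
for name, (k, n) in results.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{name}: {k}/{n} = {k / n:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The positive predictive value computed this way applies to the case mix of the masked set, in which 50 of 116 samples were cancers; in a screening population with much lower prevalence the same sensitivity and specificity would give a lower value.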
proteins or peptides is under investigation. They exist in the low-molecular-weight serum proteome, which is largely unknown at present. These proteins or peptides could be derived from the host organ, the cancer, or constitute metabolic fragments. The proteins or peptides are hydrophobic and of low molecular mass because of the specific ionisation and chip surface conditions used.
measured over time. Similar approaches might improve the positive predictive value of proteomic analysis.
E Petricoin and L Liotta conceived the study, participated in modelling procedures and analysis, and wrote the report. B Hitt and P Levine conceived and developed the key software components and modelling procedures concerning biological states used in the study and assisted in the preparation of the paper. E Kohn, D Fishman, C Simone, and G Mills designed and wrote the participants section, provided serum sets used in the study, and assisted in the preparation of the paper. A Ardekani generated mass spectra and was responsible for archiving of serum samples. V Fusaro participated in the generation, analysis, and presentation of SELDI-TOF data. S Steinberg selected the statistical methods and did the data analysis.
We thank the National Ovarian Cancer Early Detection Program clinic at Northwestern University and the Early Detection Research Network for the facilitation of ovarian serum sample collection.
All work was supported by the US Federal Government intramural research program.
1 Ozols RF, Rubin SC, Thomas GM, Robboy SJ. Epithelial ovarian cancer. In: Hoskins WJ, Perez CA, Young RC, eds. Principles and practice of gynecologic oncology. Philadelphia: Lippincott Williams & Wilkins, 2000: 981–1058.
2 Bast RC, Klug TL, St John E, et al. A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer. N Engl J Med 1983; 309: 883–87.
3 Menon U, Jacobs I. Tumor markers. In: Hoskins WJ, Perez CA, Young RC, eds. Principles and practice of gynecologic oncology. Philadelphia: Lippincott Williams & Wilkins, 2000: 165–82.
4 Menon U, Jacobs I. Recent developments in ovarian cancer screening. Curr Opin Obstet Gynaecol 2000; 12: 39–42.
5 Jacobs IJ, Skates SJ, MacDonald N, et al. Screening for ovarian cancer: a pilot randomised controlled trial. Lancet 1999; 353: 1207–10.
6 Cohen LS, Escobar PF, Scharm C, Glimco B, Fishman DA. Three-dimensional power Doppler ultrasound improves the diagnostic accuracy for ovarian cancer prediction. Gynecol Oncol 2001; 82: 40–48.
7 Herbert BR, Sanchez J-C, Bini L. Two-dimensional electrophoresis: the state of the art and future directions in proteome research. In: Wilkins MR, Williams KL, Appel RD, Hochstrasser DF, eds. Proteome research: new frontiers in functional genomics. New York: Springer-Verlag, 1997: 13–30.
8 Richter R, Schulz-Knappe P, Schrader M, et al. Composition of the peptide fraction in human blood plasma: database of circulating human peptides. J Chromatogr B Biomed Sci Appl 1999; 726: 25–35.
9 Paweletz CP, Gillespie JW, Ornstein DK, et al. Rapid protein display profiling of cancer progression directly from human tissue using a protein biochip. Drug Dev Research 2000; 49: 34–42.
10 Alizadeh AA, Eisen MB, Davis RE, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000; 403: 503–11.
11 Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999; 286: 531–37.
12 Lindahl D, Palmer J, Edenbrandt L. Myocardial SPET: artificial neural networks describe extent and severity of perfusion defects. Clin Physiol 1999; 19: 497–503.
13 Lapuerta P, L’Italien GJ, Paul S, Hendel RC, Leppo JA, Fleisher LA. Neural network assessment of perioperative cardiac risk in vascular surgery patients. Med Decis Making 1998; 18: 70–75.
14 Holland JH, ed. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, 3rd edn. Cambridge, MA: MIT Press, 1994.
15 Kohonen T. Self-organizing formation of topologically correct feature maps. Biological Cybernetics 1982; 43: 59–69.
16 Kohonen T. The self-organizing map. Proc Inst Electrical Electronics Eng 1990; 78: 1464–80.
17 Cochran WG. Some methods for strengthening the common chi-squared tests. Biometrics 1954; 10: 417–54.
18 Armitage P. Test for linear trend in proportions and frequencies. Biometrics 1955; 11: 375–86.
19 Hollander M, Wolfe DA. Nonparametric statistical methods, 2nd edn. New York: John Wiley and Sons, 1999: 189–269.
20 Tou JT, Gonzalez R. Pattern classification by distance functions. In: Tou JT, Gonzalez R, eds. Pattern recognition principles. Reading, MA: Addison-Wesley Publishing Company, 1974: 75–109.

