
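NAR: An algorithm that helps remove unreliable gene expression microarray data
A research team at King Faisal Specialist Hospital and Research Center in Riyadh, Saudi Arabia, has developed an algorithm that helps remove unreliable data from gene expression microarrays.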
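Unreliable microarray data usually come from weak signals, which can have a variety of causes. Such data can be reproducible and still be wrong.
DNA microarrays are a powerful technology for studying and analyzing gene expression, but they readily produce erroneous data. The greatest weakness of microarray analysis is the unreliability of measurements with low signal intensity, because weak signals are easily misinterpreted. These faulty measurements yield false gene expression ratios, which in turn bias every downstream analysis.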
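The following toy calculation shows why weak signals distort expression ratios. The numbers are hypothetical and not from the study. A gene is expressed equally in both channels, so its true ratio is 1. Each channel is measured with the same assumed additive error of 30 intensity units, added to one channel and subtracted from the other.

```python
def observed_ratio(true_signal: float, noise: float) -> float:
    """Channel-1/channel-2 ratio for a gene with equal true expression in both
    channels, when +noise lands on channel 1 and -noise on channel 2."""
    return (true_signal + noise) / (true_signal - noise)


NOISE = 30.0  # hypothetical additive error, arbitrary intensity units
for signal in (40.0, 400.0, 4000.0):
    print(f"signal={signal:6.0f}  observed ratio={observed_ratio(signal, NOISE):.3f}")
```

At a signal of 40 the observed ratio is 7.0, a sevenfold "change" produced entirely by noise. At 4000 the same error shifts the ratio by only about 1.5%.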
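The algorithm applies to both cDNA and oligonucleotide microarrays. It models the array data in a univariate or bivariate way to find the optimal signal intensity threshold. That threshold is the boundary that separates reliable from unreliable array measurements. Once unreliable spots are excluded, expression ratios can be analyzed reliably.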
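The news item does not spell out the algorithm. The paper's abstract describes it as normal mixture modeling combined with Bayesian decision theory. Below is a minimal univariate sketch of that idea, not the authors' implementation. It rests on three assumptions:

- The inputs are log2 spot intensities.
- The low-intensity and reliable populations are each roughly normal on that scale.
- Misclassification costs are equal, so the threshold is the minimum-error Bayes boundary where the two weighted densities cross.

The function names `fit_two_normals` and `bayes_threshold` are hypothetical, and the simulated data are invented for illustration.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(xs, iters=100):
    """EM fit of a two-component 1-D normal mixture (assumed log2 intensities).

    Component 0 starts on the lower half of the data (low-intensity spots),
    component 1 on the upper half (reliable spots).
    Returns (weights, means, standard deviations).
    """
    s = sorted(xs)
    half = len(s) // 2
    groups = [s[:half], s[half:]]
    w = [0.5, 0.5]
    mu = [statistics.fmean(g) for g in groups]
    sd = [max(statistics.pstdev(g), 1e-3) for g in groups]
    n = len(xs)
    for _ in range(iters):
        # E-step: posterior probability that each value belongs to component 1
        r = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            total = p0 + p1
            if total == 0.0:  # both densities underflowed: use the nearest mean
                r.append(1.0 if abs(x - mu[1]) < abs(x - mu[0]) else 0.0)
            else:
                r.append(p1 / total)
        # M-step: re-estimate weight, mean and standard deviation per component
        for k, resp in enumerate(([1.0 - ri for ri in r], r)):
            nk = max(sum(resp), 1e-12)
            mu[k] = sum(ri * x for ri, x in zip(resp, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
            w[k] = nk / n
    return w, mu, sd


def bayes_threshold(w, mu, sd):
    """Solve w0*N(x; mu0, sd0) = w1*N(x; mu1, sd1) for x.

    Taking logs gives a*x^2 + b*x + c = 0; the root between the two means is
    the minimum-error decision boundary (equal misclassification costs assumed).
    """
    a = 1 / (2 * sd[1] ** 2) - 1 / (2 * sd[0] ** 2)
    b = mu[0] / sd[0] ** 2 - mu[1] / sd[1] ** 2
    c = (mu[1] ** 2 / (2 * sd[1] ** 2) - mu[0] ** 2 / (2 * sd[0] ** 2)
         + math.log(w[0] * sd[1] / (w[1] * sd[0])))
    if abs(a) < 1e-12:  # equal variances: the boundary equation is linear
        return -c / b
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the weighted densities never cross
    roots = [(-b + sgn * math.sqrt(disc)) / (2 * a) for sgn in (1, -1)]
    lo, hi = sorted(mu)
    inside = [t for t in roots if lo <= t <= hi]
    mid = (lo + hi) / 2
    return min(inside or roots, key=lambda t: abs(t - mid))


# Usage on simulated log2 intensities (hypothetical data, not from the paper)
random.seed(0)
data = ([random.gauss(6.0, 1.0) for _ in range(600)]
        + [random.gauss(11.0, 1.0) for _ in range(1400)])
w, mu, sd = fit_two_normals(data)
threshold = bayes_threshold(w, mu, sd)
reliable = [x for x in data if x >= threshold]
print(f"threshold = {threshold:.2f}; {len(reliable)} of {len(data)} spots kept")
```

With equal variances the boundary reduces to the midpoint of the two means, shifted toward the smaller population. For these simulated parameters that puts it near 8.3 on the log2 scale. The paper's actual cost and prior settings may differ.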
The study was published in the May issue of Nucleic Acids Research (Vol. 32, No. 8, pp. 2323-2335, 2004).
Published online 27 April 2004
Nucleic Acids Research, 2004, Vol. 32, No. 8 2323-2335
Assessment of reliability of microarray data and estimation of signal thresholds using mixture modeling
Department of Biostatistics, Epidemiology, and Scientific Computing and ¹Department of Biological and Medical Research, King Faisal Specialist Hospital and Research Center, PO Box 3354, MBC-03, Riyadh, 11211, Saudi Arabia
*To whom correspondence should be addressed. Tel: +966 1 464 7272, ext. 39211; Fax: +966 1 442 7854; Email: asyali@kfshrc.edu.sa
Received December 28, 2003; Revised March 15, 2004; Accepted March 24, 2004
DNA microarray is an important tool for the study of gene activities but the resultant data consisting of thousands of points are error-prone. A serious limitation in microarray analysis is the unreliability of the data generated from low signal intensities. Such data may produce erroneous gene expression ratios and cause unnecessary validation or post-analysis follow-up tasks. In this study, we describe an approach based on normal mixture modeling for determining optimal signal intensity thresholds to identify reliable measurements of the microarray elements and subsequently eliminate false expression ratios. We used univariate and bivariate mixture modeling to segregate the microarray data into two classes, low signal intensity and reliable signal intensity populations, and applied Bayesian decision theory to find the optimal signal thresholds. The bivariate analysis approach was found to be more accurate than the univariate approach; both approaches were superior to a conventional method when validated against a reference set of biological data that consisted of true and false gene expression data. Elimination of unreliable signal intensities in microarray data should contribute to the quality of microarray data including reproducibility and reliability of gene expression ratios.
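The abstract reports that the bivariate approach was more accurate than the univariate one. Below is a minimal sketch of the bivariate idea, not the authors' code. It rests on three assumptions:

- Each spot is a pair of log2 intensities from the two channels.
- A two-component bivariate normal mixture is fitted by EM.
- A spot counts as reliable when its posterior probability for the high-intensity component exceeds 0.5. This is the minimum-error Bayes rule with equal misclassification costs.

Log ratios are then reported only for reliable spots. All function names are hypothetical, and the simulated spots are invented for illustration.

```python
import math
import random


def mvn2_pdf(p, mu, cov):
    """Density of a 2-D normal with mean mu and covariance ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def weighted_moments(points, weights):
    """Weighted mean, covariance (lightly regularized) and total weight of 2-D points."""
    total = max(sum(weights), 1e-12)
    mx = sum(wt * x for wt, (x, _) in zip(weights, points)) / total
    my = sum(wt * y for wt, (_, y) in zip(weights, points)) / total
    sxx = sum(wt * (x - mx) ** 2 for wt, (x, _) in zip(weights, points)) / total + 1e-6
    syy = sum(wt * (y - my) ** 2 for wt, (_, y) in zip(weights, points)) / total + 1e-6
    sxy = sum(wt * (x - mx) * (y - my) for wt, (x, y) in zip(weights, points)) / total
    return (mx, my), ((sxx, sxy), (sxy, syy)), total


def posterior_reliable(p, w, mus, covs):
    """Posterior probability that spot p belongs to component 1 (reliable)."""
    p0 = w[0] * mvn2_pdf(p, mus[0], covs[0])
    p1 = w[1] * mvn2_pdf(p, mus[1], covs[1])
    if p0 + p1 == 0.0:  # both densities underflowed: fall back to the nearest mean
        return 1.0 if math.dist(p, mus[1]) < math.dist(p, mus[0]) else 0.0
    return p1 / (p0 + p1)


def fit_bivariate_mixture(points, iters=100):
    """EM for a two-component bivariate normal mixture; component 1 starts on
    the brighter half of the spots (ranked by summed log intensity)."""
    n = len(points)
    ranked = sorted(points, key=lambda p: p[0] + p[1])
    comps = [weighted_moments(half, [1.0] * len(half))
             for half in (ranked[: n // 2], ranked[n // 2:])]
    mus = [c[0] for c in comps]
    covs = [c[1] for c in comps]
    w = [0.5, 0.5]
    for _ in range(iters):
        post = [posterior_reliable(p, w, mus, covs) for p in points]  # E-step
        for k, resp in enumerate(([1.0 - r for r in post], post)):    # M-step
            mus[k], covs[k], nk = weighted_moments(points, resp)
            w[k] = nk / n
    return w, mus, covs


def reliable_log_ratios(points, w, mus, covs):
    """log2(ch1/ch2) for spots assigned to the reliable class, else None."""
    return [x - y if posterior_reliable((x, y), w, mus, covs) > 0.5 else None
            for x, y in points]


# Usage on simulated two-channel log2 intensities (hypothetical data)
random.seed(1)
low = [(random.gauss(6, 0.8), random.gauss(6, 0.8)) for _ in range(500)]
high = [(random.gauss(11, 1.0), random.gauss(10, 1.0)) for _ in range(1500)]
spots = low + high
w, mus, covs = fit_bivariate_mixture(spots)
ratios = reliable_log_ratios(spots, w, mus, covs)
kept = sum(r is not None for r in ratios)
print(f"{kept} of {len(spots)} spots have a reliable log ratio")
```

Modeling both channels jointly lets the decision boundary follow the shape of the two clouds instead of a single intensity cut-off. That is one plausible reason a bivariate model could separate the populations more cleanly.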