Abstract
We present a statistically well-founded method for constructing cancer predictors from gene expression profiles. The methodology is applied to a new microarray data set extracted from 25 patients affected by colon cancer. In particular, we answer two precise questions: how many gene expression levels are correlated with the pathology, and how many are sufficient for an accurate classification? The proposed method answers these questions while avoiding the potential pitfalls hidden in the analysis of microarray data. We evaluated the generalization error, estimated through the Leave-K-Out Cross Validation error, of two different classification schemes while varying the number of selected genes. We found that Regularized Least Squares (RLS) and Support Vector Machine (SVM) classifiers, using the whole gene set, have error rates of e = 14% (p = 0.023) and e = 11% (p = 0.016), respectively. Concerning the number of genes, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Statistical significance was measured using a permutation test.
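The evaluation described in the abstract (a Leave-K-Out Cross Validation error whose significance is assessed with a permutation test) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear-kernel RLS classifier without a bias term, randomly sampled leave-k-out splits rather than an exhaustive enumeration, and synthetic data standing in for the 25-patient microarray set. The function names, the regularization value `lam`, the choice `k=2`, and the number of splits and permutations are all hypothetical.

```python
import numpy as np

def rls_fit(X, y, lam=1.0):
    """Linear RLS in dual form: solve (K + lam*I) c = y with K = X X^T."""
    K = X @ X.T
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return X.T @ c  # primal weight vector

def leave_k_out_error(X, y, k=2, n_splits=50, lam=1.0, seed=0):
    """Average test error over randomly drawn leave-k-out splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_splits):
        test = rng.choice(n, size=k, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        w = rls_fit(X[train], y[train], lam)
        pred = np.where(X[test] @ w >= 0, 1, -1)
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

def permutation_p_value(X, y, n_perm=100, perm_seed=1, **cv_kwargs):
    """Fraction of label permutations whose CV error is at most the observed one."""
    rng = np.random.default_rng(perm_seed)
    observed = leave_k_out_error(X, y, **cv_kwargs)
    count = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        if leave_k_out_error(X, y_perm, **cv_kwargs) <= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Synthetic stand-in (assumption): 25 samples, 200 "genes", 10 of them informative.
data_rng = np.random.default_rng(42)
y = np.array([1] * 13 + [-1] * 12)
X = data_rng.normal(size=(25, 200))
X[:, :10] += 1.5 * y[:, None]

err, p = permutation_p_value(X, y, n_perm=100, k=2, n_splits=50)
print(f"leave-k-out error = {err:.2f}, permutation p-value = {p:.3f}")
```

Repeating the same procedure on progressively smaller gene subsets, with the gene ranking recomputed inside each training split, would give an error curve as a function of the number of selected genes. Ranking the genes on the full data set before cross validation would bias the error estimate downward, which is one of the pitfalls a procedure of this kind is meant to avoid.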
Original language | English |
---|---|
Title of host publication | 2006 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2006 |
Pages | 109-110 |
Number of pages | 2 |
Publication status | Published - 2006 |
Event | 2006 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2006 - College Station, TX, United States; Duration: May 28, 2006 → May 30, 2006 |
Other
Other | 2006 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2006 |
---|---|
Country/Territory | United States |
City | College Station, TX |
Period | 5/28/06 → 5/30/06 |
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology (miscellaneous)
- Computational Theory and Mathematics
- Computer Vision and Pattern Recognition
- Statistics and Probability