Estimating the statistical significance of classifiers by varying the number of genes

R. Maglietta, A. Piepoli, A. D'Addabbo, R. Cotugno, G. Pesole, S. Liuni, M. Savino, M. Carella, F. Perri, N. Ancona

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We present a statistically well-founded method for constructing cancer predictors from gene expression profiles. The methodology is applied to a new microarray data set obtained from 25 patients affected by colon cancer. In particular, we answer two precise questions: how many gene expression levels are correlated with the pathology, and how many are sufficient for accurate classification? The proposed method answers these questions while avoiding the potential pitfalls hidden in the analysis of microarray data. We evaluated the generalization error, estimated through the Leave-K-Out Cross Validation error, of two different classification schemes while varying the number of selected genes. We found that Regularized Least Squares (RLS) and Support Vector Machine (SVM) classifiers, using the whole gene set, have error rates of e = 14% (p = 0.023) and e = 11% (p = 0.016), respectively. Concerning the number of genes, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Statistical significance was measured using a permutation test.
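The evaluation described in the abstract (a Leave-K-Out Cross Validation error whose significance is assessed by a permutation test) can be illustrated with a minimal sketch. It is not the authors' implementation. A nearest-centroid rule stands in for the RLS and SVM classifiers, the data are synthetic, and every name and parameter (make_toy_data, leave_k_out_error, permutation_p_value, k, n_splits, n_perm) is an assumption made for illustration. Gene selection is left out; if it were added, it would have to be repeated inside each training split so that the held-out samples never influence which genes are chosen.

```python
import random


def make_toy_data(seed, n_per_class=10, n_genes=5, shift=4.0):
    """Synthetic stand-in for expression profiles (not the paper's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                # Only the first three "genes" carry class information.
                row = [v + shift if j < 3 else v for j, v in enumerate(row)]
            X.append(row)
            y.append(label)
    return X, y


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: return the label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_k_out_error(X, y, k, n_splits, rng):
    """Mean test error over n_splits random splits that each hold out k samples."""
    n = len(y)
    mistakes = 0
    for _ in range(n_splits):
        test_idx = set(rng.sample(range(n), k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
                mistakes += 1
    return mistakes / (n_splits * k)


def permutation_p_value(X, y, k=3, n_splits=30, n_perm=200, seed=0):
    """Return (observed error, p-value) for the null of no label/profile link."""
    rng = random.Random(seed)
    observed = leave_k_out_error(X, y, k, n_splits, rng)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any association between labels and profiles
        if leave_k_out_error(X, shuffled, k, n_splits, rng) <= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


X, y = make_toy_data(seed=1)
error, p_value = permutation_p_value(X, y)
print(f"leave-k-out error = {error:.3f}, permutation p-value = {p_value:.3f}")
```

The p-value here is the fraction of label permutations whose cross-validated error is no larger than the error obtained with the true labels. Repeating the call on progressively smaller gene subsets would give an error and a p-value for each subset size, which is the kind of comparison the abstract reports.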

Original language: English
Title of host publication: 2006 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2006
Pages: 109-110
Number of pages: 2
DOI: https://doi.org/10.1109/GENSIPS.2006.353180
Publication status: Published - 2006
Event: 2006 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2006 - College Station, TX, United States
Duration: May 28 2006 to May 30 2006

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Statistics and Probability

Cite this

Maglietta, R., Piepoli, A., D'Addabbo, A., Cotugno, R., Pesole, G., Liuni, S., Savino, M., Carella, M., Perri, F., & Ancona, N. (2006). Estimating the statistical significance of classifiers by varying the number of genes. In 2006 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2006 (pp. 109-110). [4161801] https://doi.org/10.1109/GENSIPS.2006.353180