Abstract
A fundamental problem in microarray analysis is identifying relevant genes from large amounts of expression data. Feature selection aims to identify a subset of features for building robust learning models. However, finding the optimal number of features is challenging, because it is a trade-off between information loss when pruning too aggressively and increased noise when pruning too little. This paper presents a novel representation of genes as strings of bits and a method that automatically selects the minimum number of genes needed to reach good classification accuracy on the training set. The method first eliminates redundant features, which add no further information for classification, and then applies a set covering algorithm. Preliminary experimental results on public datasets support the intuition behind the proposed method, which achieves high classification accuracy.
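The two steps named in the abstract (redundancy elimination, then set covering) can be illustrated with a minimal sketch. The abstract does not say how the bit strings are built, so everything here is an assumption: each gene is given a hypothetical integer mask whose bit i is 1 when that gene alone handles training sample i correctly, a gene counts as redundant when its mask is contained in another gene's mask, and the cover is found with the standard greedy heuristic, which approximates a minimum cover rather than guaranteeing one. The function names and example masks are illustrative and are not taken from the paper.

```python
from typing import Dict, List


def popcount(mask: int) -> int:
    """Number of bits set to 1 in the mask."""
    return bin(mask).count("1")


def drop_redundant(masks: Dict[str, int]) -> Dict[str, int]:
    """Keep only genes whose mask is not contained in an already kept mask.

    Assumption: a gene is 'redundant' if every sample it covers is also
    covered by a single other gene (duplicates included).
    """
    kept: Dict[str, int] = {}
    # Visit larger masks first so a superset is kept before its subsets.
    ordered = sorted(masks.items(), key=lambda kv: (-popcount(kv[1]), kv[0]))
    for gene, mask in ordered:
        if mask == 0:
            continue  # covers no sample at all
        if any(mask | other == other for other in kept.values()):
            continue  # adds nothing beyond a kept gene
        kept[gene] = mask
    return kept


def greedy_cover(masks: Dict[str, int], n_samples: int) -> List[str]:
    """Greedy set cover: repeatedly pick the gene covering most uncovered samples."""
    uncovered = (1 << n_samples) - 1  # one bit per training sample
    chosen: List[str] = []
    while uncovered and masks:
        gene, gain = max(
            ((g, popcount(m & uncovered)) for g, m in sorted(masks.items())),
            key=lambda pair: pair[1],
        )
        if gain == 0:
            break  # remaining samples cannot be covered by any gene
        chosen.append(gene)
        uncovered &= ~masks[gene]
    return chosen


# Hypothetical masks for five genes over six training samples.
example_masks = {
    "g1": 0b111000,
    "g2": 0b000111,
    "g3": 0b110000,
    "g4": 0b001100,
    "g5": 0b111000,
}
selected = greedy_cover(drop_redundant(example_masks), n_samples=6)
print(selected)
```

In this toy input, `g3` and `g5` are dropped because `g1` already covers their samples, and the greedy step then needs only two genes to cover all six samples.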
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08 - "Personalized Healthcare through Technology" |
Pages | 5692-5695 |
Number of pages | 4 |
Publication status | Published - 2008 |
Event | 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08 - Vancouver, BC, Canada Duration: Aug 20 2008 → Aug 25 2008 |
Other
Other | 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08 |
---|---|
Country/Territory | Canada |
City | Vancouver, BC |
Period | 8/20/08 → 8/25/08 |
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition
- Signal Processing
- Biomedical Engineering
- Health Informatics