Minimum number of genes for microarray feature selection.

Elena Baralis, Giulia Bruno, Alessandro Fiori

Research output: Contribution to journal › Article

Abstract

A fundamental problem in microarray analysis is to identify relevant genes from large amounts of expression data. Feature selection aims at identifying a subset of features for building robust learning models. However, finding the optimal number of features is a challenging problem, as it is a trade-off between information loss when pruning is excessive and increased noise when pruning is too weak. This paper presents a novel representation of genes as strings of bits and a method that automatically selects the minimum number of genes needed to reach good classification accuracy on the training set. Our method first eliminates redundant features, which do not add further information for classification, and then exploits a set covering algorithm. Preliminary experimental results on public datasets confirm the intuition behind the proposed method, which leads to high classification accuracy.
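The two steps named in the abstract (dropping redundant genes, then solving a set covering problem over bit strings) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each gene is encoded as an integer bitmask in which bit i is set when the gene, taken alone, separates training sample i correctly; how the paper actually derives the bits is not stated here. The function names, the redundancy rule (a gene is redundant if another kept gene covers every sample it covers) and the greedy heuristic are all assumptions chosen for illustration.

```python
from typing import Dict, List


def popcount(mask: int) -> int:
    """Number of set bits, i.e. number of samples a gene covers."""
    return bin(mask).count("1")


def drop_redundant(masks: Dict[str, int]) -> Dict[str, int]:
    """Keep only genes whose covered samples are not a subset of a kept gene's.

    Assumed redundancy rule, not taken from the paper.
    """
    kept: Dict[str, int] = {}
    # Visit larger masks first so a dominated gene always meets its dominator.
    for gene, mask in sorted(masks.items(), key=lambda kv: -popcount(kv[1])):
        dominated = any(mask | other == other for other in kept.values())
        if mask and not dominated:
            kept[gene] = mask
    return kept


def greedy_cover(masks: Dict[str, int], n_samples: int) -> List[str]:
    """Greedy set cover: repeatedly pick the gene covering most uncovered samples.

    The greedy rule gives a small cover, not a guaranteed minimum one.
    """
    if not masks:
        return []
    uncovered = (1 << n_samples) - 1  # one bit per training sample
    chosen: List[str] = []
    while uncovered:
        gene = max(masks, key=lambda g: popcount(masks[g] & uncovered))
        gain = masks[gene] & uncovered
        if not gain:
            break  # the remaining samples are not covered by any gene
        chosen.append(gene)
        uncovered &= ~gain
    return chosen


# Hypothetical toy data: four genes, five training samples.
toy_masks = {"g1": 0b11100, "g2": 0b00111, "g3": 0b00100, "g4": 0b01100}
selected = greedy_cover(drop_redundant(toy_masks), n_samples=5)
print(selected)
```

In the toy data, g3 and g4 cover only samples that g1 already covers, so they are dropped before the covering step, and the two remaining genes together cover all five samples.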

Fingerprint

Microarrays
Feature extraction
Genes
Intuition
Microarray Analysis
Noise
Learning

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Biomedical Engineering
  • Health Informatics

Cite this

@article{a29f75ad2bfb4b5a8e6adfdf8bf2bfba,
title = "Minimum number of genes for microarray feature selection.",
abstract = "A fundamental problem in microarray analysis is to identify relevant genes from large amounts of expression data. Feature selection aims at identifying a subset of features for building robust learning models. However, finding the optimal number of features is a challenging problem, as it is a trade off between information loss when pruning excessively and noise increase when pruning is too weak. This paper presents a novel representation of genes as strings of bits and a method which automatically selects the minimum number of genes to reach a good classification accuracy on the training set. Our method first eliminates redundant features, which do not add further information for classification, then it exploits a set covering algorithm. Preliminary experimental results on public datasets confirm the intuition of the proposed method leading to high classification accuracy.",
author = "Elena Baralis and Giulia Bruno and Alessandro Fiori",
year = "2008",
language = "English",
pages = "5692--5695",
journal = "Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference",
issn = "1557-170X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - Minimum number of genes for microarray feature selection.

AU - Baralis, Elena

AU - Bruno, Giulia

AU - Fiori, Alessandro

PY - 2008

Y1 - 2008

N2 - A fundamental problem in microarray analysis is to identify relevant genes from large amounts of expression data. Feature selection aims at identifying a subset of features for building robust learning models. However, finding the optimal number of features is a challenging problem, as it is a trade off between information loss when pruning excessively and noise increase when pruning is too weak. This paper presents a novel representation of genes as strings of bits and a method which automatically selects the minimum number of genes to reach a good classification accuracy on the training set. Our method first eliminates redundant features, which do not add further information for classification, then it exploits a set covering algorithm. Preliminary experimental results on public datasets confirm the intuition of the proposed method leading to high classification accuracy.

UR - http://www.scopus.com/inward/record.url?scp=84903877411&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903877411&partnerID=8YFLogxK

M3 - Article

C2 - 19164009

SP - 5692

EP - 5695

JO - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

JF - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

SN - 1557-170X

ER -