A power law global error model for the identification of differentially expressed genes in microarray data

Norman Pavelka, Mattia Pelizzola, Caterina Vizzardelli, Monica Capozzoli, Andrea Splendiani, Francesca Granucci, Paola Ricciardi-Castagnoli

Research output: Contribution to journal (Article)

Abstract

Background: High-density oligonucleotide microarray technology enables the discovery of genes that are transcriptionally modulated in different biological samples due to physiology, disease or intervention. Methods for the identification of these so-called "differentially expressed genes" (DEG) would largely benefit from a deeper knowledge of the intrinsic measurement variability. Though it is clear that variance of repeated measures is highly dependent on the average expression level of a given gene, there is still a lack of consensus on how signal reproducibility is linked to signal intensity. The aim of this study was to empirically model the variance versus mean dependence in microarray data to improve the performance of existing methods for identifying DEG.

Results: In the present work we used data generated by our lab as well as publicly available data sets to show that dispersion of repeated measures depends on location of the measures themselves following a power law. This enables us to construct a power law global error model (PLGEM) that is applicable to various Affymetrix GeneChip data sets. A new DEG identification method is therefore proposed, consisting of a statistic designed to make explicit use of model-derived measurement spread estimates and a resampling-based hypothesis testing algorithm.

Conclusions: The new method provides a control of the false positive rate, a good sensitivity vs. specificity trade-off and consistent results with varying number of replicates and even using single samples.
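The power-law dependence of spread on location described in the abstract can be illustrated with a minimal sketch. It assumes the relationship takes the form ln(sd) = k * ln(mean) + c and fits k and c by ordinary least squares on log-transformed per-gene means and standard deviations. The abstract states neither the functional form nor the fitting procedure, so both are assumptions, as are the synthetic data and every function name below. The `signal_to_noise` helper is likewise only one hypothetical way a statistic could use model-derived spread estimates; this is not the authors' PLGEM implementation.

```python
import math
import random


def fit_power_law(means, sds):
    """Least-squares fit of ln(sd) = k * ln(mean) + c; returns (k, c)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(s) for s in sds]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = y_bar - k * x_bar
    return k, c


def modeled_sd(mean, k, c):
    """Spread predicted by the fitted power law for a given mean signal."""
    return math.exp(k * math.log(mean) + c)


def row_mean_sd(rows):
    """Per-gene mean and sample standard deviation across replicates."""
    means, sds = [], []
    for row in rows:
        m = sum(row) / len(row)
        s = math.sqrt(sum((v - m) ** 2 for v in row) / (len(row) - 1))
        if m > 0 and s > 0:  # the log transform needs positive values
            means.append(m)
            sds.append(s)
    return means, sds


def simulate_replicates(n_genes=2000, n_reps=4, k=0.8, c=-1.0, seed=0):
    """Synthetic replicate data whose true spread follows a power law.

    The parameter values are arbitrary and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_genes):
        mu = math.exp(rng.uniform(math.log(50), math.log(20000)))
        sigma = math.exp(k * math.log(mu) + c)
        rows.append([rng.gauss(mu, sigma) for _ in range(n_reps)])
    return rows


def signal_to_noise(mean_a, mean_b, k, c):
    """Hypothetical statistic: mean difference scaled by modeled spreads."""
    return (mean_a - mean_b) / (modeled_sd(mean_a, k, c) + modeled_sd(mean_b, k, c))


if __name__ == "__main__":
    means, sds = row_mean_sd(simulate_replicates())
    k_hat, c_hat = fit_power_law(means, sds)
    print(f"fitted slope k = {k_hat:.3f}, intercept c = {c_hat:.3f}")
    print(f"statistic for 2000 vs 500: {signal_to_noise(2000.0, 500.0, k_hat, c_hat):.2f}")
```

Because the denominator comes from the fitted model rather than from the replicates of the gene being tested, a statistic of this shape can be computed even when each condition has a single sample, which is consistent with the abstract's closing claim. The fitted intercept will generally sit slightly below the simulated one, since the log of a sample standard deviation from few replicates underestimates the log of the true value.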

Original language: English
Article number: 203
Journal: BMC Bioinformatics
Volume: 5
DOI: https://doi.org/10.1186/1471-2105-5-203
Publication status: Published, Dec 17 2004

Fingerprint

Error Model
Microarrays
Microarray Data
Identification (control systems)
Power Law
Genes
Gene
Repeated Measures
Genetic Association Studies
Oligonucleotide Array Sequence Analysis
Oligonucleotides
Physiology
Reproducibility
Resampling
Hypothesis Testing
False Positive
Microarray
Specificity
Statistic
Technology

ASJC Scopus subject areas

  • Medicine(all)
  • Structural Biology
  • Applied Mathematics

Cite this

Pavelka, N., Pelizzola, M., Vizzardelli, C., Capozzoli, M., Splendiani, A., Granucci, F., & Ricciardi-Castagnoli, P. (2004). A power law global error model for the identification of differentially expressed genes in microarray data. BMC Bioinformatics, 5, [203]. https://doi.org/10.1186/1471-2105-5-203

@article{068977bf7e5449b5b4bb6128adf5280c,
title = "A power law global error model for the identification of differentially expressed genes in microarray data",
author = "Norman Pavelka and Mattia Pelizzola and Caterina Vizzardelli and Monica Capozzoli and Andrea Splendiani and Francesca Granucci and Paola Ricciardi-Castagnoli",
year = "2004",
month = "12",
day = "17",
doi = "10.1186/1471-2105-5-203",
language = "English",
volume = "5",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - A power law global error model for the identification of differentially expressed genes in microarray data

AU - Pavelka, Norman

AU - Pelizzola, Mattia

AU - Vizzardelli, Caterina

AU - Capozzoli, Monica

AU - Splendiani, Andrea

AU - Granucci, Francesca

AU - Ricciardi-Castagnoli, Paola

PY - 2004/12/17

Y1 - 2004/12/17

UR - http://www.scopus.com/inward/record.url?scp=13244292335&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=13244292335&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-5-203

DO - 10.1186/1471-2105-5-203

M3 - Article

C2 - 15606915

AN - SCOPUS:13244292335

VL - 5

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 203

ER -