Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data

Nicola Lama, Patrizia Boracchi, Elia Biganzoli

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Current gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of reducing location bias and data rescaling without taking into account the censoring that is characteristic of certain gene expressions, produced by experimental measurement constraints or by previous normalization steps. Moreover, control of normalization procedures for balancing bias versus variance is often left to the user's experience. An approximate maximum likelihood procedure for fitting a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor are modeled by means of the B-spline smoothing technique. As an alternative to the outlier theory and robust methods, the approach presented looks for suitable distributional models, possibly generalizing the classical Gaussian and Laplacian assumptions, controlling for different types of censoring. The Bayesian information criterion is adopted for model selection. Distributional assumptions are tested using goodness-of-fit statistics and Monte Carlo evaluation. Randomization quantiles are proposed to produce normally distributed adjusted data. Three publicly available data sets are analyzed for demonstration purposes. Student's t error models reveal best performances in all of the data sets considered. More validating evidence is needed to evaluate the Asymmetric Laplace distribution, which showed interesting results in one data set.
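The ingredients named in the abstract (a likelihood that accounts for censoring, the Bayesian information criterion, and randomization quantiles) can be illustrated with a minimal Python sketch. This is not the authors' procedure: it assumes a constant location and scale instead of B-spline functions of average intensity, covers only the Gaussian and Laplace error models, and does no fitting. All function names and the "obs"/"left"/"right" status labels are hypothetical, chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used to map probabilities to residuals


def normal_logpdf(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, loc, scale):
    return NormalDist(loc, scale).cdf(x)


def laplace_logpdf(x, loc, scale):
    return -math.log(2.0 * scale) - abs(x - loc) / scale


def laplace_cdf(x, loc, scale):
    z = (x - loc) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def censored_loglik(data, logpdf, cdf, loc, scale):
    """Log-likelihood for a list of (value, status) pairs.

    status is "obs" (exact value), "left" (true value <= value)
    or "right" (true value >= value). Labels are an assumption of this sketch.
    """
    total = 0.0
    for value, status in data:
        if status == "obs":
            total += logpdf(value, loc, scale)
        elif status == "left":
            total += math.log(cdf(value, loc, scale))
        else:  # "right"
            total += math.log(1.0 - cdf(value, loc, scale))
    return total


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def randomized_quantile_residual(value, status, cdf, loc, scale, rng):
    """Map an observation to a standard normal residual.

    Exact observations use their CDF value directly; censored ones draw
    a uniform value over the probability range consistent with censoring.
    """
    p = cdf(value, loc, scale)
    if status == "left":
        p = rng.uniform(0.0, p)
    elif status == "right":
        p = rng.uniform(p, 1.0)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return STD_NORMAL.inv_cdf(p)
```

In use, one would evaluate censored_loglik for each candidate error model at its fitted parameters, compare the resulting bic values, and pass the preferred model's CDF to randomized_quantile_residual with a seeded random.Random instance. If the model is adequate, the residuals should look approximately standard normal.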

Original language: English
Pages (from-to): 1906-1922
Number of pages: 17
Journal: Computational Statistics and Data Analysis
Volume: 53
Issue number: 5
DOI: 10.1016/j.csda.2008.11.026
Publication status: Published - Mar 15 2009

Fingerprint

Censored Data
Gene Expression Data
Gene expression
Normalization
Dependent
Smoothing Techniques
Censoring
Genes
Spline Smoothing
Gene
Laplace Distribution
Asymmetric Distribution
Bayesian Information Criterion
Model
Splines
Scaling Factor
Error Model
Maximum likelihood
Rescaling

ASJC Scopus subject areas

  • Computational Mathematics
  • Computational Theory and Mathematics
  • Statistics and Probability
  • Applied Mathematics

Cite this

Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data. / Lama, Nicola; Boracchi, Patrizia; Biganzoli, Elia.

In: Computational Statistics and Data Analysis, Vol. 53, No. 5, 15.03.2009, p. 1906-1922.


@article{1f74bb3c946442489750675651ca9425,
title = "Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data",
abstract = "Current gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of reducing location bias and data rescaling without taking into account the censoring that is characteristic of certain gene expressions, produced by experimental measurement constraints or by previous normalization steps. Moreover, control of normalization procedures for balancing bias versus variance is often left to the user's experience. An approximate maximum likelihood procedure for fitting a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor are modeled by means of the B-spline smoothing technique. As an alternative to the outlier theory and robust methods, the approach presented looks for suitable distributional models, possibly generalizing the classical Gaussian and Laplacian assumptions, controlling for different types of censoring. The Bayesian information criterion is adopted for model selection. Distributional assumptions are tested using goodness-of-fit statistics and Monte Carlo evaluation. Randomization quantiles are proposed to produce normally distributed adjusted data. Three publicly available data sets are analyzed for demonstration purposes. Student's t error models reveal best performances in all of the data sets considered. More validating evidence is needed to evaluate the Asymmetric Laplace distribution, which showed interesting results in one data set.",
author = "Nicola Lama and Patrizia Boracchi and Elia Biganzoli",
year = "2009",
month = "3",
day = "15",
doi = "10.1016/j.csda.2008.11.026",
language = "English",
volume = "53",
pages = "1906--1922",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "5",
}

TY - JOUR
T1 - Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data
AU - Lama, Nicola
AU - Boracchi, Patrizia
AU - Biganzoli, Elia
PY - 2009/3/15
Y1 - 2009/3/15
AB - Current gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of reducing location bias and data rescaling without taking into account the censoring that is characteristic of certain gene expressions, produced by experimental measurement constraints or by previous normalization steps. Moreover, control of normalization procedures for balancing bias versus variance is often left to the user's experience. An approximate maximum likelihood procedure for fitting a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor are modeled by means of the B-spline smoothing technique. As an alternative to the outlier theory and robust methods, the approach presented looks for suitable distributional models, possibly generalizing the classical Gaussian and Laplacian assumptions, controlling for different types of censoring. The Bayesian information criterion is adopted for model selection. Distributional assumptions are tested using goodness-of-fit statistics and Monte Carlo evaluation. Randomization quantiles are proposed to produce normally distributed adjusted data. Three publicly available data sets are analyzed for demonstration purposes. Student's t error models reveal best performances in all of the data sets considered. More validating evidence is needed to evaluate the Asymmetric Laplace distribution, which showed interesting results in one data set.

UR - http://www.scopus.com/inward/record.url?scp=60649109791&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=60649109791&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2008.11.026
DO - 10.1016/j.csda.2008.11.026
M3 - Article
VL - 53
SP - 1906
EP - 1922
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
SN - 0167-9473
IS - 5
ER -