Bayesian DNA copy number analysis

Paola M V Rancoita, Marcus Hutter, Francesco Bertoni, Ivo Kwee

Research output: Contribution to journal (Article)

Abstract

Background: Some diseases, such as tumors, can be related to chromosomal aberrations that change the DNA copy number. The copy number of an aberrant genome can be represented as a piecewise constant function, since it can exhibit regions of deletions or gains. In a healthy cell, by contrast, the copy number is two, because we inherit one copy of each chromosome from each of our parents. Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are noisy observations of a piecewise constant function. The method estimates the unknown number of segments, the endpoints of the segments and the levels of the segments of the underlying piecewise constant function. The Bayesian Regression Curve (BRC) estimates the same data with a smoothing curve. However, in the original formulation, some estimators failed to properly determine the corresponding parameters. For example, the boundary estimator did not take into account the dependency among the boundaries and could estimate more than one breakpoint at the same position, losing segments.

Results: We derived an improved version of BPCR (called mBPCR) and of BRC, changing the segment number estimator and the boundary estimator to enhance the fitting procedure. We also proposed an alternative estimator of the variance of the segment levels, which is useful for data with high noise. Using artificial data, we compared the original and the modified versions of BPCR and BRC with other regression methods, showing that our improved version of BPCR generally outperformed all the others. Similar results were also observed on real data.

Conclusion: We propose an improved method for DNA copy number estimation, mBPCR, which performed very well compared to previously published algorithms. In particular, mBPCR was more powerful in detecting the true position of the breakpoints and small aberrations in very noisy data. Hence, from a biological point of view, our method can be very useful, for example, to find targets of genomic aberrations in clinical cancer samples.
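The data model described in the abstract (noisy observations of a piecewise constant copy number profile) can be illustrated with a small sketch. This is not BPCR or mBPCR: it uses a plain least-squares segmentation with a known number of segments, swapped in only to show what "segment number, boundaries and levels" mean in practice. All function names, the noise level and the example profile are assumptions made for illustration.

```python
import random


def piecewise_constant(boundaries, levels):
    """Expand segments into a per-probe profile.

    boundaries: increasing exclusive end indices, the last one equals n.
    levels: one copy number value per segment.
    """
    profile, start = [], 0
    for end, level in zip(boundaries, levels):
        profile.extend([level] * (end - start))
        start = end
    return profile


def add_noise(profile, sigma, seed=0):
    """Add independent Gaussian noise (hypothetical noise model)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in profile]


def least_squares_segmentation(y, k):
    """Split y into exactly k segments minimising the squared error.

    Plain dynamic programming, O(k * n^2). Returns the exclusive end
    index of each segment. NOT the Bayesian estimator of the paper.
    """
    n = len(y)
    # Prefix sums of y and y^2 give each segment's squared error in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Squared error of y[i:j] around its own mean (requires j > i)."""
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):          # number of segments used so far
        for j in range(m, n + 1):      # end of the m-th segment
            for i in range(m - 1, j):  # start of the m-th segment
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i
    # Walk the back-pointers from the end to recover the boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        bounds.append(j)
        j = back[m][j]
    return bounds[::-1]


# Example: normal copy number 2 with a single-copy gain on probes 40..59.
truth = piecewise_constant([40, 60, 100], [2.0, 3.0, 2.0])
observed = add_noise(truth, sigma=0.1, seed=1)
estimated = least_squares_segmentation(observed, k=3)
```

Unlike this sketch, the method in the abstract also estimates the number of segments rather than taking `k` as given, which is one of the estimators the authors revised.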

Original language: English
Article number: 10
Journal: BMC Bioinformatics
Volume: 10
DOI: https://doi.org/10.1186/1471-2105-10-10
Publication status: Published - Jan 8 2009

Fingerprint

Aberrations
DNA
Regression
Constant function
Estimator
Chromosomes
Aberration
DNA Copy Number Variations
Tumors
Curve
Genes
Cells
Bayes Theorem
Chromosome Aberrations
Noise
Neoplasms
Cell Count
Genome
Noisy Data
Estimate

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Structural Biology
  • Applied Mathematics

Cite this

Rancoita, P. M. V., Hutter, M., Bertoni, F., & Kwee, I. (2009). Bayesian DNA copy number analysis. BMC Bioinformatics, 10, [10]. https://doi.org/10.1186/1471-2105-10-10

Bayesian DNA copy number analysis. / Rancoita, Paola M V; Hutter, Marcus; Bertoni, Francesco; Kwee, Ivo.

In: BMC Bioinformatics, Vol. 10, 10, 08.01.2009.


Rancoita, PMV, Hutter, M, Bertoni, F & Kwee, I 2009, 'Bayesian DNA copy number analysis', BMC Bioinformatics, vol. 10, 10. https://doi.org/10.1186/1471-2105-10-10
Rancoita PMV, Hutter M, Bertoni F, Kwee I. Bayesian DNA copy number analysis. BMC Bioinformatics. 2009 Jan 8;10:10. https://doi.org/10.1186/1471-2105-10-10
@article{03b0986d747b4f18a71a2fee325949f1,
title = "Bayesian DNA copy number analysis",
author = "Rancoita, {Paola M V} and Marcus Hutter and Francesco Bertoni and Ivo Kwee",
year = "2009",
month = "1",
day = "8",
doi = "10.1186/1471-2105-10-10",
language = "English",
volume = "10",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - Bayesian DNA copy number analysis

AU - Rancoita, Paola M V

AU - Hutter, Marcus

AU - Bertoni, Francesco

AU - Kwee, Ivo

PY - 2009/1/8

Y1 - 2009/1/8

UR - http://www.scopus.com/inward/record.url?scp=65449186172&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65449186172&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-10-10

DO - 10.1186/1471-2105-10-10

M3 - Article

C2 - 19133123

AN - SCOPUS:65449186172

VL - 10

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 10

ER -