A hierarchical Naïve Bayes model for handling sample heterogeneity in classification problems: An application to tissue microarrays

Francesca Demichelis, Paolo Magni, Paolo Piergiorgi, Mark A. Rubin, Riccardo Bellazzi

Research output: Contribution to journal › Article

Abstract

Background: Uncertainty often affects molecular biology experiments and data for a variety of reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue microarray (TMA) experiments allow large-scale profiling of tissue biopsies, investigating the protein patterns that characterize specific disease states. To account for possible biological heterogeneity, TMA studies sample the same patient several times and therefore measure the same protein target repeatedly. The aim of this paper is to provide and validate a classification model that takes into account the uncertainty associated with measuring replicate samples.

Results: We propose an extension of the well-known Naïve Bayes classifier that accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model can be learned efficiently from the training dataset and uses a closed-form classification equation, so it incurs no additional computational cost compared with the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the standard Naïve Bayes classifier. Moreover, we demonstrated that explicitly modeling heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.

Conclusion: The proposed Hierarchical Naïve Bayes classifier can be conveniently applied to problems where within-sample heterogeneity must be taken into account, such as TMA experiments and other biological settings in which several measurements (replicates) are available for the same biological sample. The new approach performs better than the standard Naïve Bayes model, particularly when the degree of within-sample heterogeneity differs between classes.
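
The abstract does not state the model equations, so the following is only a minimal sketch under explicit assumptions, not the authors' published model. It assumes each feature has been discretized into `n_levels` intensity levels, that the replicate cores of one sample share sample-specific level probabilities drawn from a class-specific Dirichlet distribution, and that these probabilities are integrated out, giving a closed-form Dirichlet-multinomial (Pólya) class-conditional likelihood. The hyperparameters are fitted with Minka's fixed-point iteration, an estimator swapped in for illustration. `HierarchicalNB`, `fit_polya`, `log_polya`, `digamma_diff` and the toy data are hypothetical. Scoring a sample needs only per-feature level counts over its replicates, so its cost is of the same order as standard Naïve Bayes.

```python
import math
from collections import Counter


def digamma_diff(a, n):
    # psi(a + n) - psi(a) for an integer count n >= 0, via the exact identity
    # psi(a + n) - psi(a) = sum_{i=0}^{n-1} 1 / (a + i), so SciPy is not needed.
    return sum(1.0 / (a + i) for i in range(n))


def fit_polya(count_vectors, n_levels, iters=200, floor=1e-3):
    # Dirichlet hyperparameters of a Dirichlet-multinomial (Polya) model,
    # fitted by Minka's fixed-point iteration; `floor` keeps alpha_k > 0.
    alpha = [1.0] * n_levels
    for _ in range(iters):
        total = sum(alpha)
        denom = sum(digamma_diff(total, sum(n)) for n in count_vectors)
        if denom == 0:  # no replicates observed: keep the initial guess
            break
        alpha = [
            max(floor, a * sum(digamma_diff(a, n[k]) for n in count_vectors) / denom)
            for k, a in enumerate(alpha)
        ]
    return alpha


def log_polya(counts, alpha):
    # Closed-form log P(replicate levels | class), with the sample-specific
    # level probabilities integrated out. The multinomial coefficient is
    # dropped because it is identical for every class.
    total, n = sum(alpha), sum(counts)
    out = math.lgamma(total) - math.lgamma(total + n)
    for n_k, a_k in zip(counts, alpha):
        out += math.lgamma(a_k + n_k) - math.lgamma(a_k)
    return out


class HierarchicalNB:
    """Hypothetical hierarchical Naive Bayes for replicated discrete features.

    A sample is a list of replicates; a replicate is a tuple holding one
    discrete level in range(n_levels) per feature.
    """

    def __init__(self, n_levels):
        self.n_levels = n_levels

    def _counts(self, sample, j):
        # Level counts of feature j across the replicates of one sample.
        c = Counter(rep[j] for rep in sample)
        return [c[k] for k in range(self.n_levels)]

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        n_features = len(samples[0][0])
        self.log_prior, self.alpha = {}, {}
        for c in self.classes:
            members = [s for s, y in zip(samples, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / len(samples))
            self.alpha[c] = [
                fit_polya([self._counts(s, j) for s in members], self.n_levels)
                for j in range(n_features)
            ]
        return self

    def log_score(self, sample, c):
        # Unnormalized log posterior: features are assumed conditionally
        # independent given the class, as in standard Naive Bayes.
        return self.log_prior[c] + sum(
            log_polya(self._counts(sample, j), a) for j, a in enumerate(self.alpha[c])
        )

    def predict(self, sample):
        return max(self.classes, key=lambda c: self.log_score(sample, c))


# Invented toy data: one feature, 3 levels, 4 replicates per sample.
homog = [[(v,)] * 4 for v in (0, 1, 2, 0, 1, 2)]  # replicates agree
heterog = [[(v,) for v in seq] for seq in (
    [0, 1, 2, 1], [2, 0, 1, 0], [1, 2, 0, 2],
    [0, 1, 2, 0], [1, 2, 0, 1], [2, 0, 1, 2])]      # replicates disagree
model = HierarchicalNB(n_levels=3).fit(homog + heterog, ["homog"] * 6 + ["heterog"] * 6)
print(model.predict([(0,)] * 4))                # homog
print(model.predict([(0,), (1,), (2,), (1,)]))  # heterog
```

In this invented toy set, pooling all 24 replicates per class gives each level a relative frequency of 8/24 in both classes, so a standard Naïve Bayes fitted on pooled replicates would assign both classes the same likelihood for any test sample. The hierarchical marginal separates them because it scores how replicates co-vary within a sample, which mirrors the abstract's point about classes that differ in within-sample heterogeneity.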

Original language: English
Article number: 514
Journal: BMC Bioinformatics
Volume: 7
DOI: 10.1186/1471-2105-7-514
Publication status: Published - Nov 24 2006

Fingerprint

Hierarchical Bayes
Microarrays
Microarray
Bayes Classifier
Classification Problems
Tissue
Classifiers
Uncertainty
Proteins
Protein
Model
Experiment
Bayesian Hierarchical Model
Molecular biology
Prostate Cancer
Biopsy
Molecular Biology
Experiments
Bayes
Profiling

ASJC Scopus subject areas

  • Medicine (all)
  • Structural Biology
  • Applied Mathematics

Cite this

A hierarchical Naïve Bayes model for handling sample heterogeneity in classification problems: An application to tissue microarrays. / Demichelis, Francesca; Magni, Paolo; Piergiorgi, Paolo; Rubin, Mark A.; Bellazzi, Riccardo.

In: BMC Bioinformatics, Vol. 7, 514, 24.11.2006.


@article{31748817417e45d7ab25836e6b3abe57,
title = "A hierarchical Na{\"\i}ve Bayes model for handling sample heterogeneity in classification problems: An application to tissue microarrays",
abstract = "Background: Uncertainty often affects molecular biology experiments and data for a variety of reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue microarray (TMA) experiments allow large-scale profiling of tissue biopsies, investigating the protein patterns that characterize specific disease states. To account for possible biological heterogeneity, TMA studies sample the same patient several times and therefore measure the same protein target repeatedly. The aim of this paper is to provide and validate a classification model that takes into account the uncertainty associated with measuring replicate samples. Results: We propose an extension of the well-known Na{\"\i}ve Bayes classifier that accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model can be learned efficiently from the training dataset and uses a closed-form classification equation, so it incurs no additional computational cost compared with the standard Na{\"\i}ve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the standard Na{\"\i}ve Bayes classifier. Moreover, we demonstrated that explicitly modeling heterogeneity can improve classification accuracy on a TMA prostate cancer dataset. Conclusion: The proposed Hierarchical Na{\"\i}ve Bayes classifier can be conveniently applied to problems where within-sample heterogeneity must be taken into account, such as TMA experiments and other biological settings in which several measurements (replicates) are available for the same biological sample. The new approach performs better than the standard Na{\"\i}ve Bayes model, particularly when the degree of within-sample heterogeneity differs between classes.",
author = "Francesca Demichelis and Paolo Magni and Paolo Piergiorgi and Rubin, {Mark A.} and Riccardo Bellazzi",
year = "2006",
month = "11",
day = "24",
doi = "10.1186/1471-2105-7-514",
language = "English",
volume = "7",
pages = "514",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",

}

TY  - JOUR

T1  - A hierarchical Naïve Bayes model for handling sample heterogeneity in classification problems: An application to tissue microarrays

AU  - Demichelis, Francesca

AU  - Magni, Paolo

AU  - Piergiorgi, Paolo

AU  - Rubin, Mark A.

AU  - Bellazzi, Riccardo

PY  - 2006/11/24

Y1  - 2006/11/24

AB  - Background: Uncertainty often affects molecular biology experiments and data for a variety of reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue microarray (TMA) experiments allow large-scale profiling of tissue biopsies, investigating the protein patterns that characterize specific disease states. To account for possible biological heterogeneity, TMA studies sample the same patient several times and therefore measure the same protein target repeatedly. The aim of this paper is to provide and validate a classification model that takes into account the uncertainty associated with measuring replicate samples. Results: We propose an extension of the well-known Naïve Bayes classifier that accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model can be learned efficiently from the training dataset and uses a closed-form classification equation, so it incurs no additional computational cost compared with the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the standard Naïve Bayes classifier. Moreover, we demonstrated that explicitly modeling heterogeneity can improve classification accuracy on a TMA prostate cancer dataset. Conclusion: The proposed Hierarchical Naïve Bayes classifier can be conveniently applied to problems where within-sample heterogeneity must be taken into account, such as TMA experiments and other biological settings in which several measurements (replicates) are available for the same biological sample. The new approach performs better than the standard Naïve Bayes model, particularly when the degree of within-sample heterogeneity differs between classes.

UR  - http://www.scopus.com/inward/record.url?scp=33845660791&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=33845660791&partnerID=8YFLogxK

U2  - 10.1186/1471-2105-7-514

DO  - 10.1186/1471-2105-7-514

M3  - Article

C2  - 17125514

AN  - SCOPUS:33845660791

VL  - 7

JO  - BMC Bioinformatics

JF  - BMC Bioinformatics

SN  - 1471-2105

M1  - 514

ER  - 