Hierarchical Naive Bayes for genetic association studies

Alberto Malovini, Nicola Barbarini, Riccardo Bellazzi, Francesca De Michelis

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

Background: Genome-wide association studies are powerful approaches for disentangling the genetic and molecular mechanisms underlying complex traits. The usual "one-SNP-at-a-time" testing strategy cannot capture the multifactorial nature of such disorders. We propose a Hierarchical Naïve Bayes classification model that takes into account associations in SNP data characterized by Linkage Disequilibrium. Validation shows that our model achieves better classification performance than the standard Naïve Bayes classifier on both simulated and real datasets.

Methods: In the Hierarchical Naïve Bayes model implemented here, SNPs mapping to the same region of Linkage Disequilibrium are treated as "details" or "replicates" of the locus, each contributing to the overall effect of the region on the phenotype. A latent variable for each block, which models the "population" of correlated SNPs, can then be used to summarize the available information. Classification thus relies on the conditional probability distributions of the latent variables and on the available SNP data.

Results: The methodology was tested on simulated datasets, each comprising 300 cases, 300 controls and a variable number of SNPs. The approach was also applied to two real datasets on the genetic bases of Type 1 Diabetes and Type 2 Diabetes generated by the Wellcome Trust Case Control Consortium.

Conclusions: The proposed approach, called Hierarchical Naïve Bayes, makes it possible to classify examples for which genetic information on structurally correlated SNPs is available. It improves on Naïve Bayes performance by properly handling within-locus variability.
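The idea described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the latent-variable distribution, so the sketch assumes a Dirichlet-multinomial model in which the SNP genotypes (coded 0, 1, 2) within one Linkage Disequilibrium block are replicate draws governed by a class-specific latent genotype-frequency vector. The function names, the `concentration` parameter, the add-one smoothing, and the toy data are all hypothetical.

```python
from math import lgamma, log

def block_counts(genotypes, block):
    """Count how many SNPs in `block` (column indices) carry genotype 0, 1, 2."""
    counts = [0, 0, 0]
    for j in block:
        counts[genotypes[j]] += 1
    return counts

def dirichlet_multinomial_loglik(counts, alpha):
    """Log marginal likelihood of block counts with the latent frequencies
    integrated out (class-independent multinomial coefficient omitted)."""
    ll = lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
    for n_k, a_k in zip(counts, alpha):
        ll += lgamma(a_k + n_k) - lgamma(a_k)
    return ll

def fit(X, y, blocks, concentration=2.0):
    """Estimate one Dirichlet parameter vector per class and block.
    ASSUMPTION: alpha = concentration * smoothed pooled genotype proportions."""
    model = {"blocks": blocks, "prior": {}, "alpha": {}}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        model["prior"][c] = log(len(rows) / len(X))
        alphas = []
        for block in blocks:
            totals = [1.0, 1.0, 1.0]  # add-one smoothing (assumption)
            for x in rows:
                for k, n_k in enumerate(block_counts(x, block)):
                    totals[k] += n_k
            s = sum(totals)
            alphas.append([concentration * t / s for t in totals])
        model["alpha"][c] = alphas
    return model

def predict(model, x):
    """Return the class with the highest log prior plus summed block log-likelihoods."""
    scores = {}
    for c, alphas in model["alpha"].items():
        score = model["prior"][c]
        for block, alpha in zip(model["blocks"], alphas):
            score += dirichlet_multinomial_loglik(block_counts(x, block), alpha)
        scores[c] = score
    return max(scores, key=scores.get)

# Toy data (invented): 5 SNPs in two LD blocks; label 1 = case, 0 = control.
X = [[2, 2, 1, 0, 1], [2, 1, 2, 1, 0], [2, 2, 2, 0, 0],
     [0, 0, 1, 0, 1], [0, 1, 0, 1, 0], [0, 0, 0, 1, 1]]
y = [1, 1, 1, 0, 0, 0]
blocks = [[0, 1, 2], [3, 4]]
model = fit(X, y, blocks)
print(predict(model, [2, 2, 2, 0, 1]), predict(model, [0, 0, 0, 1, 0]))
```

A standard Naïve Bayes classifier would instead multiply one likelihood term per SNP, treating correlated SNPs in a block as independent evidence. In this sketch each block contributes a single term, so the within-block variability is absorbed by the latent frequency vector.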

Original language: English
Article number: S6
Journal: BMC Bioinformatics
Volume: 13
Issue number: SUPPL 1
DOI: 10.1186/1471-2105-13-S14-S6
Publication status: Published - Sep 7 2012

Fingerprint

Hierarchical Bayes
Genetic Association
Naive Bayes
Genetic Association Studies
Single Nucleotide Polymorphism
Linkage Disequilibrium
Case-control
Diabetes
Latent Variables
Medical problems
Locus
Bayes Classifier
Factorial
Bayes
Conditional probability
Conditional Distribution
Phenotype
Probability distributions
Disorder
Genome

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Hierarchical Naive Bayes for genetic association studies. / Malovini, Alberto; Barbarini, Nicola; Bellazzi, Riccardo; De Michelis, Francesca.

In: BMC Bioinformatics, Vol. 13, No. SUPPL 1, S6, 07.09.2012.

@article{376e70cb1c684485a171a1855dc1e5d1,
title = "Hierarchical Naive Bayes for genetic association studies",
author = "Alberto Malovini and Nicola Barbarini and Riccardo Bellazzi and {De Michelis}, Francesca",
year = "2012",
month = "9",
day = "7",
doi = "10.1186/1471-2105-13-S14-S6",
language = "English",
volume = "13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "SUPPL 1",

}

TY - JOUR

T1 - Hierarchical Naive Bayes for genetic association studies

AU - Malovini, Alberto

AU - Barbarini, Nicola

AU - Bellazzi, Riccardo

AU - De Michelis, Francesca

PY - 2012/9/7

Y1 - 2012/9/7

UR - http://www.scopus.com/inward/record.url?scp=84875026568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875026568&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-S14-S6

DO - 10.1186/1471-2105-13-S14-S6

M3 - Article

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL 1

M1 - S6

ER -