Running genome wide data analysis using a parallel approach on a cloud platform

Andrea Demartini, Davide Capozzi, Alberto Malovini, Riccardo Bellazzi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Hierarchical Naïve Bayes (HNB) is a multivariate classification algorithm that can be used to forecast the probability of a specific disease by analysing a set of Single Nucleotide Polymorphisms (SNPs). In this paper we present the implementation of HNB using a parallel approach based on the Map-Reduce paradigm built natively on the Hadoop framework, relying on the Amazon Cloud Infrastructure. We tested our approach on two GWAS datasets aimed at identifying the genetic bases of Type 1 (T1D) and Type 2 Diabetes (T2D). Both datasets include individual-level data of 1,900 cases and 1,500 controls with ~420,000 SNPs. For T2D the best results were obtained using the complete set of SNPs, whereas for T1D the best performance was reached using a few SNPs selected through standard univariate association tests. Our cloud-based implementation allows running genome-wide simulations while cutting down computational time and overall infrastructure costs.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 188-192
Number of pages: 5
Volume: 9105
ISBN (Print): 9783319195506
Publication status: Published - 2015
Event: 15th Conference on Artificial Intelligence in Medicine, AIME 2015 - Pavia, Italy
Duration: Jun 17 2015 → Jun 20 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9105
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th Conference on Artificial Intelligence in Medicine, AIME 2015
Country: Italy
City: Pavia
Period: 6/17/15 → 6/20/15

Keywords

  • Cloud computing
  • Data mining algorithm
  • Genome-wide association studies
  • Map reduce

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science
