A kinetic model-based algorithm to classify NGS short reads by their allele origin

Andrea Marinoni, Ettore Rizzo, Ivan Limongelli, Paolo Gamba, Riccardo Bellazzi

Research output: Contribution to journal (Article)

Abstract

Genotyping Next Generation Sequencing (NGS) data of a diploid genome aims to assign the zygosity of identified variants through comparison with a reference genome. Current methods typically employ probabilistic models that rely on the pileup of bases at each locus and on a priori knowledge. We present a new algorithm, called Kimimila (KInetic Modeling based on InforMation theory to Infer Labels of Alleles), which assigns reads to alleles using a distance geometry approach and infers the variant genotypes accurately without any a priori assumptions. The performance of the model has been assessed on simulated and real data from the 1000 Genomes Project, and the results have been compared with those of several commonly used genotyping methods, namely GATK, Samtools, VarScan, FreeBayes and Atlas2. Although our algorithm does not make use of a priori knowledge, its percentage of correctly genotyped variants is comparable to that of these algorithms. Furthermore, our method allows the user to split the read pool according to the inferred allele origin.

Original language: English
Pages (from-to): 121-127
Number of pages: 7
Journal: Journal of Biomedical Informatics
Volume: 53
Publication status: Published - Feb 1 2015

Keywords

  • Allele
  • Cluster
  • Genotyping
  • Reads
  • Sequencing
  • Variants

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
  • Medicine (all)
