Genotyping Next Generation Sequencing (NGS) data of a diploid genome aims to assign the zygosity of identified variants through comparison with a reference genome. Current methods typically employ probabilistic models that rely on the pileup of bases at each locus and on a priori knowledge.We present a new algorithm, called Kimimila (KInetic Modeling based on InforMation theory to Infer Labels of Alleles), which is able to assign reads to alleles by using a distance geometry approach and to infer the variant genotypes accurately, without any kind of assumption. The performance of the model has been assessed on simulated and real data of the 1000 Genomes Project and the results have been compared with several commonly used genotyping methods, i.e., GATK, Samtools, VarScan, FreeBayes and Atlas2. Despite our algorithm does not make use of a priori knowledge, the percentage of correctly genotyped variants is comparable to these algorithms. Furthermore, our method allows the user to split the reads pool depending on the inferred allele origin.
ASJC Scopus subject areas
- Computer Science Applications
- Health Informatics