Next generation sequencing of pooled samples: Guideline for variants' filtering

Santosh Anand, Eleonora Mangano, Nadia Barizzone, Roberta Bordoni, Melissa Sorosina, Ferdinando Clarelli, Lucia Corrado, Filippo Martinelli Boneschi, Sandra D'Alfonso, Gianluca De Bellis

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is an alternative cost- and time-effective option in which DNA from several individuals is pooled for sequencing. However, pooling of DNA creates new problems and challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, possibly giving rise to false positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data from the individual subjects in the pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic in nature and could be easily applied in other Pool-seq experiments.
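The quantities named in the abstract can be illustrated with a small Python sketch. It is not the authors' pipeline: the abstract does not say which per-variant distributions the Kolmogorov-Smirnov test compares, so `ks_statistic` below only computes the generic two-sample statistic on whatever numeric values are supplied. The function names, the diploid assumption (`ploidy=2`), and the read counts in the usage note are all hypothetical.

```python
from bisect import bisect_right


def min_nonzero_af(n_individuals, ploidy=2):
    """Smallest non-zero allele frequency a pool can carry: one variant
    chromosome among n_individuals * ploidy chromosomes (diploidy assumed)."""
    return 1.0 / (n_individuals * ploidy)


def pool_af(alt_reads, total_reads):
    """Naive Pool-seq AF estimate: fraction of reads at a site that carry
    the alternate allele. No error model is applied here."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    difference between the empirical CDFs of the two samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so the maximum gap is reached at one of those values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

With 12 diploid individuals per pool, `min_nonzero_af(12)` is 1/24, roughly 4.2%, so a single heterozygous carrier is expected at about that read fraction; for example, `pool_af(5, 120)` gives the same value for 5 alternate reads out of 120. Sequencing errors that produce read fractions in this range are what make low-frequency variants hard to separate from noise, which is the problem a distribution-based filter is meant to address.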

Original language: English
Article number: 33735
Journal: Scientific Reports
Volume: 6
DOI: https://doi.org/10.1038/srep33735
Publication status: Published - Sep 27 2016

Fingerprint

Guidelines
Costs and Cost Analysis
DNA
Population Genetics
Nonparametric Statistics
Gene Frequency
Single Nucleotide Polymorphism
Alleles
Databases

ASJC Scopus subject areas

  • General

Cite this

Anand, S., Mangano, E., Barizzone, N., Bordoni, R., Sorosina, M., Clarelli, F., ... De Bellis, G. (2016). Next generation sequencing of pooled samples: Guideline for variants' filtering. Scientific Reports, 6, [33735]. https://doi.org/10.1038/srep33735

@article{0696478b46404cf3be3234ffea92c2f2,
title = "Next generation sequencing of pooled samples: Guideline for variants' filtering",
author = "Santosh Anand and Eleonora Mangano and Nadia Barizzone and Roberta Bordoni and Melissa Sorosina and Ferdinando Clarelli and Lucia Corrado and Boneschi, {Filippo Martinelli} and Sandra D'Alfonso and {De Bellis}, Gianluca",
year = "2016",
month = "9",
day = "27",
doi = "10.1038/srep33735",
language = "English",
volume = "6",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Next generation sequencing of pooled samples

T2 - Guideline for variants' filtering

AU - Anand, Santosh

AU - Mangano, Eleonora

AU - Barizzone, Nadia

AU - Bordoni, Roberta

AU - Sorosina, Melissa

AU - Clarelli, Ferdinando

AU - Corrado, Lucia

AU - Boneschi, Filippo Martinelli

AU - D'Alfonso, Sandra

AU - De Bellis, Gianluca

PY - 2016/9/27

Y1 - 2016/9/27

UR - http://www.scopus.com/inward/record.url?scp=84988844613&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988844613&partnerID=8YFLogxK

U2 - 10.1038/srep33735

DO - 10.1038/srep33735

M3 - Article

AN - SCOPUS:84988844613

VL - 6

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 33735

ER -