GARFIELD-NGS

Genomic vARiants FIltering by dEep Learning moDels in NGS

Viola Ravasio, Marco Ritelli, Andrea Legati, Edoardo Giacopuzzi

Research output: Contribution to journal (Article)

Abstract

Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses a serious challenge for variant interpretation. Here, we propose a new tool named Genomic vARiants FIltering by dEep Learning moDels in NGS (GARFIELD-NGS), which relies on deep learning models to separate false from true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71-0.98) and outperformed established hard filters. The method is also robust at low coverage, down to 30X, and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing the application of different thresholds based on the desired level of sensitivity and specificity. Availability and implementation: GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS.
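
The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the GARFIELD-NGS code base: the abstract does not say how the prediction is stored in the output VCF, so the INFO key `SCORE`, the function names, and the sample records below are all hypothetical. The sketch only assumes that each record carries a numeric score in its INFO column and that a higher score means a more likely true variant.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical INFO key: the abstract does not name the annotation that
# carries the model prediction, so "SCORE" is only a placeholder.
SCORE_KEY = "SCORE"


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_by_score(
    lines: Iterable[str], threshold: float, key: str = SCORE_KEY
) -> Iterator[str]:
    """Yield header lines unchanged and records whose score is >= threshold.

    Records with a missing or non-numeric score are dropped in this sketch.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a well-formed VCF record
        value = parse_info(columns[7]).get(key)
        if value is None:
            continue
        try:
            score = float(value)
        except ValueError:
            continue
        if score >= threshold:
            yield line


# Made-up records for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=40;SCORE=0.95\n",
    "1\t200\t.\tC\tT\t50\tPASS\tDP=35;SCORE=0.12\n",
]
kept = list(filter_by_score(vcf_lines, threshold=0.5))
```

Raising the threshold trades sensitivity for specificity, and lowering it does the opposite. The output is still a list of VCF lines, so the filter can sit between other pipeline steps.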

Original language: English
Pages (from-to): 3038-3040
Number of pages: 3
Journal: Bioinformatics
Volume: 34
Issue number: 17
DOI: 10.1093/bioinformatics/bty303
Publication status: Published - Sep 1 2018

Fingerprint

Exome
Sequencing
Genomics
Filtering
Learning
Dissect
Medical Genetics
False Positive
Chemistry
Area Under Curve
Specificity
Single Nucleotide Polymorphism
Diagnostics
Coverage
Proportion
Availability
Pipelines
Filter
Sensitivity and Specificity
Output

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS. / Ravasio, Viola; Ritelli, Marco; Legati, Andrea; Giacopuzzi, Edoardo.

In: Bioinformatics, Vol. 34, No. 17, 01.09.2018, p. 3038-3040.

@article{29e8b26ccb244491b5ef7ef1a4f356c3,
title = "GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS",
abstract = "Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses a serious challenge for variant interpretation. Here, we propose a new tool named Genomic vARiants FIltering by dEep Learning moDels in NGS (GARFIELD-NGS), which relies on deep learning models to separate false from true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71-0.98) and outperformed established hard filters. The method is also robust at low coverage, down to 30X, and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing the application of different thresholds based on the desired level of sensitivity and specificity. Availability and implementation: GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS.",
author = "Viola Ravasio and Marco Ritelli and Andrea Legati and Edoardo Giacopuzzi",
year = "2018",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/bty303",
language = "English",
volume = "34",
pages = "3038--3040",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "NLM (Medline)",
number = "17",

}

TY - JOUR

T1 - GARFIELD-NGS

T2 - Genomic vARiants FIltering by dEep Learning moDels in NGS

AU - Ravasio, Viola

AU - Ritelli, Marco

AU - Legati, Andrea

AU - Giacopuzzi, Edoardo

PY - 2018/9/1

Y1 - 2018/9/1

AB - Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses a serious challenge for variant interpretation. Here, we propose a new tool named Genomic vARiants FIltering by dEep Learning moDels in NGS (GARFIELD-NGS), which relies on deep learning models to separate false from true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71-0.98) and outperformed established hard filters. The method is also robust at low coverage, down to 30X, and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing the application of different thresholds based on the desired level of sensitivity and specificity. Availability and implementation: GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS.

UR - http://www.scopus.com/inward/record.url?scp=85054964744&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054964744&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty303

DO - 10.1093/bioinformatics/bty303

M3 - Article

VL - 34

SP - 3038

EP - 3040

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

ER -