Abstract
Original language | English |
---|---|
Journal | BMC Bioinformatics |
Volume | 19 |
Issue number | 1 |
DOIs | 10.1186/s12859-018-2532-4 |
Publication status | Published - 2018 |
Keywords
- Bioinformatics
- Cluster computing
- Cost effectiveness
- Database systems
- Diagnosis
- HTTP
- Linux
- Open source software
- Pipelines
- Websites
- Additional flexibilities
- Command line software
- Comparison and analysis
- Cost-effective approach
- Parallel computing clusters
- Quality statistic
- Sample information
- Technological progress
- Quality control
- biology
- DNA sequence
- factual database
- high throughput sequencing
- human
- procedures
- Computational Biology
- Databases, Factual
- High-Throughput Nucleotide Sequencing
- Humans
- Sequence Analysis, DNA
Cite this
VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database. / Musacchia, F.; Ciolfi, A.; Mutarelli, M.; Bruselles, A.; Castello, R.; Pinelli, M.; Basu, S.; Banfi, S.; Casari, G.; Tartaglia, M.; Nigro, V.; Torella, A.; Esposito, G.; Cappuccio, G.; Mancano, G.; Maitz, S.; Brunetti-Pierri, N.; Parenti, G.; Selicorni, A.
In: BMC Bioinformatics, Vol. 19, No. 1, 2018.
Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database
AU - Musacchia, F.
AU - Ciolfi, A.
AU - Mutarelli, M.
AU - Bruselles, A.
AU - Castello, R.
AU - Pinelli, M.
AU - Basu, S.
AU - Banfi, S.
AU - Casari, G.
AU - Tartaglia, M.
AU - Nigro, V.
AU - Torella, A.
AU - Esposito, G.
AU - Cappuccio, G.
AU - Mancano, G.
AU - Maitz, S.
AU - Brunetti-Pierri, N.
AU - Parenti, G.
AU - Selicorni, A.
N1 - Export Date: 11 April 2019 CODEN: BBMIC Correspondence Address: Musacchia, F.; Telethon Institute for Genetics and Medicine, Viale Campi Flegrei, 34, Italy; email: f.musacchia@tigem.it Funding details: GSP15001 Funding text 1: F.M., R.C., and M.P. are supported by the Telethon Undiagnosed Disease Program (GRANT number GSP15001). The same grant permitted the collection, sequencing and interpretation of samples used to test VarGenius.
PY - 2018
Y1 - 2018
N2 - Background: Targeted resequencing has become the most widely used and cost-effective approach for identifying causative mutations of Mendelian diseases, for both diagnostic and research purposes. Owing to very rapid technological progress, NGS laboratories are expanding their capabilities to handle the increasing number of analyses. Several open source tools are available to build a generic variant calling pipeline, but a tool able to simultaneously execute multiple analyses and to organize and categorize the samples is still missing. Results: Here we describe VarGenius, a Linux-based command line software able to execute customizable pipelines for the analysis of multiple targeted resequencing datasets using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the "joint analysis" of hundreds of samples with a single command, drastically reducing the time for the configuration and execution of the analysis. VarGenius executes the standard pipeline of the Genome Analysis Toolkit (GATK) best practices (GBP) for germline variant calling, annotates the variants using ANNOVAR, and generates a user-friendly output displaying the results through a web page. VarGenius has been tested on a parallel computing cluster of 52 machines with 120 GB of RAM each. Under this configuration, a 50 M whole exome sequencing (WES) analysis for a family (trio or quartet) was executed in about 7 h, a joint analysis of 30 WES in about 24 h, and the parallel analysis of 34 single samples from a 1 M panel in about 2 h. Conclusions: We developed VarGenius, a "master" tool that addresses the increasing demand for heterogeneous NGS analyses and allows maximum flexibility for downstream analyses. It paves the way to a different kind of analysis, centered on cohorts rather than on singletons. Patient and variant information are stored in the database and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills, while providing additional flexibility for computational biologists to develop their own algorithms for the comparison and analysis of data. The software is freely available at: https://github.com/frankMusacchia/VarGenius © 2018 The Author(s).
KW - Bioinformatics
KW - Cluster computing
KW - Cost effectiveness
KW - Database systems
KW - Diagnosis
KW - HTTP
KW - Linux
KW - Open source software
KW - Pipelines
KW - Websites
KW - Additional flexibilities
KW - Command line software
KW - Comparison and analysis
KW - Cost-effective approach
KW - Parallel computing clusters
KW - Quality statistic
KW - Sample information
KW - Technological progress
KW - Quality control
KW - biology
KW - DNA sequence
KW - factual database
KW - high throughput sequencing
KW - human
KW - procedures
KW - Computational Biology
KW - Databases, Factual
KW - High-Throughput Nucleotide Sequencing
KW - Humans
KW - Sequence Analysis, DNA
U2 - 10.1186/s12859-018-2532-4
DO - 10.1186/s12859-018-2532-4
M3 - Article
VL - 19
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - 1
ER -