Integrated systems for NGS data management and analysis: Open issues and available solutions

Valerio Bianchi, Arnaud Ceol, Alessandro G E Ogier, Stefano De Pretis, Eugenia Galeota, Kamal Kishore, Pranami Bora, Ottavio Croci, Stefano Campaner, Bruno Amati, Marco J. Morelli, Mattia Pelizzola

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have by now accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations, and allows complete traceability of datasets, accompanying metadata, and analysis scripts.

Original language: English
Article number: 75
Journal: Frontiers in Genetics
Volume: 7
Issue number: MAY
DOI: https://doi.org/10.3389/fgene.2016.00075
Publication status: Published - May 6 2016

Fingerprint

Biological Ontologies
Controlled Vocabulary
Technology
Directories
Workflow
Research
Names
Databases
Metadata
Datasets

Keywords

  • Epigenomics
  • Genomics
  • High-throughput sequencing
  • Laboratory information management system
  • Workflow management system

ASJC Scopus subject areas

  • Genetics
  • Molecular Medicine
  • Genetics (clinical)

Cite this

Integrated systems for NGS data management and analysis: Open issues and available solutions. / Bianchi, Valerio; Ceol, Arnaud; Ogier, Alessandro G E; De Pretis, Stefano; Galeota, Eugenia; Kishore, Kamal; Bora, Pranami; Croci, Ottavio; Campaner, Stefano; Amati, Bruno; Morelli, Marco J.; Pelizzola, Mattia.

In: Frontiers in Genetics, Vol. 7, No. MAY, 75, 06.05.2016.

Bianchi, V, Ceol, A, Ogier, AGE, De Pretis, S, Galeota, E, Kishore, K, Bora, P, Croci, O, Campaner, S, Amati, B, Morelli, MJ & Pelizzola, M 2016, 'Integrated systems for NGS data management and analysis: Open issues and available solutions', Frontiers in Genetics, vol. 7, no. MAY, 75. https://doi.org/10.3389/fgene.2016.00075
@article{491cd204c4cc4626828ad25929a35901,
title = "Integrated systems for NGS data management and analysis: Open issues and available solutions",
keywords = "Epigenomics, Genomics, High-throughput sequencing, Laboratory information management system, Workflow management system",
author = "Valerio Bianchi and Arnaud Ceol and Ogier, {Alessandro G E} and {De Pretis}, Stefano and Eugenia Galeota and Kamal Kishore and Pranami Bora and Ottavio Croci and Stefano Campaner and Bruno Amati and Morelli, {Marco J.} and Mattia Pelizzola",
year = "2016",
month = "5",
day = "6",
doi = "10.3389/fgene.2016.00075",
language = "English",
volume = "7",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S. A.",
number = "MAY",

}

TY - JOUR
T1 - Integrated systems for NGS data management and analysis
T2 - Open issues and available solutions
AU - Bianchi, Valerio
AU - Ceol, Arnaud
AU - Ogier, Alessandro G E
AU - De Pretis, Stefano
AU - Galeota, Eugenia
AU - Kishore, Kamal
AU - Bora, Pranami
AU - Croci, Ottavio
AU - Campaner, Stefano
AU - Amati, Bruno
AU - Morelli, Marco J.
AU - Pelizzola, Mattia
PY - 2016/5/6
Y1 - 2016/5/6
KW - Epigenomics
KW - Genomics
KW - High-throughput sequencing
KW - Laboratory information management system
KW - Workflow management system
UR - http://www.scopus.com/inward/record.url?scp=84975282831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84975282831&partnerID=8YFLogxK
U2 - 10.3389/fgene.2016.00075
DO - 10.3389/fgene.2016.00075
M3 - Article
AN - SCOPUS:84975282831
VL - 7
JO - Frontiers in Genetics
JF - Frontiers in Genetics
SN - 1664-8021
IS - MAY
M1 - 75
ER -