RNAontheBENCH: Computational and empirical resources for benchmarking RNAseq quantification and differential expression methods

Pierre Luc Germain, Alessandro Vitriolo, Antonio Adamo, Pasquale Laise, Vivek Das, Giuseppe Testa

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

RNA sequencing (RNAseq) has become the method of choice for transcriptome analysis, yet no consensus exists as to the most appropriate pipeline for its analysis, with current benchmarks suffering from important limitations. Here, we address these challenges through a rich benchmarking resource harnessing (i) two RNAseq datasets including ERCC ExFold spike-ins; (ii) Nanostring measurements of a panel of 150 genes on the same samples; (iii) a set of internal, genetically determined controls; (iv) a reanalysis of the SEQC dataset; and (v) a focus on relative quantification (i.e. across samples). We use this resource to compare different approaches to each step of RNAseq analysis, from alignment to differential expression testing. We show that methods providing the best absolute quantification do not necessarily provide good relative quantification across samples, that count-based methods are superior for gene-level relative quantification, and that the new generation of pseudo-alignment-based software performs as well as established methods, at a fraction of the computing time. We also assess the impact of library type and size on quantification and differential expression analysis. Finally, we have created an R package and a web platform to enable the simple and streamlined application of this resource to the benchmarking of future methods.

Original language: English
Pages (from-to): 5054-5067
Number of pages: 14
Journal: Nucleic Acids Research
Volume: 44
Issue number: 11
DOI: 10.1093/nar/gkw448
Publication status: Published - Jun 20 2016

Fingerprint

RNA Sequence Analysis
Benchmarking
Gene Expression Profiling
Genes
Libraries
Software

ASJC Scopus subject areas

  • Genetics

Cite this

RNAontheBENCH: Computational and empirical resources for benchmarking RNAseq quantification and differential expression methods. / Germain, Pierre Luc; Vitriolo, Alessandro; Adamo, Antonio; Laise, Pasquale; Das, Vivek; Testa, Giuseppe.

In: Nucleic Acids Research, Vol. 44, No. 11, 20.06.2016, p. 5054-5067.

@article{109f9a3a41344d558bf5a680efb35d3e,
title = "RNAontheBENCH: Computational and empirical resources for benchmarking RNAseq quantification and differential expression methods",
abstract = "RNA sequencing (RNAseq) has become the method of choice for transcriptome analysis, yet no consensus exists as to the most appropriate pipeline for its analysis, with current benchmarks suffering from important limitations. Here, we address these challenges through a rich benchmarking resource harnessing (i) two RNAseq datasets including ERCC ExFold spike-ins; (ii) Nanostring measurements of a panel of 150 genes on the same samples; (iii) a set of internal, genetically determined controls; (iv) a reanalysis of the SEQC dataset; and (v) a focus on relative quantification (i.e. across samples). We use this resource to compare different approaches to each step of RNAseq analysis, from alignment to differential expression testing. We show that methods providing the best absolute quantification do not necessarily provide good relative quantification across samples, that count-based methods are superior for gene-level relative quantification, and that the new generation of pseudo-alignment-based software performs as well as established methods, at a fraction of the computing time. We also assess the impact of library type and size on quantification and differential expression analysis. Finally, we have created an R package and a web platform to enable the simple and streamlined application of this resource to the benchmarking of future methods.",
author = "Germain, {Pierre Luc} and Alessandro Vitriolo and Antonio Adamo and Pasquale Laise and Vivek Das and Giuseppe Testa",
year = "2016",
month = "6",
day = "20",
doi = "10.1093/nar/gkw448",
language = "English",
volume = "44",
pages = "5054--5067",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "11",

}

TY - JOUR

T1 - RNAontheBENCH

T2 - Computational and empirical resources for benchmarking RNAseq quantification and differential expression methods

AU - Germain, Pierre Luc

AU - Vitriolo, Alessandro

AU - Adamo, Antonio

AU - Laise, Pasquale

AU - Das, Vivek

AU - Testa, Giuseppe

PY - 2016/6/20

Y1 - 2016/6/20

AB - RNA sequencing (RNAseq) has become the method of choice for transcriptome analysis, yet no consensus exists as to the most appropriate pipeline for its analysis, with current benchmarks suffering from important limitations. Here, we address these challenges through a rich benchmarking resource harnessing (i) two RNAseq datasets including ERCC ExFold spike-ins; (ii) Nanostring measurements of a panel of 150 genes on the same samples; (iii) a set of internal, genetically determined controls; (iv) a reanalysis of the SEQC dataset; and (v) a focus on relative quantification (i.e. across samples). We use this resource to compare different approaches to each step of RNAseq analysis, from alignment to differential expression testing. We show that methods providing the best absolute quantification do not necessarily provide good relative quantification across samples, that count-based methods are superior for gene-level relative quantification, and that the new generation of pseudo-alignment-based software performs as well as established methods, at a fraction of the computing time. We also assess the impact of library type and size on quantification and differential expression analysis. Finally, we have created an R package and a web platform to enable the simple and streamlined application of this resource to the benchmarking of future methods.

UR - http://www.scopus.com/inward/record.url?scp=84976386635&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976386635&partnerID=8YFLogxK

U2 - 10.1093/nar/gkw448

DO - 10.1093/nar/gkw448

M3 - Article

AN - SCOPUS:84976386635

VL - 44

SP - 5054

EP - 5067

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 11

ER -