Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data

Francesca Finotello, Enrico Lavezzo, Paolo Fontana, Denis Peruzzo, Alessandro Albiero, Luisa Barzon, Marco Falda, Barbara Di Camillo, Stefano Toppo

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need for a careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers specifically designed to handle data from the GS FLX 454 platform, on three bacterial data sets that differ in read coverage and repeat content. Our aim is to investigate their strengths and weaknesses in the reconstruction of the reference genomes. In our benchmarking, we assess the assemblers' performance by quantifying and characterizing assembly gaps and errors, and by evaluating their ability to resolve complex genomic regions containing repeats. The final goal of this analysis is to highlight the pros and cons of each method, in order to provide the end user with general criteria for choosing the appropriate assembly strategy for their specific needs. A further aspect we have explored is the relationship between the coverage of a sequencing project and the quality of the results obtained. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30×.
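To make the coverage recommendation concrete, the following is a minimal sketch of the standard average-coverage arithmetic (coverage = number of reads × mean read length / genome size). It is not taken from the paper: the function names, the 4.6 Mb genome size, and the 400 bp mean read length are illustrative assumptions only.

```python
import math


def expected_coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length / genome_size


def reads_for_coverage(target_coverage: float, mean_read_length: float, genome_size: int) -> int:
    """Smallest number of reads whose total bases reach the target average coverage."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical inputs (assumptions, not values reported in the paper):
GENOME_SIZE = 4_600_000   # bp, a typical bacterial genome size
MEAN_READ_LENGTH = 400    # bp, an assumed mean read length

low = reads_for_coverage(20, MEAN_READ_LENGTH, GENOME_SIZE)
high = reads_for_coverage(30, MEAN_READ_LENGTH, GENOME_SIZE)
print(f"20x needs about {low} reads, 30x needs about {high} reads")
print(f"check: {expected_coverage(low, MEAN_READ_LENGTH, GENOME_SIZE):.1f}x")
```

Under these assumed inputs, planning for 20-30× corresponds to a read count in a fixed range that scales linearly with genome size and inversely with read length; the actual figures for any project depend on its own genome and read-length distribution.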

Original language: English
Article number: bbr063
Pages (from-to): 269-280
Number of pages: 12
Journal: Briefings in Bioinformatics
Volume: 13
Issue number: 3
DOIs: https://doi.org/10.1093/bib/bbr063
Publication status: Published - May 2012

Fingerprint

Genes
Genome
Benchmarking
Technology
Costs and Cost Analysis
Throughput
Costs
Experiments
Datasets

Keywords

  • 454 pyrosequencing
  • Assembly algorithm assessment
  • Bacterial genome
  • Coverage

ASJC Scopus subject areas

  • Molecular Biology
  • Information Systems

Cite this

Finotello, F., Lavezzo, E., Fontana, P., Peruzzo, D., Albiero, A., Barzon, L., ... Toppo, S. (2012). Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Briefings in Bioinformatics, 13(3), 269-280. [bbr063]. https://doi.org/10.1093/bib/bbr063

Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. / Finotello, Francesca; Lavezzo, Enrico; Fontana, Paolo; Peruzzo, Denis; Albiero, Alessandro; Barzon, Luisa; Falda, Marco; Di Camillo, Barbara; Toppo, Stefano.

In: Briefings in Bioinformatics, Vol. 13, No. 3, bbr063, 05.2012, p. 269-280.

Finotello, F, Lavezzo, E, Fontana, P, Peruzzo, D, Albiero, A, Barzon, L, Falda, M, Di Camillo, B & Toppo, S 2012, 'Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data', Briefings in Bioinformatics, vol. 13, no. 3, bbr063, pp. 269-280. https://doi.org/10.1093/bib/bbr063
Finotello, Francesca ; Lavezzo, Enrico ; Fontana, Paolo ; Peruzzo, Denis ; Albiero, Alessandro ; Barzon, Luisa ; Falda, Marco ; Di Camillo, Barbara ; Toppo, Stefano. / Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. In: Briefings in Bioinformatics. 2012 ; Vol. 13, No. 3. pp. 269-280.
@article{0fbc825094624831bf3a6a4cece4571c,
title = "Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data",
abstract = "Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need of a careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers, specifically designed to handle data from GS FLX 454 platform, on three bacterial data sets with different characteristics in terms of reads coverage and repeats content. Our aim is to investigate their strengths and weaknesses in the reconstruction of the reference genomes. In our benchmarking, we assess assemblers' performance, quantifying and characterizing assembly gaps and errors, and evaluating their ability to solve complex genomic regions containing repeats. The final goal of this analysis is to highlight pros and cons of each method, in order to provide the final user with general criteria for the right choice of the appropriate assembly strategy, depending on the specific needs. A further aspect we have explored is the relationship between coverage of a sequencing project and quality of the obtained results. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30 ×.",
keywords = "454 pyrosequencing, Assembly algorithm assessment, Bacterial genome, Coverage",
author = "Francesca Finotello and Enrico Lavezzo and Paolo Fontana and Denis Peruzzo and Alessandro Albiero and Luisa Barzon and Marco Falda and {Di Camillo}, Barbara and Stefano Toppo",
year = "2012",
month = "5",
doi = "10.1093/bib/bbr063",
language = "English",
volume = "13",
pages = "269--280",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "OXFORD UNIV PRESS",
number = "3",

}

TY - JOUR

T1 - Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data

AU - Finotello, Francesca

AU - Lavezzo, Enrico

AU - Fontana, Paolo

AU - Peruzzo, Denis

AU - Albiero, Alessandro

AU - Barzon, Luisa

AU - Falda, Marco

AU - Di Camillo, Barbara

AU - Toppo, Stefano

PY - 2012/5

Y1 - 2012/5

AB - Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need of a careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers, specifically designed to handle data from GS FLX 454 platform, on three bacterial data sets with different characteristics in terms of reads coverage and repeats content. Our aim is to investigate their strengths and weaknesses in the reconstruction of the reference genomes. In our benchmarking, we assess assemblers' performance, quantifying and characterizing assembly gaps and errors, and evaluating their ability to solve complex genomic regions containing repeats. The final goal of this analysis is to highlight pros and cons of each method, in order to provide the final user with general criteria for the right choice of the appropriate assembly strategy, depending on the specific needs. A further aspect we have explored is the relationship between coverage of a sequencing project and quality of the obtained results. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30 ×.

KW - 454 pyrosequencing

KW - Assembly algorithm assessment

KW - Bacterial genome

KW - Coverage

UR - http://www.scopus.com/inward/record.url?scp=82255165034&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=82255165034&partnerID=8YFLogxK

U2 - 10.1093/bib/bbr063

DO - 10.1093/bib/bbr063

M3 - Article

C2 - 22021898

AN - SCOPUS:82255165034

VL - 13

SP - 269

EP - 280

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

SN - 1467-5463

IS - 3

M1 - bbr063

ER -