TY - JOUR
T1 - Towards standardisation: comparison of five whole genome sequencing (WGS) analysis pipelines for detection of epidemiologically linked tuberculosis cases
AU - Jajou, Rana
AU - Kohl, Thomas A.
AU - Walker, Timothy
AU - Norman, Anders
AU - Cirillo, Daniela Maria
AU - Tagliani, Elisa
AU - Niemann, Stefan
AU - de Neeling, Albert
AU - Lillebaek, Troels
AU - Anthony, Richard M.
AU - van Soolingen, Dick
PY - 2019/12/1
Y1 - 2019/12/1
AB - Background: Whole genome sequencing (WGS) is a reliable tool for studying tuberculosis (TB) transmission. WGS data are usually processed by custom-built analysis pipelines with little standardisation between them. Aim: To compare the impact of variability of several WGS analysis pipelines used internationally to detect epidemiologically linked TB cases. Methods: From the Netherlands, 535 Mycobacterium tuberculosis complex (MTBC) strains from 2016 were included. Epidemiological information obtained from municipal health services was available for all mycobacterial interspersed repeat unit-variable number of tandem repeat (MIRU-VNTR) clustered cases. WGS data were analysed using five different pipelines: one core genome multilocus sequence typing (cgMLST) approach and four single nucleotide polymorphism (SNP)-based pipelines developed in Oxford, United Kingdom; Borstel, Germany; Bilthoven, the Netherlands; and Copenhagen, Denmark. WGS clusters were defined using a maximum pairwise distance of 12 SNPs/alleles. Results: The cgMLST approach and the Oxford pipeline clustered all epidemiologically linked cases; however, in the other three SNP-based pipelines, one epidemiological link was missed due to insufficient coverage. In general, the genetic distances varied between pipelines, reflecting different clustering rates: the cgMLST approach clustered 92 cases, followed by 84, 83, 83 and 82 cases in the SNP-based pipelines from Copenhagen, Oxford, Borstel and Bilthoven, respectively. Conclusion: Concordance in ruling out epidemiological links was high between pipelines, which is an important step in the international validation of WGS data analysis. To increase accuracy in identifying TB transmission clusters, standardisation of crucial WGS criteria and creation of a reference database of representative MTBC sequences would be advisable.
KW - analysis pipelines
KW - epidemiology
KW - international
KW - TB
KW - tuberculosis
KW - Whole genome sequencing
UR - http://www.scopus.com/inward/record.url?scp=85076844525&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076844525&partnerID=8YFLogxK
U2 - 10.2807/1560-7917.ES.2019.24.50.1900130
DO - 10.2807/1560-7917.ES.2019.24.50.1900130
M3 - Article
C2 - 31847944
AN - SCOPUS:85076844525
VL - 24
JO - Eurosurveillance
JF - Eurosurveillance
SN - 1560-7917
IS - 50
ER -