The European thoracic data quality project

An Aggregate Data Quality score to measure the quality of international multi-institutional databases

Michele Salati, Pierre Emmanuel Falcoz, Herbert Decaluwe, Gaetano Rocco, Dirk Van Raemdonck, Gonzalo Varela, Alessandro Brunelli

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

OBJECTIVES: To describe the methodology for developing data quality metrics in multi-institutional databases and for deriving a cumulative data quality score, the Aggregate Data Quality score (ADQ). The ESTS (European Society of Thoracic Surgeons) database was used to create and apply the metrics. The Units contributing to the ESTS database were ranked by the quality of the data they uploaded, using the ADQ.

METHODS: We analysed data from 96 Units that each contributed at least 100 major lung resections (January 2007 to December 2014). The Units were anonymized by assigning each a random numeric code. The following metrics were developed to measure the data quality of each Unit: (i) record Completeness (COM): the proportion of variables present out of 16 expected variables, over all the records uploaded [(1 - 'null values'/total expected values for the Unit) × 100; the concept of 'null value' was defined for each variable]; (ii) record Reliability (REL): the proportion of consistent checks out of 9 checks tested, over all the records uploaded [(1 - valid controls/total possible controls for the Unit) × 100; specific reliability control queries were defined]. These two metrics were rescaled using the mean and standard deviation of the entire dataset and summed, obtaining: (iii) the ADQ score [COM rescaled + REL rescaled], which measures the cumulative data quality of a given dataset. The ADQ was used to rank the contributors.

RESULTS: The COM of ESTS database contributors ranged from 98.6% to 43% and the REL from 100% to 69%. Combining the rescaled metrics, the ADQ ranged from 2.67 (highest data quality) to -7.85 (lowest data quality). When the ranking based on COM alone was compared with the ranking based on the ADQ, 93% of Units changed position. The largest change was a drop of 66 positions in the ADQ-based list.

CONCLUSIONS: We described a reproducible method for data quality assessment in clinical multi-institutional databases. The ADQ is a unique indicator that can describe data quality and compare it among centres. It has the potential to objectively guide projects of data quality management and improvement.
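The three metrics in the METHODS section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the rescaling was done, so a z-score with the population standard deviation is assumed here; "valid controls" is read as the number of reliability queries that flagged an inconsistency; and the function names and the three example Units are hypothetical.

```python
from statistics import mean, pstdev


def completeness(null_values: int, expected_values: int) -> float:
    """COM: percentage of expected values that are not 'null' for a Unit."""
    return (1 - null_values / expected_values) * 100


def reliability(flagged_controls: int, possible_controls: int) -> float:
    """REL: percentage of reliability controls that found no inconsistency.

    Assumption: the abstract's 'valid controls' are the controls that fired.
    """
    return (1 - flagged_controls / possible_controls) * 100


def rescale(values: list[float]) -> list[float]:
    """Rescale with the mean and standard deviation of the whole dataset.

    Assumption: a z-score using the population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all Units have the same value; cannot rescale")
    return [(v - mu) / sigma for v in values]


def adq_scores(units: dict[str, dict[str, float]]) -> dict[str, float]:
    """ADQ = rescaled COM + rescaled REL, computed for every Unit."""
    names = list(units)
    com = rescale([units[n]["com"] for n in names])
    rel = rescale([units[n]["rel"] for n in names])
    return {n: c + r for n, c, r in zip(names, com, rel)}


def ranking(scores: dict[str, float]) -> list[str]:
    """Unit codes ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical Units with made-up COM and REL percentages.
example_units = {
    "A": {"com": 98.0, "rel": 80.0},
    "B": {"com": 90.0, "rel": 100.0},
    "C": {"com": 60.0, "rel": 90.0},
}

scores = adq_scores(example_units)
rank_by_com = ranking({n: u["com"] for n, u in example_units.items()})
rank_by_adq = ranking(scores)
print("Ranking by COM:", rank_by_com)
print("Ranking by ADQ:", rank_by_adq)
```

In this made-up example the Unit with the highest completeness does not have the highest ADQ, because its reliability is the lowest of the three. This is the kind of position change the RESULTS section reports when the COM-only ranking is compared with the ADQ ranking.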

Original language: English
Article number: ezv385
Pages (from-to): 1470-1475
Number of pages: 6
Journal: European Journal of Cardio-thoracic Surgery
Volume: 49
Issue number: 5
DOI: 10.1093/ejcts/ezv385
Publication status: Published - May 1 2016

Fingerprint

Thorax
Databases
Data Accuracy
Quality Improvement

Keywords

  • Data quality
  • Database management systems
  • Quality indicators
  • Registry

ASJC Scopus subject areas

  • Cardiology and Cardiovascular Medicine
  • Surgery
  • Pulmonary and Respiratory Medicine

Cite this

The European thoracic data quality project : An Aggregate Data Quality score to measure the quality of international multi-institutional databases. / Salati, Michele; Falcoz, Pierre Emmanuel; Decaluwe, Herbert; Rocco, Gaetano; Van Raemdonck, Dirk; Varela, Gonzalo; Brunelli, Alessandro.

In: European Journal of Cardio-thoracic Surgery, Vol. 49, No. 5, ezv385, 01.05.2016, p. 1470-1475.

@article{4bd2f6ba204c4b29aadea7c15b89f274,
title = "The European thoracic data quality project: An Aggregate Data Quality score to measure the quality of international multi-institutional databases",
keywords = "Data quality, Database management systems, Quality indicators, Registry",
author = "Michele Salati and Falcoz, {Pierre Emmanuel} and Herbert Decaluwe and Gaetano Rocco and {Van Raemdonck}, Dirk and Gonzalo Varela and Alessandro Brunelli",
year = "2016",
month = "5",
day = "1",
doi = "10.1093/ejcts/ezv385",
language = "English",
volume = "49",
pages = "1470--1475",
journal = "European Journal of Cardio-thoracic Surgery",
issn = "1010-7940",
publisher = "European Association for Cardio-Thoracic Surgery",
number = "5",

}

TY - JOUR

T1 - The European thoracic data quality project

T2 - An Aggregate Data Quality score to measure the quality of international multi-institutional databases

AU - Salati, Michele

AU - Falcoz, Pierre Emmanuel

AU - Decaluwe, Herbert

AU - Rocco, Gaetano

AU - Van Raemdonck, Dirk

AU - Varela, Gonzalo

AU - Brunelli, Alessandro

PY - 2016/5/1

Y1 - 2016/5/1

KW - Data quality

KW - Database management systems

KW - Quality indicators

KW - Registry

UR - http://www.scopus.com/inward/record.url?scp=84965121283&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84965121283&partnerID=8YFLogxK

U2 - 10.1093/ejcts/ezv385

DO - 10.1093/ejcts/ezv385

M3 - Article

VL - 49

SP - 1470

EP - 1475

JO - European Journal of Cardio-thoracic Surgery

JF - European Journal of Cardio-thoracic Surgery

SN - 1010-7940

IS - 5

M1 - ezv385

ER -