The European thoracic data quality project: An Aggregate Data Quality score to measure the quality of international multi-institutional databases

Michele Salati, Pierre Emmanuel Falcoz, Herbert Decaluwe, Gaetano Rocco, Dirk Van Raemdonck, Gonzalo Varela, Alessandro Brunelli

Research output: Contribution to journal › Article


OBJECTIVES: To describe the methodology for developing data quality metrics in multi-institutional databases and deriving a cumulative data quality score, the Aggregate Data Quality score (ADQ). The ESTS database was used to create and apply the metrics, and the Units contributing to it were ranked by the quality of their uploaded data using the ADQ.

METHODS: We analysed data from 96 Units that each contributed at least 100 major lung resections (January 2007 to December 2014). The Units were anonymized by assigning each a random numeric code. The following metrics were developed to measure the data quality of each Unit: (i) record Completeness (COM), the rate of present variables among 16 expected variables across all records uploaded, [1 - ('null values'/total expected values for the Unit)] × 100, where the concept of 'null value' was defined for each variable; (ii) record Reliability (REL), the rate of consistent checks among 9 checks tested across all records uploaded, [1 - (valid controls/total possible controls for the Unit)] × 100, using specifically defined reliability control queries. These two metrics were rescaled using the mean and standard deviation of the entire dataset and summed to obtain (iii) the ADQ score, [COM rescaled + REL rescaled], which measures the cumulative data quality of a given dataset. The ADQ was used to rank the contributors.

RESULTS: The COM of ESTS database contributors ranged from 98.6% to 43% and the REL from 100% to 69%. Combining the rescaled metrics, the ADQ ranged between 2.67 (highest data quality) and -7.85 (lowest data quality). When the ranking based on COM alone was compared with the ranking based on the ADQ, 93% of Units changed position; the largest change was a drop of 66 positions in the ADQ ranking.

CONCLUSIONS: We described a reproducible method for data quality assessment in clinical multi-institutional databases. The ADQ is a single indicator able to describe data quality and to compare it among centres. It has the potential to objectively guide data quality management and improvement projects.
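To make the metric definitions above concrete, here is a minimal Python sketch of the COM, REL and ADQ calculations and of the COM-versus-ADQ rank comparison. It is not the authors' code. Several points are assumptions: the per-Unit tallies and the names `UnitCounts`, `flagged_checks`, `rank_shifts` etc. are hypothetical; "valid controls" in the REL formula is read here as checks that flagged an inconsistency, so that REL comes out as the rate of consistent checks; and the rescaling is taken to be a z-score using the population standard deviation, since the abstract does not say which standard deviation was used. The three Units and their counts are toy numbers, not ESTS data.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class UnitCounts:
    """Hypothetical per-Unit tallies; the field names are illustrative, not from the paper."""
    unit_id: str
    null_values: int      # expected variable slots found 'null' across all records
    expected_values: int  # number of records x 16 expected variables
    flagged_checks: int   # reliability checks that found an inconsistency (assumed reading)
    total_checks: int     # number of records x 9 reliability checks


def completeness(u: UnitCounts) -> float:
    """COM = [1 - (null values / total expected values)] x 100."""
    return (1 - u.null_values / u.expected_values) * 100


def reliability(u: UnitCounts) -> float:
    """REL = [1 - (flagged checks / total possible checks)] x 100."""
    return (1 - u.flagged_checks / u.total_checks) * 100


def zscores(values: list[float]) -> list[float]:
    """Rescale with the mean and population SD of the whole dataset (SD choice assumed)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("all Units share the same value; z-scores are undefined")
    return [(v - m) / s for v in values]


def adq_scores(units: list[UnitCounts]) -> dict[str, float]:
    """ADQ = rescaled COM + rescaled REL for each Unit."""
    com = zscores([completeness(u) for u in units])
    rel = zscores([reliability(u) for u in units])
    return {u.unit_id: c + r for u, c, r in zip(units, com, rel)}


def ranking(scores: dict[str, float]) -> list[str]:
    """Unit ids from highest to lowest score; index 0 is the best position."""
    return sorted(scores, key=lambda uid: scores[uid], reverse=True)


def rank_shifts(before: list[str], after: list[str]) -> dict[str, int]:
    """Positions moved per Unit between two rankings; positive means a drop."""
    old_pos = {uid: i for i, uid in enumerate(before)}
    return {uid: i - old_pos[uid] for i, uid in enumerate(after)}


# Toy example: three Units with 100 records each (hypothetical numbers, not ESTS data).
units = [
    UnitCounts("A", null_values=160, expected_values=1600, flagged_checks=0, total_checks=900),
    UnitCounts("B", null_values=16, expected_values=1600, flagged_checks=270, total_checks=900),
    UnitCounts("C", null_values=80, expected_values=1600, flagged_checks=45, total_checks=900),
]

adq = adq_scores(units)
com_rank = ranking({u.unit_id: completeness(u) for u in units})
adq_rank = ranking(adq)
shifts = rank_shifts(com_rank, adq_rank)
moved = sum(1 for d in shifts.values() if d != 0) / len(shifts)

print("COM ranking:", com_rank)  # ['B', 'C', 'A']
print("ADQ ranking:", adq_rank)  # ['C', 'B', 'A']
print("Position shifts:", shifts)
print(f"Units that changed position: {moved:.0%}")
```

Because each rescaled metric has mean zero across the dataset, the ADQ scores also sum to zero, so a positive ADQ marks a Unit whose combined quality is above the dataset average. In this toy example Unit B has the best completeness but poor reliability, so it drops from first to second place once REL is included, which is the kind of reordering the abstract reports at a much larger scale.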

Original language: English
Article number: ezv385
Pages (from-to): 1470-1475
Number of pages: 6
Journal: European Journal of Cardio-thoracic Surgery
Issue number: 5
Publication status: Published - May 1 2016

Keywords

  • Data quality
  • Database management systems
  • Quality indicators
  • Registry

ASJC Scopus subject areas

  • Cardiology and Cardiovascular Medicine
  • Surgery
  • Pulmonary and Respiratory Medicine
