Introducing new measures of inter- and intra-rater agreement to assess the reliability of medical ground truth

Andrea Campagner, Federico Cabitza

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present and discuss two new measures of inter- and intra-rater agreement to assess the reliability of the raters, and hence of their labeling, in multi-rater settings, which are common in the production of ground truth for machine learning models. Our proposal is more conservative than other existing agreement measures, as it considers a more articulated notion of agreement by chance, based on an empirical estimation of the precision (or reliability) of the single raters involved. We discuss the measures in light of a realistic annotation task that involved 13 expert radiologists in labeling the MRNet dataset.
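
The paper itself does not include an implementation; the following Python sketch only illustrates the general idea of a chance-corrected agreement statistic whose chance term is weighted by per-rater reliability. The chance model assumed here (each rater labels "informedly" with probability equal to their estimated reliability, and otherwise guesses uniformly over the categories), as well as all function names, are illustrative assumptions, not the authors' actual formulation.

import numpy as np
from itertools import combinations

def observed_agreement(ratings):
    """Mean fraction of agreeing rater pairs per item.
    ratings: (n_items, n_raters) matrix of categorical labels."""
    n_items, n_raters = ratings.shape
    n_pairs = n_raters * (n_raters - 1) / 2
    per_item = []
    for row in ratings:
        # count agreeing pairs among the raters of one item
        _, counts = np.unique(row, return_counts=True)
        per_item.append((counts * (counts - 1) / 2).sum() / n_pairs)
    return float(np.mean(per_item))

def chance_agreement(reliabilities, n_categories):
    """Expected pairwise agreement under the assumed chance model:
    rater i answers informedly with probability reliabilities[i] and
    guesses uniformly otherwise; a pair agrees 'by chance' when at
    least one member guesses and the labels happen to match."""
    k = n_categories
    probs = []
    for r_i, r_j in combinations(reliabilities, 2):
        p = (r_i * (1 - r_j) + r_j * (1 - r_i) + (1 - r_i) * (1 - r_j)) / k
        probs.append(p)
    return float(np.mean(probs))

def reliability_weighted_kappa(ratings, reliabilities, n_categories):
    """Kappa-style chance correction using the model above."""
    p_o = observed_agreement(ratings)
    p_c = chance_agreement(reliabilities, n_categories)
    return (p_o - p_c) / (1 - p_c)

# Toy example: 5 items, 3 raters, binary labels, assumed reliabilities.
ratings = np.array([[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 1]])
print(reliability_weighted_kappa(ratings, np.array([0.9, 0.8, 0.7]), 2))

Because the chance term shrinks as the raters' estimated reliabilities grow, a panel of highly reliable raters is credited with less "lucky" agreement, which is what makes such a measure more conservative than corrections based on label marginals alone.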

Original language: English
Title of host publication: Digital Personalized Health and Medicine - Proceedings of MIE 2020
Editors: Louise B. Pape-Haugaard, Christian Lovis, Inge Cort Madsen, Patrick Weber, Per Hostrup Nielsen, Philip Scott
Publisher: IOS Press
Pages: 282-286
Number of pages: 5
ISBN (Electronic): 9781643680828
DOIs
Publication status: Published - Jun 16, 2020
Event: 30th Medical Informatics Europe Conference, MIE 2020 - Geneva, Switzerland
Duration: Apr 28, 2020 - May 1, 2020

Publication series

Name: Studies in Health Technology and Informatics
Volume: 270
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Conference

Conference: 30th Medical Informatics Europe Conference, MIE 2020
Country/Territory: Switzerland
City: Geneva
Period: 4/28/20 - 5/1/20

Keywords

  • Ground Truth
  • Inter-rater agreement
  • Machine Learning
  • Reliability

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management
