BACKGROUND AND AIMS: Endoscopic outcomes are increasingly used in clinical trials and in routine practice for inflammatory bowel disease [IBD] in order to reach more objective patient evaluations than is possible using clinical features alone. However, the reproducibility of the endoscopic scoring systems used to categorize endoscopic activity has been reported to be suboptimal. The aim of this study was to analyse the inter-rater agreement of non-dedicated gastroenterologists on IBD endoscopic scoring systems, and to explore the effects of a dedicated training programme on agreement.
METHODS: A total of 237 physicians attended training courses on IBD endoscopic scoring systems and independently scored a set of IBD endoscopic videos for ulcerative colitis [with the Mayo endoscopic subscore], post-operative Crohn's disease [with the Rutgeerts score] and luminal Crohn's disease [with the Simple Endoscopic Score for Crohn's Disease, SESCD, and the Crohn's Endoscopic Index of Severity, CDEIS]. A second round of scoring was collected after discussion of the determinants of discrepancy. Interobserver agreement was measured using Fleiss' kappa [kappa] or the intraclass correlation coefficient [ICC], as appropriate.
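For readers unfamiliar with the agreement statistic named in the methods, the following is a minimal sketch of the standard Fleiss' kappa calculation for several raters assigning categorical scores. It is not the authors' analysis code; the function name `fleiss_kappa` and the example counts are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject (video) i
    to category j. Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per subject are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per subject: fraction of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_subj) / n_subjects   # mean observed agreement
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 videos, 5 raters, scores 0-3 (illustrative only).
example_counts = [
    [5, 0, 0, 0],
    [1, 3, 1, 0],
    [0, 1, 4, 0],
    [0, 0, 1, 4],
]
print(round(fleiss_kappa(example_counts), 3))
```

Kappa compares the observed proportion of agreeing rater pairs with the proportion expected by chance, so a value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance.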
RESULTS: The inter-rater agreement increased from kappa 0.51 [95% confidence interval (CI) 0.48-0.55] to 0.76 [95% CI 0.72-0.79] for the Mayo endoscopic subscore, and from 0.45 [95% CI 0.40-0.50] to 0.79 [95% CI 0.74-0.83] for the Rutgeerts score, before and after the training programme, respectively; both differences were significant [P < 0.0001]. With only one measurement, the ICC was 0.77 [95% CI 0.56-0.96] for SESCD and 0.76 [95% CI 0.54-0.96] for CDEIS.
DISCUSSION: The baseline inter-rater agreement of inexperienced gastroenterologists focused on IBD management is moderate; however, a dedicated training programme can significantly improve inter-rater agreement, raising it to levels expected among expert central reviewers.
- Journal Article