TY - JOUR
T1 - Risk of bias in nonrandomized studies of interventions showed low inter-rater reliability and challenges in its application
AU - Minozzi, Silvia
AU - Cinquini, Michela
AU - Gianola, Silvia
AU - Castellini, Greta
AU - Gerardi, Chiara
AU - Banzi, Rita
PY - 2019/8/1
Y1 - 2019/8/1
N2 - Objective: To assess the inter-rater reliability (IRR) and usability of the risk of bias in nonrandomized studies of interventions tool (ROBINS-I). Study Design and Setting: We designed a cross-sectional study. Five raters independently applied ROBINS-I to the nonrandomized cohort studies in three systematic reviews on vaccines, opiate abuse, and rehabilitation. We calculated Fleiss' Kappa for multiple raters as a measure of IRR and discussed the application of ROBINS-I to identify difficulties and possible reasons for disagreement. Results: Thirty-one studies were included (195 evaluations). IRRs were slight for overall judgment (IRR 0.06, 95% CI 0.001 to 0.12) and individual domains (from 0.04, 95% CI −0.04 to 0.12 for the domain “selection of reported results” to 0.18, 95% CI 0.10 to 0.26 for the domain “deviation from intended interventions”). Mean time to apply the tool was 27.8 minutes (SD 12.6) per study. The main difficulties were due to poor reporting of primary studies, misunderstanding of the question, translation of questions into a final judgment, and incomplete guidance. Conclusion: We found ROBINS-I difficult and demanding, even for raters with substantial expertise in systematic reviews. Calibration exercises and intensive training before its application are needed to improve reliability.
AB - Objective: To assess the inter-rater reliability (IRR) and usability of the risk of bias in nonrandomized studies of interventions tool (ROBINS-I). Study Design and Setting: We designed a cross-sectional study. Five raters independently applied ROBINS-I to the nonrandomized cohort studies in three systematic reviews on vaccines, opiate abuse, and rehabilitation. We calculated Fleiss' Kappa for multiple raters as a measure of IRR and discussed the application of ROBINS-I to identify difficulties and possible reasons for disagreement. Results: Thirty-one studies were included (195 evaluations). IRRs were slight for overall judgment (IRR 0.06, 95% CI 0.001 to 0.12) and individual domains (from 0.04, 95% CI −0.04 to 0.12 for the domain “selection of reported results” to 0.18, 95% CI 0.10 to 0.26 for the domain “deviation from intended interventions”). Mean time to apply the tool was 27.8 minutes (SD 12.6) per study. The main difficulties were due to poor reporting of primary studies, misunderstanding of the question, translation of questions into a final judgment, and incomplete guidance. Conclusion: We found ROBINS-I difficult and demanding, even for raters with substantial expertise in systematic reviews. Calibration exercises and intensive training before its application are needed to improve reliability.
KW - Inter-rater reliability
KW - Nonrandomized studies
KW - Risk of bias
KW - ROBINS-I
KW - Systematic reviews
UR - http://www.scopus.com/inward/record.url?scp=85065236113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065236113&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2019.04.001
DO - 10.1016/j.jclinepi.2019.04.001
M3 - Article
C2 - 30981833
AN - SCOPUS:85065236113
VL - 112
SP - 28
EP - 35
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
SN - 0895-4356
ER -