Background: Single-center trials have demonstrated a moderate-to-substantial level of interobserver agreement in the evaluation of ultrasound (US) features of thyroid nodules. Multicenter studies on US agreement, however, are scarce, and data on intraobserver agreement are limited. The aim of this study was to assess inter- and intraobserver agreement between different thyroid centers and different specialists. Methods: A blinded analysis of 100 electronically recorded thyroid nodule US images was conducted in three large-volume thyroid centers by seven radiologists and endocrinologists. The evaluation was repeated after randomization 4 months later. The following US characteristics were evaluated: composition, echogenicity, margins, intranodular echogenic spots, vascularity, and shape. Thyroid nodules were also classified according to the AACE/ACE/AME, EU-TIRADS, ATA, and ACR-TIRADS US classifications. Intra- and interobserver agreement was calculated using cross-tabulation and expressed as mean Cohen's kappa (K). Results: Interobserver agreement for US features: the K-coefficient was 0.53 for composition, 0.47 for echogenicity, 0.46 for intranodular vascularity, and 0.33 for margins of the nodules. For echogenic foci, the K-coefficient was 0.47 for microcalcifications, 0.38 for macrocalcifications, and 0.11 for the subcategory comet-tail artifacts; for shape, it was 0.42. Operators were uncertain about the definition of hyperechoic foci in 16% of cases and described them as "hyperechoic foci of uncertain significance." Interobserver Cohen's K for US classification systems was 0.44 for AACE, 0.42 for ACR-TIRADS, 0.39 for EU-TIRADS, and 0.34 for ATA. Intraobserver agreement: the K-coefficient for nodule US features was 0.62 for intranodular vascularity, 0.58 for composition, 0.60 for echogenicity, 0.54 for macrocalcifications, 0.55 for microcalcifications, 0.47 for comet tails, 0.39 for margins, and 0.35 for shape. 
Intraobserver Cohen's K for US classification systems was 0.54 for AACE, 0.49 for ACR-TIRADS, 0.38 for ATA, and 0.33 for EU-TIRADS. Conclusions: Intraobserver reproducibility for thyroid nodule US reporting and US classification systems appears fairly adequate, while the interobserver agreement between different centers is lower than that assessed in single-center trials. The reporting and rating ability of thyroid US examiners still appears inconsistent. A unified lexicon of thyroid US features, a simplified method of classification, and dedicated training in the description of thyroid US findings may increase observer agreement and the predictive value of US classification systems in real-world practice.
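
As an illustration of the agreement statistic named in the Methods, the following is a minimal sketch of Cohen's kappa for two sets of categorical ratings, averaged over all pairs of observers. The abstract does not state exactly how the mean was formed, so the pairwise averaging is an assumption, and the function names and example labels are hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement (an assumption).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over every pair of raters (assumed scheme)."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("at least two raters are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical composition labels from three observers for four nodules.
observers = [
    ["solid", "solid", "cystic", "cystic"],
    ["solid", "solid", "cystic", "solid"],
    ["solid", "solid", "cystic", "cystic"],
]
print(round(mean_pairwise_kappa(observers), 2))
```

Under the same assumption, an intraobserver value would be obtained by applying `cohen_kappa` to each observer's first and second readings and averaging the results across observers.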
|Number of pages||6|
|Journal||Thyroid : official journal of the American Thyroid Association|
|Publication status||Published - Feb 2020|