Testing the equivalence of translations of widely used response choice labels: Results from the IQOLA Project

Susan D. Keller, John E. Ware, Barbara Gandek, Neil K. Aaronson, Jordi Alonso, Giovanni Apolone, Jakob B. Bjorner, John Brazier, Monika Bullinger, Shunichi Fukuhara, Stein Kaasa, Alain Leplège, Robert W. Sanson-Fisher, Marianne Sullivan, Sharon Wood-Dauphinee

Research output: Contribution to journal › Article › peer-review


The similarity in meaning assigned to response choice labels from the SF-36 Health Survey (SF-36) was evaluated across countries. Convenience samples of judges (range, 10 to 117; median = 48) from 13 countries rated translations of response choice labels, using a variation of the Thurstone method of equal appearing intervals. Judges marked a point on a 10-cm line representing the magnitude of a response choice label (e.g., 'good' relative to the anchors of 'poor' and 'excellent'). Ratings were evaluated to determine the ordinal consistency of response choice labels within a response scale; the degree to which differences between adjacent response choice labels were equal interval; and the amount of variance due to response choice label, country, judge, and interaction between response choice label and country. Results confirmed the hypothesized ordering of response choice labels; the percentage of ordinal pairs ranged from 88.7% to 100% (median = 98.2%) across countries and response scales. Examination of the average magnitudes of response choice labels supported the 'quasi-interval' nature of the scales. Analysis of variance (ANOVA) results supported the generalizability of response choice magnitudes across countries; labels explained 64% to 77% of the variance in ratings, and country explained 1% to 3%. These results support the equivalence of SF-36 response choice labels across countries. Departures from the assumption of equal intervals, when observed, were similar across countries and were greatest for the two response scales that are recalibrated under standard SF-36 scoring. Results provide justification for scoring translations of individual items using standard SF-36 scoring; whether these items form the same scales in other countries as they do in the United States is evaluated with tests of scaling assumptions.
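As an illustration of the ordinal-consistency check described in the abstract, the following minimal sketch computes a percentage of ordinal pairs for one response scale. It rests on assumptions: the abstract does not give the exact computation, so the sketch counts, for every judge, each pair of labels whose ratings on the 10-cm line fall in the hypothesized order. The function name `percent_ordinal_pairs`, the full five-label scale, and the two judges' ratings are hypothetical and are not data from the study.

```python
from itertools import combinations

# Hypothesized low-to-high ordering of one response scale.
# Only 'poor', 'good', and 'excellent' are named in the abstract;
# the full five-label scale here is an assumption for illustration.
EXPECTED_ORDER = ["poor", "fair", "good", "very good", "excellent"]


def percent_ordinal_pairs(ratings, expected_order):
    """Return the percentage of label pairs rated in the hypothesized order.

    ratings: list of dicts, one per judge, mapping label -> position (cm)
             on the 10-cm line.
    A tie between two labels is counted as not consistent (an assumption).
    """
    consistent = 0
    total = 0
    for judge in ratings:
        # combinations() preserves list order, so `lower` always precedes
        # `higher` in the hypothesized ordering.
        for lower, higher in combinations(expected_order, 2):
            total += 1
            if judge[lower] < judge[higher]:
                consistent += 1
    return 100.0 * consistent / total


# Hypothetical ratings from two judges (not study data).
judges = [
    {"poor": 0.0, "fair": 2.5, "good": 5.0, "very good": 7.5, "excellent": 10.0},
    {"poor": 0.0, "fair": 4.0, "good": 3.5, "very good": 8.0, "excellent": 10.0},
]

print(percent_ordinal_pairs(judges, EXPECTED_ORDER))
```

In this made-up example the second judge places 'fair' above 'good', so 19 of the 20 judge-by-pair comparisons follow the hypothesized order.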

Original language: English
Pages (from-to): 933-944
Number of pages: 12
Journal: Journal of Clinical Epidemiology
Issue number: 11
Publication status: Published - Nov 1998


  • Categorical rating scales
  • Questionnaires
  • SF-36 Health Survey
  • Thurstone scaling
  • Translations

ASJC Scopus subject areas

  • Medicine(all)
  • Public Health, Environmental and Occupational Health
  • Epidemiology

