Published online by Cambridge University Press: 16 April 2020
Clinical trials in psychiatry rely on subjective outcome measures, for which poor inter-rater reliability can degrade signal detection and compromise study results. One approach to this challenge is to limit the number of raters, thereby decreasing expected variance. However, sample size requirements, even those based on high reliability, often necessitate many sites and therefore many raters. Comprehensive rater training, combined with validated assessment of inter-rater reliability at study initiation and throughout the study, is therefore critical to ensuring high inter-rater reliability. This study examined the effect of rater training and assessment on inter-rater variance in clinical studies.
After rigorous training on the administration and scoring guidelines of the Hamilton Anxiety Rating Scale (HAM-A), 286 raters independently reviewed and scored a videotaped HAM-A interview of a patient with generalized anxiety disorder (GAD). Measures of inter-rater agreement across the pool of raters, as well as for each individual rater relative to all other raters, were calculated using kappa statistics modified for situations in which multiple raters assess a single subject [1].
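As an illustration of how agreement among many raters scoring a single subject can be quantified, the sketch below treats the 14 HAM-A items of the one videotaped interview as the units of analysis. It uses a Fleiss-type kappa for the overall index and, for each individual rater, the mean pairwise Cohen's kappa against all other raters. This is an assumed stand-in for exposition only, not the exact modified kappa statistic cited in reference 1; the function names and the simulated ratings are hypothetical.

```python
# Illustrative sketch only: the exact "modified kappa" of reference 1 is not
# reproduced here.  Overall agreement is approximated with a Fleiss-type kappa
# over the 14 HAM-A items of a single interview; each rater's individual index
# is the mean Cohen's kappa against every other rater.
import numpy as np

def fleiss_kappa(ratings: np.ndarray, n_categories: int) -> float:
    """Overall chance-corrected agreement.
    ratings: (n_items, n_raters) integer scores in [0, n_categories)."""
    n_items, n_raters = ratings.shape
    # counts[i, c] = number of raters giving item i the score c
    counts = np.zeros((n_items, n_categories))
    for c in range(n_categories):
        counts[:, c] = (ratings == c).sum(axis=1)
    # per-item observed agreement, then its mean
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # chance agreement from pooled category proportions
    p_c = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_c ** 2)
    return (p_bar - p_e) / (1 - p_e)

def cohen_kappa(a: np.ndarray, b: np.ndarray, n_categories: int) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    p_o = np.mean(a == b)
    pa = np.bincount(a, minlength=n_categories) / len(a)
    pb = np.bincount(b, minlength=n_categories) / len(b)
    p_e = np.sum(pa * pb)
    return (p_o - p_e) / (1 - p_e)

def per_rater_agreement(ratings: np.ndarray, n_categories: int) -> np.ndarray:
    """Mean pairwise kappa of each rater against all other raters."""
    n_raters = ratings.shape[1]
    out = np.empty(n_raters)
    for j in range(n_raters):
        others = [k for k in range(n_raters) if k != j]
        out[j] = np.mean([cohen_kappa(ratings[:, j], ratings[:, k], n_categories)
                          for k in others])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: 286 raters scoring the 14 HAM-A items (0-4) of one video,
    # mostly matching a fixed "consensus" scoring with 10% random deviations.
    consensus = rng.integers(0, 5, size=14)
    noise = rng.random((14, 286)) < 0.1
    ratings = np.where(noise, rng.integers(0, 5, size=(14, 286)), consensus[:, None])
    print("overall kappa:", round(fleiss_kappa(ratings, 5), 3))
    print("raters with kappa > 0.8:",
          int(np.sum(per_rater_agreement(ratings, 5) > 0.8)))
```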
The overall level of inter-rater agreement was excellent (kappa = 0.889), with individual raters' agreement relative to all other raters ranging from 0.514 to 0.930. Of the 286 participating raters, 97.2% (278) achieved inter-rater agreement greater than 0.8.
This study demonstrates that robust rater training can produce high levels of agreement among large numbers of site raters, both overall and for individual raters, and highlights the potential benefit of excluding raters with inter-rater agreement below 0.8 from study participation.