An Alternative Interpretation of the Linearly Weighted Kappa Coefficients for Ordinal Data

Published online by Cambridge University Press:  01 January 2025

Tarald O. Kvålseth*
Affiliation:
University of Minnesota
* Correspondence should be made to Tarald O. Kvålseth, Departments of Mechanical Engineering and Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455, USA. Email: [email protected]

Abstract

When two (or more) observers are independently categorizing a set of observations, Cohen’s kappa has become the most notable measure of interobserver agreement. When the categories are ordinal, a weighted form of kappa becomes desirable. The two most popular weighting schemes are the quadratic weights and linear weights. Quadratic weights have been justified by the fact that the corresponding weighted kappa is asymptotically equivalent to an intraclass correlation coefficient. This paper deals with linear weights and shows that the corresponding weighted kappa is equivalent to the unweighted kappa when cumulative probabilities are substituted for probabilities. A numerical example is provided.
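As a rough illustration of the equivalence stated in the abstract, the following sketch computes the linearly weighted kappa in two ways: directly from the weights, and from cumulative probabilities at each ordinal cut point. The exact formulation used in the paper is not reproduced on this page, so the cumulative form below is an assumption based on the identity |i - j| = number of cut points separating categories i and j. The 3×3 table of joint proportions is hypothetical, and the function names are illustrative only.

```python
def marginals(p):
    """Return row and column marginal proportions of a square table p."""
    c = len(p)
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    return row, col


def linear_weighted_kappa(p):
    """Weighted kappa with linear disagreement weights |i - j| / (c - 1).

    The factor 1 / (c - 1) cancels between numerator and denominator,
    so it is omitted here.
    """
    c = len(p)
    row, col = marginals(p)
    observed = sum(p[i][j] * abs(i - j) for i in range(c) for j in range(c))
    expected = sum(row[i] * col[j] * abs(i - j)
                   for i in range(c) for j in range(c))
    return 1.0 - observed / expected


def cumulative_kappa(p):
    """Kappa-type ratio built from cumulative probabilities (assumed form).

    For each cut point k, the table is dichotomized into "at most k" versus
    "above k"; observed and chance disagreement are summed over all cuts.
    """
    c = len(p)
    row, col = marginals(p)
    observed = 0.0
    expected = 0.0
    for k in range(1, c):  # cut after the k-th category
        f_row = sum(row[:k])                      # P(X <= k)
        f_col = sum(col[:k])                      # P(Y <= k)
        f_joint = sum(p[i][j] for i in range(k) for j in range(k))
        observed += f_row + f_col - 2.0 * f_joint
        expected += f_row + f_col - 2.0 * f_row * f_col
    return 1.0 - observed / expected


# Hypothetical joint proportions for two observers and three ordered categories.
table = [
    [0.20, 0.05, 0.00],
    [0.10, 0.30, 0.05],
    [0.00, 0.10, 0.20],
]

kappa_weights = linear_weighted_kappa(table)
kappa_cumulative = cumulative_kappa(table)
print(kappa_weights, kappa_cumulative)
```

Under these assumptions the two functions agree up to floating-point rounding for any table whose entries sum to one, which is the sense in which cumulative probabilities can stand in for the linear weights.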

Type
Original Paper
Copyright
Copyright © The Psychometric Society 2018
