
Dispersion-Weighted Kappa: An Integrative Framework for Metric and Nominal Scale Agreement Coefficients

Published online by Cambridge University Press:  01 January 2025

Christof Schuster*
Affiliation:
Justus-Liebig-Universität Giessen
David A. Smith
Affiliation:
University of Notre Dame
Requests for reprints should be sent to Christof Schuster, Fachbereich Psychologie und Sportwissenschaft, Justus-Liebig-Universität Giessen, Otto-Behaghel-Str. 10, 35394 Giessen, Germany. E-mail: [email protected]

Abstract

The rater agreement literature is complicated by the fact that it must accommodate at least two different properties of rating data: the number of raters (two versus more than two) and the rating scale level (nominal versus metric). While kappa statistics are most widely used for nominal scales, intraclass correlation coefficients have been preferred for metric scales. In this paper, we suggest a dispersion-weighted kappa framework for multiple raters that integrates several important agreement statistics by using familiar dispersion indices as weights for expressing disagreement. These weights are applied to ratings identifying cells in the traditional inter-judge contingency table. Novel agreement statistics can be obtained by applying less familiar indices of dispersion in the same way.
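The framework described above can be illustrated with a short sketch: per-item disagreement is measured by a dispersion index applied to the raters' category proportions, and kappa is one minus the ratio of observed to chance-expected dispersion. This is a hedged illustration, not the paper's exact estimator; the function name `dispersion_weighted_kappa` and the default choice of the Gini index of variation are assumptions for the example (with the Gini index and squared proportions, this yields a Fleiss-type coefficient up to a small-sample factor of m/(m-1)).

```python
import numpy as np

def dispersion_weighted_kappa(ratings, dispersion=None):
    """Sketch of a dispersion-weighted kappa for multiple raters.

    ratings: (n_items, n_raters) integer array of category codes.
    dispersion: function mapping a vector of category proportions to a
    nonnegative dispersion value; defaults to the Gini index of variation.
    Returns 1 - D_obs / D_exp, where D_obs averages the dispersion of each
    item's rating distribution and D_exp is the dispersion of the pooled
    marginal distribution (the chance benchmark).
    """
    ratings = np.asarray(ratings)
    cats = np.unique(ratings)
    if dispersion is None:
        # Gini index of variation: 1 - sum_k p_k^2 (an assumed default)
        dispersion = lambda p: 1.0 - float(np.sum(p ** 2))
    # Per-item category proportions across raters
    props = np.array([[np.mean(row == c) for c in cats] for row in ratings])
    d_obs = float(np.mean([dispersion(p) for p in props]))
    # Marginal proportions pooled over all items and raters
    marginal = np.array([np.mean(ratings == c) for c in cats])
    d_exp = dispersion(marginal)
    return 1.0 - d_obs / d_exp
```

Swapping in a different dispersion index (e.g., Shannon entropy) in place of the Gini index yields a different member of the same family, which is the sense in which the framework is integrative.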

Type: Original Paper
Copyright © 2005 The Psychometric Society

