General Estimators for the Reliability of Qualitative Data

Published online by Cambridge University Press: 01 January 2025

Bruce Cooil*
Affiliation:
Owen Graduate School of Management, Vanderbilt University
Roland T. Rust
Affiliation:
Owen Graduate School of Management, Vanderbilt University
* Requests for reprints should be sent to Bruce Cooil, Owen Graduate School of Management, Vanderbilt University, 401 21st Avenue South, Nashville, TN 37203. E-mail: [email protected]

Abstract

We study a proportional reduction in loss (PRL) measure for the reliability of categorical data and consider the general case in which each of N judges assigns a subject to one of K categories. This measure has been shown to be equivalent to a measure proposed by Perreault and Leigh for a special case when there are two equally competent judges, and the correct category has a uniform prior distribution. We consider a general framework where the correct category is assumed to have an arbitrary prior distribution, and where classification probabilities vary by correct category, judge, and category of classification. In this setting, we consider PRL reliability measures based on two estimators of the correct category—the empirical Bayes estimator and an estimator based on the judges' consensus choice. We also discuss four important special cases of the general model and study several types of lower bounds for PRL reliability.
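
As a rough illustration of the special case mentioned in the abstract, the sketch below computes the Perreault and Leigh index in the form in which it is commonly stated for two judges and K categories, I_r = sqrt((F_o/N - 1/K) * K/(K - 1)), set to 0 when the observed agreement rate F_o/N falls below 1/K. It also shows a simple plurality vote as one possible reading of a "consensus choice". The function names, the tie-breaking behavior, and the example numbers are assumptions made for illustration; this is not the general PRL estimator developed in the paper.

```python
import math
from collections import Counter


def perreault_leigh_index(agreements: int, total: int, k: int) -> float:
    """Perreault-Leigh index for two judges and k categories (commonly stated form).

    agreements: number of subjects on which the two judges agree (F_o)
    total:      number of subjects judged (N)
    k:          number of categories (K)
    """
    if k < 2 or total <= 0 or not 0 <= agreements <= total:
        raise ValueError("need k >= 2, total > 0, and 0 <= agreements <= total")
    p_agree = agreements / total
    # The index is defined as 0 when agreement is below the chance level 1/k.
    if p_agree < 1 / k:
        return 0.0
    return math.sqrt((p_agree - 1 / k) * k / (k - 1))


def consensus_category(labels):
    """Plurality vote over the judges' labels for one subject.

    Assumption: ties go to whichever tied category Counter lists first;
    the paper's consensus estimator may resolve ties differently.
    """
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical data: two judges agree on 80 of 100 subjects, 4 categories.
    print(perreault_leigh_index(80, 100, 4))
    # Hypothetical labels from three judges for a single subject.
    print(consensus_category(["A", "B", "A"]))
```

The index rescales observed agreement so that chance-level agreement (1/K) maps to 0 and perfect agreement maps to 1, which is why it only makes sense under the equal-competence and uniform-prior assumptions the abstract describes.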

Type
Original Paper
Copyright
Copyright © 1995 The Psychometric Society

Footnotes

Bruce Cooil is Associate Professor of Statistics, and Roland T. Rust is Professor and area head for Marketing, Owen Graduate School of Management, Vanderbilt University. The authors thank three anonymous reviewers and an Associate Editor for their helpful comments and suggestions. This work was supported in part by the Dean's Fund for Faculty Research of the Owen Graduate School of Management, Vanderbilt University.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Petrov, B. N., Csáki, F. (Eds.), 2nd International Symposium on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.
Agresti, A. (1990). Categorical data analysis, New York: John Wiley & Sons.
Batchelder, W. H., Romney, A. K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In Grofman, B., Owen, G. (Eds.), Information pooling and group decision making (pp. 103–112). Greenwich, CN: JAI Press.
Batchelder, W. H., Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53, 193–224.
Batchelder, W. H., Romney, A. K. (1989). New results in test theory without an answer key. In Roskam, E. E. (Ed.), Mathematical psychology in progress (pp. 229–248). Berlin, Heidelberg, New York: Springer-Verlag.
Clogg, C. C. (1981). New developments in latent structure analysis. In Jackson, D. M., Borgatta, E. F. (Eds.), Factor analysis and measurement in sociological research (pp. 215–246). London: Sage.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Cooil, B., Rust, R. T. (1994). Reliability and expected loss: A unifying principle. Psychometrika, 59, 203–216.
Costner, H. L. (1965). Criteria for measures of association. American Sociological Review, 30, 341–353.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., Gleser, G. C., Nanda, H., Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles, New York: John Wiley & Sons.
David, F. N., Barton, D. E. (1962). Combinatorial chance, London: Griffin.
David, H. A. (1981). Order statistics (2nd ed.), New York: John Wiley & Sons.
Dillon, W. R., Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 19, 438–458.
Haberman, S. J. (1974). Log-linear models for frequency tables derived by indirect observation: Maximum likelihood equations. Annals of Statistics, 2, 911–924.
Hughes, M. A., Garrett, D. E. (1990). Intercoder reliability estimation approaches in marketing: A generalizability theory framework for quantitative data. Journal of Marketing Research, 27, 185–195.
Johnson, N. L., Kotz, S. (1969). Discrete distributions, Boston, MA: Houghton Mifflin.
Kesten, H., Morse, N. (1959). A property of the multinomial distribution. Annals of Mathematical Statistics, 30, 120–127.
Kozelka, R. M. (1956). Approximate upper percentage points for extreme values in multinomial sampling. Annals of Mathematical Statistics, 27, 507–512.
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.
Marshall, A. W., Olkin, I. (1979). Inequalities: Theory of majorization and its applications, New York: Academic Press.
Mellenbergh, G. J., van der Linden, W. J. (1979). The internal and external optimality of decisions based on tests. Applied Psychological Measurement, 3, 257–273.
Perreault, W. D. Jr., Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgments. Journal of Marketing Research, 26, 135–148.
Romney, A. K., Weller, S. C., Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 313–338.
Rust, R. T., Simester, D., Brodie, R. J., Nilikant, V. (in press). Model selection criteria: An investigation of relative accuracy, posterior probabilities, and combinations of criteria. Management Science.
Schouten, H. J. A. (1982). Measuring pairwise agreement among many observers, II: Some improvements and additions. Biometrical Journal, 24, 431–435.
Schouten, H. J. A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
Winer, B. J. (1971). Statistical principles in experimental design, New York: McGraw-Hill.
Woodroofe, M. (1982). On model selection and the arc sine laws. Annals of Statistics, 10, 1182–1194.