Assessing the Reliability of Blind Wine Tasting: Differentiating Levels of Clinical and Statistical Meaningfulness*

Domenic V. Cicchetti

doi:10.1017/S1931436100000432

Published online by Cambridge University Press: 08 June 2012

Affiliation: Dom Cicchetti, Ph.D., Yale Home Office, 94 Linsley Lake Road, North Branford, CT 06471; e-mail: [email protected].

Abstract

The author distinguishes between the clinical and statistical meaning of varying levels of intertaster reliability for the 11 judges who evaluated 10 Chardonnays (6 American and 4 French) in the heralded 1976 Paris wine competition. Four wines showed weighted kappa (Kw) values below 0.40, a level considered poor by established biostatistical criteria. These ranged from 0.10 for the French Beaune Clos des Mouches 1973 Chardonnay to 0.33 for the U.S. Veedercrest 1972 Chardonnay. However, when the statistical significance of the Kw values was assessed, only the Clos des Mouches failed to reach significance at the .05 level. The other three wines (the U.S. Chateau Montelena 1973, with a Kw of 0.20; the U.S. David Bruce 1973 regular, with a Kw of 0.27; and the U.S. Veedercrest, with a Kw of 0.33) reached statistical significance at p values of <.05, <.001, and <.0001, respectively. These findings are not specific to weighted kappa; they reveal that when sample sizes are large enough, even the most trivial of results will be statistically significant while often devoid of practical or clinical meaningfulness. A level of Kw that is clinically meaningful will most likely be statistically significant, but high levels of statistical significance are no guarantee of clinical significance. Methods for resolving this “big N phenomenon” are presented and discussed. (JEL Classification: C12, C49)
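The "big N phenomenon" can be illustrated with a minimal sketch. It is not the author's procedure for the 11 Paris judges: it assumes the simpler two-rater case, linear agreement weights, and the large-sample null standard error in the form usually attributed to Fleiss, Cohen, and Everitt (1969). The rating table, the function name `weighted_kappa_z`, and the variable names are hypothetical and chosen only for illustration.

```python
import math


def weighted_kappa_z(table):
    """Return (Kw, z) for a square table of counts from two raters.

    Assumptions (not taken from the article): linear agreement weights
    w_ij = 1 - |i - j| / (k - 1), and the large-sample standard error of
    Kw under the null hypothesis of chance agreement.
    """
    k = len(table)
    n = sum(sum(r) for r in table)
    p = [[c / n for c in r] for r in table]               # cell proportions
    row = [sum(p[i]) for i in range(k)]                    # rater A marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

    cells = [(i, j) for i in range(k) for j in range(k)]
    po = sum(w[i][j] * p[i][j] for i, j in cells)          # observed weighted agreement
    pe = sum(w[i][j] * row[i] * col[j] for i, j in cells)  # chance weighted agreement
    kw = (po - pe) / (1 - pe)

    # Marginal-weighted average weights used in the null variance.
    wbar_row = [sum(col[j] * w[i][j] for j in range(k)) for i in range(k)]
    wbar_col = [sum(row[i] * w[i][j] for i in range(k)) for j in range(k)]
    s = sum(row[i] * col[j] * (w[i][j] - (wbar_row[i] + wbar_col[j])) ** 2
            for i, j in cells)
    se0 = math.sqrt(s - pe ** 2) / ((1 - pe) * math.sqrt(n))
    return kw, kw / se0


# Hypothetical 3-category rating table for two tasters (30 paired ratings).
small = [[5, 3, 2],
         [3, 4, 3],
         [2, 3, 5]]
# Same proportions, ten times as many paired ratings.
large = [[10 * c for c in r] for r in small]

for name, table in (("N = 30", small), ("N = 300", large)):
    kw, z = weighted_kappa_z(table)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    label = "poor (< 0.40)" if kw < 0.40 else "0.40 or above"
    print(f"{name}: Kw = {kw:.2f} [{label}], z = {z:.2f}, p = {p_two_sided:.2g}")
```

Under these assumptions both tables yield the same Kw, which falls in the poor range, yet only the larger table crosses the conventional .05 threshold: z grows with the square root of the sample size while the level of agreement does not change.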

Type: Articles
Information: Journal of Wine Economics, Volume 2, Issue 2, Fall 2007, pp. 196–202

DOI: https://doi.org/10.1017/S1931436100000432
Copyright: Copyright © American Association of Wine Economists 2007
