Null Hypothesis Significance Testing, p-values, Effects Sizes and Confidence Intervals

Michael Perdices

doi:10.1017/BrImp.2017.28

Published online by Cambridge University Press: 07 December 2017

Affiliation: Department of Neurology, Royal North Shore Hospital, New South Wales, Australia
Address for correspondence: Department of Neurology, Royal North Shore Hospital, The University of Sydney Medical School, Northern Clinical School, Discipline of Psychiatry, New South Wales, Australia. E-mail: [email protected]

Abstract

There has been controversy over Null Hypothesis Significance Testing (NHST) since the first quarter of the 20th century, and misconceptions about it still abound. The first section of this paper briefly discusses some of the problems and limitations of NHST. Overwhelmingly, the ‘holy grail’ of researchers has been to obtain significant p-values. In 1999 the American Psychological Association (APA) recommended that if NHST was used in data analysis, then researchers should report effect sizes (ESs) and their confidence intervals (CIs) as well as p-values. The APA recommendations are summarised in the next section of the paper. But for neuropsychological rehabilitation clinicians, the primary interest is (or should be) to determine whether or not the effect of an intervention is clinically important, not just statistically significant. In this context, ESs and their CIs provide information relevant to clinicians. The final section of the paper reviews common ESs and provides worked examples for the calculation of three commonly used ESs (Cohen's d, Hedges' g and Glass' delta). Web-based resources for calculating other ESs and their CIs are also reviewed.
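The three effect sizes named in the abstract can be illustrated with a minimal sketch. The paper's own formulas and worked examples are not reproduced in this excerpt, so the definitions below are the common textbook ones and should be read as assumptions: Cohen's d divides the mean difference by the pooled standard deviation, Hedges' g multiplies d by the approximate small-sample correction 1 - 3/(4N - 9), and Glass' delta divides the mean difference by the control group's standard deviation. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def hedges_g(treatment, control):
    """Cohen's d with the approximate small-sample bias correction."""
    total_n = len(treatment) + len(control)
    correction = 1 - 3 / (4 * total_n - 9)
    return cohens_d(treatment, control) * correction


def glass_delta(treatment, control):
    """Mean difference divided by the control group's standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical outcome scores, invented for this sketch only.
treatment_scores = [12, 14, 16, 18, 20]
control_scores = [10, 11, 12, 13, 14]

print("Cohen's d:   ", round(cohens_d(treatment_scores, control_scores), 3))
print("Hedges' g:   ", round(hedges_g(treatment_scores, control_scores), 3))
print("Glass' delta:", round(glass_delta(treatment_scores, control_scores), 3))
```

With unequal group variances, as in these invented scores, Glass' delta can differ noticeably from d and g because it standardises by the control group alone. Naming conventions for d and g also vary between sources, so the definitions should be checked against the paper before reuse.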

Keywords

NHST; Effect Size; Confidence Interval; p-values

Type: Articles
Information: Brain Impairment, Volume 19, Special Issue 1: Quantitative Data Analysis; by Robyn Tate and Michael Perdices, March 2018, pp. 70–80

DOI: https://doi.org/10.1017/BrImp.2017.28
Copyright: Copyright © Australasian Society for the Study of Brain Impairment 2017

References

APA Publications and Communications Board Working Group on Journal Article Reporting Standards (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63 (9), 839–851.

Bakeman, R. (2005). Recommended effect size statistics for repeated measures designs. Behavior Research Methods, 37 (3), 379–384.

Berben, L., Sereika, S.M., & Engberg, S. (2012). Effect size estimation: Methods and examples. International Journal of Nursing Studies, 49, 1039–1047.

Berkson, J. (1938). Some difficulties of interpretation encountered in application of Chi squared. Journal of the American Statistical Association, 33 (203), 526–536.

Carver, R.P. (1978). The case against statistical significance. Harvard Educational Review, 48 (3), 378–399.

Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., & Onghena, P. (2007). Students' misconceptions of statistical inference: A review of the empirical evidence from research on statistics education. Educational Research Review, 2 (2), 98–113.

Clark, C.A. (1963). Hypothesis testing in relation to statistical methodology. Review of Educational Research, 33, 455–473.

Cohen, J. (1962). The statistical power of abnormal–social psychological research. Journal of Abnormal and Social Psychology, 65, 145–153.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1990). Things I have learned (So far). American Psychologist, 45 (12), 1304–1312.

Cohen, J. (1994). The Earth is round (p < .05). American Psychologist, 49 (12), 997–1003.

Cooper, H., Hedges, L.V., & Valentine, J.C. (2009). The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation.

Cumming, G. & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61 (4), 532–574.

Draper, S.W. (2016). Effect Size. Retrieved from http://www.psy.gla.ac.uk/~steve/best/effect.html

Ellis, P.D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. New York: Cambridge University Press.

Falk, R. & Greenbaum, C.W. (1995). Significance tests die hard. The amazing persistence of a probabilistic misconception. Theory and Psychology, 5 (1), 76–98.

Ferguson, C.J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40 (5), 532–538.

Fethney, J. (2010). Statistical and clinical significance, and how to use confidence intervals to help interpret both. Australian Critical Care, 23, 93–97.

Fisher, R.A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences. Methodological issues (pp. 311–339). Hillsdale, NJ: Lawrence Erlbaum Associates.

Glaser, D.N. (1999). The controversy of significance testing: Misconceptions and alternatives. American Journal of Critical Care, 8 (5), 291–296.

Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.

Gliner, J.A., Leech, N.L., & Morgan, G.A. (2002). Problems with null hypothesis significance testing (NHST): What do the textbooks say? The Journal of Experimental Education, 71 (1), 83–92.

Halsey, L.G., Curran-Everett, D., Vowler, S.L., & Drummond, G.B. (2015). The fickle P value generates irreproducible results. Nature Methods, 12 (3), 179–185.

Hedges, L.V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6 (2), 106–128.

Howell, D.C. (2010). Confidence intervals on effect size. Retrieved from https://www.uvm.edu/~dhowell/methods7/Supplements/Confidence%20Intervals%20on%20Effect%20Size.pdf

Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62 (2), 227–240.

Huberty, C.J., & Pike, C.J. (1999). On some history regarding statistical testing. Advances in Social Science Methodology, 5, 1–22.

Keselman, H.J., Huberty, C.J., Lix, L.M., Olejnik, S., Cribbie, R., Donahue, B., . . . Levin, J.R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.

Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56 (5), 746–759.

Kraemer, H.C., Morgan, G.A., Leech, N.L., Gliner, J.A., Vaske, J.J., & Harmon, R.J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42 (12), 1524–1529.

Krishnan, S. & Idris, N. (2014). Students’ misconceptions about hypothesis test. REDIMAT: Journal of Research in Mathematics Education, 3 (3), 276–293.

Lambdin, C. (2012). Significance tests as sorcery: Science is empirical-significance tests are not. Theory and Psychology, 22 (1), 67–90.

Li-Ting, C., & Chao-Ying, J.P. (2013). Constructing confidence intervals for effect sizes in ANOVA designs. Journal of Modern Applied Statistical Methods, 12 (2), 82–104.

Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34 (2), 103–115.

Meyer, G.J., McGrath, R.E., & Rosenthal, R. (2003). Basic effect size guide with SPSS® and SAS® syntax. Retrieved from www.tandf.co.uk/journals/authors/hjpa/resources/basiceffectsizeguide.rtf

Neyman, J., & Pearson, E. (1928a). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A, 175–240.

Neyman, J., & Pearson, E. (1928b). On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika, 20A, 263–294.

Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5 (2), 241–301.

Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.

Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157–209.

Prentice, D.A., & Miller, D.T. (1992). When small effects are impressive. Psychological Bulletin, 112 (1), 160–164.

Rea, L.M., & Parker, R.A. (1992). Designing and conducting survey research. San Francisco: Jossey-Bass.

Richardson, J.T.E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6 (12), 135–147.

Rozeboom, W.W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57 (5), 416–428.

Sainani, K.L. (2012). Clinical versus statistical significance. American Academy of Physical Medicine and Rehabilitation, 4 (6), 442–445.

Schatz, P., Jay, K.A., McComb, J., & McLaughlin, J.R. (2005). Misuse of statistical tests in archives of clinical neuropsychology publications. Archives of Clinical Neuropsychology, 20, 1053–1059.

Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61 (4), 605–632.

Thompson, B. (2002). “Statistical,” “Practical,” and “Clinical”: How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80, 64–71.

Torchiano, M. (2017). Efficient effect size computation. Retrieved from https://cran.r-project.org/web/packages/effsize/effsize.pdf

Turner, H.M., & Bernard, R.M. (2006). Calculating and synthesizing effect sizes. Contemporary Issues in Communication Science and Disorders, 33, 42–55.

Vallecillos, A. (2001). Cuestiones metodológicas en la investigación educativa. Quinto Simposio de la Sociedad Española de Investigación en Educación Matemática, Almería, Spain.

Vallecillos, A., & Batanero, C. (1997b). Conceptos activados en el contraste de hipótesis estadísticas y su comprensión por estudiantes universitarios. Recherches en Didactique des Mathématiques, 17 (1), 29–48.

Vallecillos, A., & Batanero, M.C. (1997a). Aprendizaje y enseñanza del contraste de hipotesis: Concepciones y errores. Enseñanza de las Ciencias, 15 (2), 189–197.

Wilkinson, L. and the Task Force on Statistical Inference APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54 (8), 594–604.

Wilson, D.B. (2011). Interpretation.ppt. Retrieved from http://mason.gmu.edu/~dwilsonb/ma.html
