Behavioristic, Evidentialist, and Learning Models of Statistical Testing

Deborah G. Mayo

doi:10.1086/289272

Published online by Cambridge University Press: 01 April 2022

Affiliation: Department of Philosophy, Virginia Polytechnic Institute and State University

Abstract

While orthodox (Neyman-Pearson) statistical tests enjoy widespread use in science, the philosophical controversy over their appropriateness for obtaining scientific knowledge remains unresolved. I shall suggest an explanation and a resolution of this controversy. The source of the controversy, I argue, is that orthodox tests are typically interpreted as rules for making optimal decisions as to how to behave, where optimality is measured by the frequency of errors the test would commit in a long series of trials. Most philosophers of statistics, however, view the task of statistical methods as providing appropriate measures of the evidential-strength that data affords hypotheses. Since tests appropriate for the behavioral-decision task fail to provide measures of evidential-strength, philosophers of statistics claim the use of orthodox tests in science is misleading and unjustified. What critics of orthodox tests overlook, I argue, is that the primary function of statistical tests in science is neither to decide how to behave nor to assign measures of evidential strength to hypotheses. Rather, tests provide a tool for using incomplete data to learn about the process that generated it. This they do, I show, by providing a standard for distinguishing differences (between observed and hypothesized results) due to accidental or trivial errors from those due to systematic or substantively important discrepancies. I propose a reinterpretation of a commonly used orthodox test to make this learning model of tests explicit.

Type: Research Article
Information: Philosophy of Science, Volume 52, Issue 4, December 1985, pp. 493–516

DOI: https://doi.org/10.1086/289272
Copyright: Copyright © 1985 by the Philosophy of Science Association

Footnotes

† I am grateful to Ronald Giere, Norman Gilinsky, I. J. Good, Oscar Kempthorne, Henry Kyburg, and Larry Laudan for very helpful comments. I thank Jim Fetzer for first suggesting I spell out my (learning) model by contrasting it to the existing (behavioristic and evidentialist) models of statistical tests.

