36 - Model Validation, Comparison, and Selection

from Part V - General Discussion

Published online by Cambridge University Press: 21 April 2023

Ron Sun
Affiliation:
Rensselaer Polytechnic Institute, New York

Summary

Progress in the computational cognitive sciences depends critically on model evaluation. This chapter provides an accessible description of key considerations and methods important in model evaluation, with special emphasis on evaluation in the forms of validation, comparison, and selection. Major sub-topics include qualitative and quantitative validation, parameter estimation, cross-validation, goodness of fit, and model mimicry. The chapter includes definitions of an assortment of key concepts, relevant equations, and descriptions of best practices and important considerations in the use of these model evaluation methods. The chapter concludes with important high-level considerations regarding emerging directions and opportunities for continuing improvement in model evaluation.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2023

