Connectionist Models of Cognition

doi:10.1017/9781108755610.005

2 - Connectionist Models of Cognition

from Part II - Cognitive Modeling Paradigms

Published online by Cambridge University Press: 21 April 2023

Michael S. C. Thomas and James L. McClelland

Edited by Ron Sun

Ron Sun: Rensselaer Polytechnic Institute, New York

Summary

In this chapter, we review computer models of cognition that have focused on the use of neural networks. These architectures were inspired by research into how computation works in the brain. The approach is called connectionism because it proposes that processing is characterized by patterns of activation across simple processing units connected together into complex networks, with knowledge stored in the strength of the connections between units. We place connectionism in its historical context, describing the “three ages” of artificial neural network research: from the genesis of the first formal theories of computation in the 1930s and 1940s, to the parallel distributed processing (PDP) models of cognition of the 1980s and 1990s, and the advances in “deep” neural networks emerging in the mid-2000s. Transitions between the ages have been triggered by new insights into how to create and train more powerful artificial neural networks. We discuss important foundational cognitive models that illustrate some of the key properties of connectionist systems, and indicate how the novel theoretical contributions of these models arose from their key computational properties. We consider how connectionist modeling has influenced wider theories of cognition, and how, in the future, connectionist modeling of cognition may progress by integrating further constraints from neuroscience and neuroanatomy.
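
As an illustration of the core idea in the summary (activation patterns across simple units, with knowledge held in connection strengths), the following is a minimal sketch rather than any model from the chapter. The function names, the logistic activation function, and the weight values are all assumptions chosen for illustration.

```python
import math


def unit_activation(inputs, weights, bias=0.0):
    """Activation of one simple processing unit.

    Assumption: the unit applies a logistic function to its net input,
    i.e. the weighted sum of incoming activations plus a bias.
    """
    net = sum(a * w for a, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))


def layer_activations(inputs, weight_rows, biases):
    """Pattern of activation across a layer of units.

    Each row of weight_rows holds the incoming connection strengths
    of one unit in the layer.
    """
    return [unit_activation(inputs, row, b) for row, b in zip(weight_rows, biases)]


# Hypothetical connection strengths: in this sketch, everything the
# network "knows" is contained in these numbers.
weight_rows = [[2.0, -2.0], [-2.0, 2.0]]
biases = [0.0, 0.0]

# Presenting an input pattern yields an output pattern of activation.
pattern = layer_activations([1.0, 0.0], weight_rows, biases)
```

Changing the values in `weight_rows` changes how the same input pattern is mapped to an output pattern, which is the sense in which knowledge is stored in the connections rather than in any single unit.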

Keywords

connectionism; parallel distributed processing; artificial neural networks; deep neural networks; learning; development

Type: Chapter
Information: The Cambridge Handbook of Computational Cognitive Sciences, pp. 29–79

DOI: https://doi.org/10.1017/9781108755610.005

Publisher: Cambridge University Press

Print publication year: 2023

References

Abel, S., Huber, W., & Dell, G. S. (2009). Connectionist diagnosis of lexical disorders in aphasia. Aphasiology, 23(11), 1353–1378.CrossRef Google Scholar

Abel, S., Willmes, K., & Huber, W. (2007). Model-oriented naming therapy: testing predictions of a connectionist model. Aphasiology, 21(5), 411–447.CrossRef Google Scholar

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.Google Scholar

Alireza, H., Fedor, A., & Thomas, M. S. C. (2017). Simulating behavioural interventions for developmental deficits: when improving strengths produces better outcomes than remediating weaknesses. In Gunzelmann, G., Howes, A., Tenbrink, T., & Davelaar, E., (Eds.), Proceedings of the 39th Annual Meeting of the Cognitive Science Society, London, UK.Google Scholar

Anderson, J., & Rosenfeld, E. (1988). Neurocomputing: Foundations of Research. Cambridge, MA: MIT Press.Google Scholar

Anderson, J. A. (1977). Neural models with cognitive implications. In LaBerge, D. & Samuels, S. J., (Eds.), Basic Processes in Reading Perception and Comprehension, (pp. 27–90). Hillsdale, NJ: Erlbaum.Google Scholar

Aru, J., & Vincente, R. (2018). What deep learning can tell us about higher cognitive functions like mindreading? arXiv:1803.10470v2Google Scholar

Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the Mind. Oxford: Blackwell.Google Scholar

Berko, J. (1958). The child’s learning of English morphology. Word, 14, 150–177.Google Scholar

Betti, A., & Gori, M. (2020). Backprop diffusion is biologically plausible. arXiv:1912.04635v2Google Scholar

Blakeman, S., & Mareschal, D. (2020). A complementary learning systems approach to temporal difference learning. Neural Networks, 22, 218–230. https://doi.org/10.1016/j.neunet.2019.10.011 CrossRef Google Scholar

Botvinick, M. & Plaut, D. C. (2004). Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review, 111, 395–429.CrossRef Google Scholar PubMed

Botvinick, M. M., & Cohen, J. D. (2014). The computational and neural basis of cognitive control: charted territory and new frontiers. Cognitive Science, 38, 1249–1285. https://doi.org/10.1111/cogs.12126 Google Scholar

Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. arXiv:2005.14165.Google Scholar

Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380.Google Scholar

Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22(2–4), 381–410.CrossRef Google Scholar

Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272. https://doi.org/10.1037/0033-295X.113.2.234 Google Scholar

Chen, P. L., Lambon Ralph, M., & Rogers, T. T. (2017). A unified model of human semantic knowledge and its disorders. Nature Human Behaviour, 1, 0039. https://doi.org/10.1038/s41562-016-0039 Google Scholar

Christiansen, M. H. & Chater, N. (2001). Connectionist Psycholinguistics. Westport, CT: Ablex.Google Scholar PubMed

Cleeremans, A., & Dienes, Z. (2008). Computational models of implicit learning. In R. Sun (Ed.), The Cambridge Handbook of Computational Psychology (pp. 396–421). Cambridge: Cambridge University Press. https://doi.org/10.1017/cbo9780511816772.018 Google Scholar

Cobb, M. (2020). The Idea of the Brain. London: Profile Books.Google Scholar

Cohen, G., Johnstone, R. A., & Plunkett, K. (2000). Exploring Cognition: Damaged Brains and Neural Networks. Hove: Psychology Press.Google Scholar

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: a parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332–361.Google Scholar

Crick, F. (1989). The recent excitement about neural networks. Nature, 337, 129–132. https://doi.org/10.1038/337129a0 Google Scholar

Davelaar, E. J., & Usher, M. (2002). An activation-based theory of immediate item memory. In Bullinaria, J. A. & Lowe, W. (Eds.), Proceedings of the Seventh Neural Computation and Psychology Workshop: Connectionist Models of Cognition and Perception. Singapore: World Scientific.Google Scholar

Davies, M. (2005). Cognitive science. In Jackson, F. & Smith, M. (Eds.), The Oxford Handbook of Contemporary Philosophy. Oxford: Oxford University Press.Google Scholar

Devlin, J., Gonnerman, L., Andersen, E., & Seidenberg, M. S. (1997). Category specific semantic deficits in focal and widespread brain damage: a computational account. Journal of Cognitive Neuroscience, 10, 77–94.Google Scholar

Dündar-Coecke, S., & Thomas, M. S. C. (2019). Modeling socioeconomic effects on the development of brain and behavior. In Goel, A. K., Seifert, C. M., & Freksa, C. (Eds.), Proceedings of the 41^st Annual Conference of the Cognitive Science Society (pp. 1676–1682). Montreal: Cognitive Science Society.Google Scholar

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.Google Scholar

Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–224.Google Scholar

Elman, J. L. (1993). Learning and development in neural networks: the importance of starting small. Cognition, 48, 71–99.Google Scholar

Elman, J. L. (2005). Connectionist models of cognitive development: where next? Trends in Cognitive Sciences, 9, 111–117.Google Scholar

Elman, J. L. & McRae, K. (2019). A model of event knowledge. Psychological Review, 126 (2), 252–291. https://doi.org/10.1037/rev0000133 CrossRef Google Scholar

Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press.Google Scholar

Ervin, S. M. (1964). Imitation and structural change in children’s language. In Lenneberg, E. H. (Ed.), New Directions in the Study of Language. Cambridge, MA: MIT Press.Google Scholar

Fahlman, S., & Lebiere, C. (1990). The cascade correlation learning architecture. In Touretzky, D. (Ed.), Advances in Neural Information Processing 2 (pp. 524–532). Los Altos, CA: Morgan Kauffman.Google Scholar

Feldman, J. A. (1981). A connectionist model of visual memory. In Hinton, G. E. & Anderson, J. A. (Eds.), Parallel Models of Associative Memory (pp. 49–81). Hillsdale, NJ: Erlbaum.Google Scholar

Fitz, H., & Chang, F. (2017). Meaningful questions: the acquisition of auxiliary inversion in a connectionist model of sentence production. Cognition, 166, 225–250. https://doi.org/10.1016/j.cognition.2017.05.008 Google Scholar

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: a critical analysis. Cognition, 78, 3–71.CrossRef Google Scholar

French, R. M., Ans, B., & Rousset, S. (2001). Pseudopatterns and dual-network memory models: advantages and shortcomings. In French, R. & Sougné, J. (Eds.), Connectionist Models of Learning, Development and Evolution (pp. 13–22). London: Springer.Google Scholar

Freud, S. (1895). Project for a scientific psychology. In Strachey, J. (Ed.), The Standard Edition of the Complete Psychological Works of Sigmund Freud. London: The Hogarth Press and the Institute of Psycho-Analysis.Google Scholar

Friston, K. (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7), 293–301. https://doi.org/10.1016/j.tics.2009.04.005 Google Scholar

Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1521), 1211–1221. https://doi.org/10.1098/rstb.2008.0300 CrossRef Google Scholar PubMed

Goebel, R., & Indefrey, P. (2000). A recurrent network with short-term memory capacity learning the German –s plural. In Broeder, P. & Murre, J. (Eds.), Models of Language Acquisition: Inductive and Deductive Approaches (pp. 177–200). Oxford: Oxford University Press.Google Scholar

Gordon, P. (2004). Numerical cognition without words: evidence from Amazonia. Science, 306(5695), 496–499.Google Scholar

Grainger, J., Midgley, K., & Holcomb, P. J. (2010). Re-thinking the bilingual interactive-activation model from a developmental perspective (BIA-d). In Kail, M. & Hickmann, M. (Eds.), Language Acquisition Across Linguistic and Cognitive Systems (pp. 267–283). Amsterdam: John Benjamins Publishing Company.Google Scholar

Green, D. C. (1998). Are connectionist models theories of cognition? Psycoloquy, 9(4).Google Scholar

Grossberg, S. (1976a). Adaptive pattern classification and universal recoding I: parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121–134..Google Scholar

Grossberg, S. (1976b). Adaptive pattern classification and universal recoding II: feedback, expectation, olfaction, and illusions. Biological Cybernetics, 23, 187–202.CrossRef Google Scholar

Haarmann, H., & Usher, M. (2001). Maintenance of semantic information in capacity limited item short-term memory. Psychonomic Bulletin & Review, 8, 568–578.Google Scholar

Hackman, D. A., Farah, M. J., & Meaney, M. J. (2010). Socioeconomic status and the brain. Nature Reviews Neuroscience, 11, 651–659.Google Scholar

Hahnloser, R., Sarpeshkar, R., Mahowald, M., et al. (2000). Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405, 947–951. https://doi.org/10.1038/35016072 Google Scholar

Harm, M. W. & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychological Review, 106 (3), 491–528.Google Scholar

Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Approach. New York, NY: John Wiley & Sons.Google Scholar

Hinton, G. E. (1989). Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Computation, 1, 143–150.Google Scholar

Hinton, G. E., & Anderson, J. A. (1981). Parallel Models of Associative Memory. Hillsdale, NJ: Erlbaum.Google Scholar

Hinton, G. E., & McClelland, J. L. (1988). Learning representations by recirculation. In Anderson, D. Z., (Ed.), Neural Information Processing Systems (pp. 358–366). New York, NY: American Institute of Physics.Google Scholar

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313 (5786), 504–507.Google Scholar

Hinton, G. E., & Sejnowski, T. (1986). Learning and relearning in Boltzmann machines. In Rumelhart, D. & McClelland, J. (Eds.), Parallel Distributed Processing (vol. 1, pp. 282–317). Cambridge, MA: MIT Press.Google Scholar

Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC.Google Scholar

Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut f. Informatik, Technische Univ. Munich.Google Scholar

Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In Kremer, S. C. & Kolen, J. F. (Eds.), A Field Guide to Dynamical Recurrent Neural Networks. Piscataway, NJ: IEEE Press.Google Scholar

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 Google Scholar

Hoeffner, J. H., & McClelland, J. L. (1993). Can a perceptual processing deficit explain the impairment of inflectional morphology in developmental dysphasia? A computational investigation. In Clark, E. V. (Ed.), Proceedings of the 25th Child Language Research Forum (pp. 38–49). Stanford, CA: Center for the Study of Language and Information.Google Scholar

Hoffman, P., McClelland, J., & Lambon Ralph, M. (2018). Concepts, control and context: a connectionist account of normal and disordered semantic cognition. Psychological Review, 125(3), 293–328. https://doi.org/10.1037/rev0000094 Google Scholar

Hofstadter, D. (2018). The shallowness of Google Translate. The Atlantic. Available from: www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/ [last accessed August 9, 2022].Google Scholar

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Science USA, 79, 2554–2558.Google Scholar

Houghton, G. (2005). Connectionist Models in Cognitive Psychology. Hove: Psychology Press.Google Scholar

James, W. (1890). Principles of Psychology. New York, NY: Holt.Google Scholar

Joanisse, M. F. & McClelland, J. L. (2015). Connectionist perspectives on language learning, representation, and processing. WIREs Cognitive Science (online). https://doi.org/10.1002/wcs.1340 Google Scholar

Joanisse, M. F. & Seidenberg, M. S. (1999). Impairments in verb morphology following brain injury: a connectionist model. Proceedings of the National Academy of Science, 96, 7592–7597.Google Scholar

Joanisse, M. F. & Seidenberg, M. S. (2003). Phonology and syntax in specific language impairment: evidence from a connectionist model. Brain and Language, 86, 40–56.Google Scholar

Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of Cognitive Science Society (pp. 531–546). Hillsdale, NJ: Erlbaum.Google Scholar

Karaminis, T. N., & Thomas, M. S. C. (2010). A cross-linguistic model of the acquisition of inflectional morphology in English and Modern Greek. In Ohlsson, S. & Catrambone, R. (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society, August 11–14, 2010. Portland, Oregon, USA.Google Scholar

Karaminis, T. N., & Thomas, M. S. C. (2014). The multiple inflection generator: a generalized connectionist model for cross-linguistic morphological development. DNL Tech report 2014 (online). http://193.61.4.246/dnl/wp-content/uploads/2020/04/KT_TheMultipleInflectionGenerator2014.pdf [last accessed August 9, 2022].Google Scholar

Karmiloff-Smith, A. (1998). Development itself is the key to understanding developmental disorders. Trends in Cognitive Sciences, 2, 389–398.Google Scholar

Karmiloff-Smith, A. (2009). Nativism versus neuroconstructivism: rethinking the study of developmental disorders. Developmental Psychology, 45(1), 56–63.Google Scholar

Kirov, C. & Cotterell, R. (2018). Recurrent neural networks in linguistic theory: revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics, 6, 651–665. https://doi.org/10.1162/tacl_a_00247 Google Scholar

Knopik, V. S., Neiderhiser, J. M., DeFries, J. C., & Plomin, R. (2016). Behavioral genetics (7th ed). New York, NY: Worth Publishers.Google Scholar PubMed

Kohonen, T. (1984). Self-Organization and Associative Memory. Berlin: Springer-Verlag.Google Scholar

Kollias, P. & McClelland, J. L. (2013). Context, cortex, and associations: a connectionist developmental approach to verbal analogies. Frontiers in Psychology, 4, 857. https://doi.org/10.3389/fpsyg.2013.00857 CrossRef Google Scholar PubMed

Kriegeskorte, N. (2015). Deep neural networks: a new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.Google Scholar

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, 1, 1097–1105.Google Scholar

Kuczaj, S. A. (1977). The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behavior, 16, 589–600.Google Scholar

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, e253.Google Scholar

Lashley, K. S. (1929). Brain Mechanisms and Intelligence: A Quantitative Study of Injuries to the Brain. New York, NY: Dover Publications, Inc.Google Scholar

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521 (7553), 436.CrossRef Google Scholar PubMed

Lillicrap, T., Cownden, D., Tweed, D., & Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7, 13276. https://doi.org/10.1038/ncomms13276 Google Scholar

Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J., & Hinton, G. E. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21, 335–346. https://doi.org/10.1038/s41583–020-0277-3 Google Scholar

MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: a comment on Just & Carpenter (1992) and Waters & Caplan (1996). Psychological Review, 109, 35–54.Google Scholar

MacKay, D. J. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448–472.Google Scholar

Magnuson, J. S., Li, M., Luthra, S., You, H., & Steiner, R. (2019). Does predictive processing imply predictive coding in models of spoken word recognition? In Proceedings of the 41st Annual Meeting of the Cognitive Science Society (pp. 735–740). Cognitive Science Society.Google Scholar

Manning, C. D., Clark, K., Hewitt, J., Khandelwal, U., & Levy, O. (2020) Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48), 30046–30054.Google Scholar

Marcus, G. F. (2001). The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT Press.CrossRef Google Scholar

Marcus, G., Pinker, S., Ullman, M., Hollander, J., Rosen, T., & Xu, F. (1992). Overregularisation in language acquisition. Monographs of the Society for Research in Child Development, 57 (228), 1–178.Google Scholar

Mareschal, D., & Thomas, M. S. C. (2007). Computational modeling in developmental psychology. IEEE Transactions on Evolutionary Computation (Special Issue on Autonomous Mental Development), 11, 137–150.Google Scholar

Mareschal, D., Johnson, M., Sirios, S., Spratling, M., Thomas, M. S. C., & Westermann, G. (2007). Neuroconstructivism: How the Brain Constructs Cognition. Oxford: Oxford University Press.Google Scholar

Marr, D. (1982). Vision. San Francisco, CA: W. H. Freeman.Google Scholar

Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194, 283–287.CrossRef Google Scholar PubMed

Mayor, J., Gomez, P., Chang, F., & Lupyan, G. (2014). Connectionism coming of age: legacy and future challenges. Frontiers In Psychology, 5, 187. https://doi.org/10.3389/fpsyg.2014.00187 Google Scholar

McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge of specifics. In Proceedings of the Third Annual Meeting of the Cognitive Science Society (pp. 170–172). Hillsdale, NJ: Lawrence Erlbaum Associates.Google Scholar

McClelland, J. L. (1989). Parallel distributed processing: implications for cognition and development. In Morris, M. G. M. (Ed.), Parallel Distributed Processing, Implications for Psychology and Neurobiology (pp. 8–45). Oxford: Clarendon Press.Google Scholar

McClelland, J. L. (2013). Integrating probabilistic models of perception and interactive neural networks: a historical and tutorial review. Frontiers in Psychology, 4, 503. www.frontiersin.org/articles/10.3389/fpsyg.2013.00503/full CrossRef Google Scholar PubMed

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.Google Scholar

McClelland, J. L., Hill, F., Rudolph, M., Baldridge, J., & Schuetze, H. (2020). Placing language in an integrated understanding system: next steps toward human-level performance in neural language models. Proceedings of the National Academy of Sciences, 117(42), 25966–25974. https://doi.org/10.1073/pnas.1910416117 Google Scholar

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.Google Scholar

McClelland, J. L., Plaut, D. C., Gotts, S. J., & Maia, T. V. (2003). Developing a domain-general framework for cognition: what is the best approach? Commentary on a target article by Anderson and Lebiere. Behavioral and Brain Sciences, 22, 611–614.Google Scholar

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception. Part 1: An account of basic findings. Psychological Review, 88(5), 375–405.Google Scholar

McClelland, J. L., Rumelhart, D. E. & the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2: Psychological and Biological Models. Cambridge, MA: MIT Press.Google Scholar

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.Google Scholar

McLeod, P., Plunkett, K., & Rolls, E. T. (1998). Introduction to Connectionist Modelling of Cognitive Processes. Oxford: Oxford University Press.Google Scholar

Meynert, T. (1884). Psychiatry: A Clinical Treatise on Diseases of the Forebrain. Part I. The Anatomy, Physiology and Chemistry of the Brain. Trans. B. Sachs. New York, NY: G. P. Putnam’s Sons.Google Scholar

Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.Google Scholar

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165–178.Google Scholar

Morton, J. B., & Munakata, Y. (2002). Active versus latent representations: a neural network model of perseveration, dissociation, and decalage in childhood. Developmental Psychobiology, 40, 255–265.Google Scholar

Moutoussis, M., Shahar, N., Hauser, T. U., & Dolan, R. J. (2017). Computation in psychotherapy, or how computational psychiatry can aid learning-based psychological therapies. Computational Psychiatry, 2, 50–73. https://doi.org/10.1162/%20cpsy_a_00014 Google Scholar

Movellan, J. R., & McClelland, J. L. (1993). Learning continuous probability distributions with symmetric diffusion networks. Cognitive Science, 17, 463–496.Google Scholar

Munakata, Y. (1998). Infant perseveration and implications for object permanence theories: a PDP model of the AB task. Developmental Science, 1, 161–184.CrossRef Google Scholar

Munakata, Y. & McClelland, J. L. (2003). Connectionist models of development. Developmental Science, 6, 413–429.Google Scholar

Newell, A. (1980). Physical symbol systems. Cognitive Science, 4(2), 135–183.Google Scholar

Novikoff, A. (1962). Proceedings of the Symposium on the Mathematical Theory of Automata, 12, 615–622. New York, NY: Polytechnic Institute of Brooklyn.Google Scholar

O’Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: the generalized recirculation algorithm. Neural Computation, 8, 895–938.Google Scholar

O’Reilly, R. C. (1998). Six principles for biologically based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455–462.Google Scholar

O’Reilly, R. C., Bhattacharyya, R., Howard, M. D., & Ketza, N. (2014). Complementary learning systems. Cognitive Science, 38, 1229–1248. https://doi.org/10.1111/j.1551-6709.2011.01214.x Google Scholar

O’Reilly, R. C., Braver, T. S., & Cohen, J. D. (1999). A biologically based computational model of working memory. In Miyake, A. & Shah, P. (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. New York, NY: Cambridge University Press.Google Scholar

O’Reilly, R. C., & Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. Cambridge, MA: MIT Press.Google Scholar

Pater, J. (2019). Generative linguistics and neural networks at 60: foundation, friction, and fusion. Language, 95(1). Epub February 20, 2019. https://doi.org/10.1353/lan.2019.0005 Google Scholar

Piazza, M., Pica, P., Izard, V., Spelke, E. S., & Dehaene, S. (2013). Education enhances the acuity of the nonverbal approximate number system. Psychological Science, 24(6), 1037–1043. https://doi.%20org/10.1177/09567%2097612%20464057.CrossRef Google Scholar PubMed

Pinker, S. (1984). Language Learnability and Language Development. Cambridge, MA: Harvard University Press.Google Scholar

Pinker, S. (1999). Words and Rules. London: Weidenfeld & Nicolson.Google Scholar

Pinker, S., & Prince, A. (1988). On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.Google Scholar

Plaut, D. C., & Kello, C. T. (1999). The emergence of phonology from the interplay of speech comprehension and production: a distributed connectionist approach. In MacWhinney, B. (Ed.), The Emergence of Language (pp. 381–415). Mahwah, NJ: Erlbaum.Google Scholar

Plaut, D. C. & McClelland, J. L. (1993). Generalization with componential attractors: word and nonword reading in an attractor network. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (pp. 824–829). Hillsdale, NJ: Lawrence Erlbaum Associates.Google Scholar

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Understanding normal and impaired word reading: computational principles in quasi-regular domains. Psychological Review, 103, 56–115.Google Scholar

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: implications for child language acquisition. Cognition, 38, 1–60.Google Scholar

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.Google Scholar

Plunkett, K., & Marchman, V. (1996). Learning from a connectionist model of the English past tense. Cognition, 61, 299–308.Google Scholar

Plunkett, K., & Nakisa, R. (1997). A connectionist model of the Arabic plural system. Language and Cognitive Processes, 12, 807–836.Google Scholar

Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2, 79–87. https://doi.org/10.1038/4580 Google Scholar

Rashevsky, N. (1935). Outline of a physico-mathematical theory of the brain. Journal of General Psychology, 13, 82–112.Google Scholar

Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 274–280.Google Scholar

Ritter, S., Barrett, D. G. T., Santoro, A., & Botvinick, M. M. (2017). Cognitive psychology for deep neural networks: a shape bias case study. arXiv:1706.08606v2Google Scholar

Rohde, D. L. T. & Plaut, D. C. (1999). Language acquisition in the absence of explicit negative evidence: how important is starting small? Cognition, 72, 67–109.Google Scholar

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.Google Scholar

Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, DC: Spartan Books.Google Scholar

Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception. Part 2: The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60–94.CrossRef Google Scholar PubMed

Rumelhart, D. E., & McClelland, J. L. (1985). Levels indeed! Journal of Experimental Psychology General, 114(2), 193–197.Google Scholar

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In, D. E. Rumelhart, J. L. McClelland, , & the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.Google Scholar

Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In Rumelhart, D. E., McClelland, J. L., & the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (pp. 45–76). Cambridge, MA: MIT Press.Google Scholar

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In McClelland, J. L., Rumelhart, D. E., & the PDP Research Group (Eds.). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2: Psychological and Biological Models (pp. 216–271). Cambridge, MA: MIT Press.Google Scholar

Rumelhart, D. E., McClelland, J. L. & the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press.Google Scholar

Rumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. E. (1986). Schemata and sequential thought processes in PDP models. In, J. L. McClelland, D. E. Rumelhart, , & the PDP Research Group, Explorations in the Microstructure of Cognition Volume 2: Psychological and Biological Models (pp. 7–57). Cambridge, MA: MIT Press.Google Scholar

Sabathiel, S., McClelland, J. L., & Solstad, T. (2020). A computational model of learning to count in a multimodal, interactive environment. Proceedings of the 42nd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annual Review of Psychology, 69, 181–203. https://doi.org/10.1146/annurev-psych-122216-011805

Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: the role of distributional cues. Journal of Memory and Language, 35, 606–621.

Scellier, B., & Bengio, Y. (2019). Equivalence of equilibrium propagation and recurrent backpropagation. Neural Computation, 31(2), 312–329. https://doi.org/10.1162/neco_a_01160

Schmidhuber, J. (2015). Deep learning in neural networks: an overview. Neural Networks, 61, 85–117. https://doi.org/10.1016/j.neunet.2014.09.003

Seidenberg, M. S. (1993). Connectionist models and cognitive theory. Psychological Science, 4(4), 228–235.

Seidenberg, M. S. (2017). Language at the Speed of Sight. New York, NY: Basic Books.

Selfridge, O. G. (1959). Pandemonium: a paradigm for learning. In Symposium on the Mechanization of Thought Processes (pp. 511–529). London: HMSO.

Shallice, T. (1988). From Neuropsychology to Mental Structure. Cambridge: Cambridge University Press.

Shultz, T. R. (2003). Computational Developmental Psychology. Cambridge, MA: MIT Press.

Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1–74.

Spencer, H. (1872). Principles of Psychology (3rd ed.). London: Longman, Brown, Green, & Longmans.

Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.

Stoianov, I., & Zorzi, M. (2012). Emergence of a ‘visual number sense’ in hierarchical generative models. Nature Neuroscience, 15(2), 194–196.

Storrs, K. R., & Kriegeskorte, N. (2019). Deep learning for cognitive neuroscience. arXiv:1903.01458v1

Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review, 88(2), 135–170.

Testolin, A., Zou, W. Y., & McClelland, J. L. (2020). Numerosity discrimination in deep neural networks: initial competence, developmental refinement and experience statistics. Developmental Science, 2020, e12940.

Thomas, M. S. C. (2016). Do more intelligent brains retain heightened plasticity for longer in development? A computational investigation. Developmental Cognitive Neuroscience, 19, 258–269. https://doi.org/10.1016/j.dcn.2016.04.002

Thomas, M. S. C. (2018). A neurocomputational model of developmental trajectories of gifted children under a polygenic model: when are gifted children held back by poor environments? Intelligence, 69, 200–212.

Thomas, M. S. C., & Brady, D. (2021). Quo vadis modularity in the 2020s? In Thomas, M. S. C., Mareschal, D., & Knowland, V. C. P. (Eds.), Taking Development Seriously: A Festschrift for Annette Karmiloff-Smith. London: Routledge Psychology.

Thomas, M. S. C., Davis, R., Karmiloff-Smith, A., Knowland, V. C. P., & Charman, T. (2016). The over-pruning hypothesis of autism. Developmental Science, 19(2), 284–305. https://doi.org/10.1111/desc.12303

Thomas, M. S. C., Fedor, A., Davis, R., Yang, J., Alireza, H., Charman, T., Masterson, J., & Best, W. (2019). Computational modelling of interventions for developmental disorders. Psychological Review, 126(5), 693–726. https://doi.org/10.1037/rev0000151

Thomas, M. S. C., Forrester, N. A., & Richardson, F. M. (2006). What is modularity good for? In Proceedings of The 28th Annual Conference of the Cognitive Science Society (pp. 2240–2245), July 26–29, Vancouver, BC, Canada.

Thomas, M. S. C., Forrester, N. A., & Ronald, A. (2013). Modeling socioeconomic status effects on language development. Developmental Psychology, 49(12), 2325–2343. https://doi.org/10.1037/a0032301

Thomas, M. S. C., Forrester, N. A., & Ronald, A. (2016). Multi-scale modeling of gene-behavior associations in an artificial neural network model of cognitive development. Cognitive Science, 40(1), 51–99. https://doi.org/10.1111/cogs.12230

Thomas, M. S. C., & Karmiloff-Smith, A. (2002a). Are developmental disorders like cases of adult brain damage? Implications from connectionist modelling. Behavioral and Brain Sciences, 25(6), 727–788.

Thomas, M. S. C., & Karmiloff-Smith, A. (2002b). Modelling typical and atypical cognitive development. In Goswami, U. (Ed.), Handbook of Childhood Development (pp. 575–599). Oxford: Blackwell.

Thomas, M. S. C., & Karmiloff-Smith, A. (2003a). Modeling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647–682.

Thomas, M. S. C., & Karmiloff-Smith, A. (2003b). Connectionist models of development, developmental disorders and individual differences. In Sternberg, R. J., Lautrey, J., & Lubart, T. (Eds.), Models of Intelligence: International Perspectives (pp. 133–150). Washington, DC: American Psychological Association.

Thomas, M. S. C., & Knowland, V. C. P. (2014). Modelling mechanisms of persisting and resolving delay in language development. Journal of Speech, Language, and Hearing Research, 57(2), 467–483. https://doi.org/10.1044/2013_JSLHR-L-12-0254

Thomas, M. S. C., & Van Heuven, W. (2005). Computational models of bilingual comprehension. In Kroll, J. F. & De Groot, A. M. B. (Eds.), Handbook of Bilingualism: Psycholinguistic Approaches (pp. 202–225). Oxford: Oxford University Press.

Touretzky, D. S., & Hinton, G. E. (1988). A distributed connectionist production system. Cognitive Science, 12, 423–466.

Tovar, A., Westermann, G., & Torres, A. (2017). From altered LTP/LTD to atypical learning: a computational model of Down syndrome. Cognition, 171, 15–24. https://doi.org/10.1016/j.cognition.2017.10.021

Ueno, T., Saito, S., Rogers, T. T., & Lambon Ralph, M. A. (2011). Lichtheim 2: synthesizing aphasia and the neural basis of language in a neurocomputational model of the dual dorsal-ventral language pathways. Neuron, 72(2), 385–396. https://doi.org/10.1016/j.neuron.2011.09.013

Usher, M., & McClelland, J. L. (2001). On the time course of perceptual choice: the leaky competing accumulator model. Psychological Review, 108, 550–592.

van Gelder, T. (1991). Classical questions, radical answers: connectionism and the structure of mental representations. In Horgan, T. & Tienson, J. (Eds.), Connectionism and the Philosophy of Mind. Dordrecht: Kluwer Academic Publishers.

Verguts, T., & Fias, W. (2004). Representation of number in animals and humans: a neural model. Journal of Cognitive Neuroscience, 16(9), 1493–1504. https://doi.org/10.1162/0898929042568497

Westermann, G., Mareschal, D., Johnson, M. H., Sirois, S., Spratling, M. W., & Thomas, M. S. C. (2007). Neuroconstructivism. Developmental Science, 10, 75–83.

Westermann, G., Thomas, M. S. C., & Karmiloff-Smith, A. (2010). Neuroconstructivism. In Goswami, U. (Ed.), Blackwell Handbook of Child Development (2nd ed.) (pp. 723–748). Oxford: Blackwell.

Williams, R. J., & Zipser, D. (1995). Gradient-based learning algorithms for recurrent networks and their computational complexity. In Chauvin, Y. & Rumelhart, D. E. (Eds.), Back-propagation: Theory, Architectures and Applications. Hillsdale, NJ: Erlbaum.

Woollams, A. M. (2014). Connectionist neuropsychology: uncovering ultimate causes of acquired dyslexia. Philosophical Transactions of the Royal Society B, 369(1634). https://doi.org/10.1098/rstb.2012.0398

Wu, Y., Schuster, M., Chen, Z., et al. (2016). Google’s neural machine translation system: bridging the gap between human and machine translation. Available from: https://arxiv.org/abs/1609.08144 [last accessed August 9, 2022].

Xie, X., & Seung, H. S. (2003). Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation, 15, 441–454.

Xu, F., & Pinker, S. (1995). Weird past tense forms. Journal of Child Language, 22, 531–556.

Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619–8624.