
Computational cognitive modeling for syntactic acquisition: Approaches that integrate information from multiple places

Published online by Cambridge University Press:  13 June 2023

Lisa PEARL*
Affiliation:
University of California, Irvine

Abstract

Computational cognitive modeling is a tool we can use to evaluate theories of syntactic acquisition. Here, I review several models implementing theories that integrate information from both linguistic and non-linguistic sources to learn different types of syntactic knowledge. Some of these models additionally consider the impact of factors coming from children’s developing non-linguistic cognition. I discuss some existing child behavioral work that can inspire future model-building, and conclude by considering more specifically how to build better models of syntactic acquisition.

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Introduction

About computational cognitive modeling for syntactic acquisition

One tool we can use to understand how syntactic acquisition works is computational cognitive modeling. The computational part refers to implementing an idea (that is, a theory) very precisely, typically using mathematical techniques that are carried out on computers. The cognitive part refers to what the implemented ideas are about, which is some part of human cognition. The modeling part refers to the theory itself, which captures (i.e., models) some aspect of cognition (here: syntactic acquisition). With this tool, we can make a theory about syntactic acquisition concrete enough to evaluate, because the computational cognitive model generates predictions about children’s syntactic behavior. That is, when we have a computational cognitive model for syntactic acquisition, we have a theory of syntactic acquisition that is implemented precisely enough to evaluate against empirical data.

Importantly, the computational cognitive model serves as a “proof of concept” for a theory. When the model generates predictions that match human behavior (e.g., children’s syntactic behavior), this is proof there is at least one way the theory could explain human behavior – which is the way the theory was implemented in the computational cognitive model. An important limitation of computational cognitive modeling is that modeling success (or failure) can only be interpreted with respect to the specific theory implemented by the model. That is, if the model succeeds at matching human behavior, we can only interpret this success as success of that specific implementation of that acquisition theory – we have nothing to say about other implementations of this particular theory, or other theories not implemented in the model. The same is true for interpreting model failure: failure is only demonstrated for that specific theory implementation. If we want to evaluate some other theory implementation, we need to build another model and see how it does. See Pearl (2014, in press) for more detailed discussion about how to interpret computational cognitive model success (and failure).

Implementing a theory in a computational cognitive model

When we have a theory of syntactic acquisition, how do we implement it in a computational cognitive model? Implementing the model involves several key aspects. First, the model needs to encode relevant prior knowledge and learning abilities the child is supposed to have at this stage of development. This knowledge and these abilities are often assumed implicitly by the acquisition theory. For instance, a syntactic acquisition theory might assume prior knowledge of individual words in the language and the ability to segment speech reliably from the input.

Second, the model needs to learn from realistic input. For instance, a model meant to capture syntactic acquisition behavior that occurs at age four should ideally learn from input that children encounter by age four.

Third, the model needs to output predictions that connect in some interpretable way to children’s behavior. For instance, a model might predict if a child at age four would treat two verbs as being syntactically the same (i.e., appearing in the same syntactic contexts and having the same interpretations of their arguments).

Fourth, the model needs to encode learning, which is how the modeled child uses the information from the input to update hypotheses about syntax. Learning is typically the main component specified by the acquisition theory. For instance, a model might attend to the distribution of certain features of the input viewed as relevant (e.g., animacy of verb arguments, syntactic contexts a verb appears in), and then use probabilistic inference to group verbs together that seem similar enough with respect to those relevant features.

So, to sum up, implementing an acquisition theory in a computational cognitive model involves encoding the acquisition theory assumptions (i.e., the prior knowledge assumed, the learning abilities assumed, and how learning proceeds), learning from realistic input estimates, and generating interpretable output that can be evaluated against empirical data from children. This is an approach that the models reviewed below have taken for investigating syntactic acquisition.
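
To make these components concrete, here is a minimal sketch in Python of how a modeled learner might be organized. This is my own illustration rather than code from any of the models reviewed below; the toy learning rule (tracking verb-frame counts) and all names are hypothetical.

```python
# A minimal sketch of the four components a modeled learner needs.
# All names and the toy learning rule are illustrative only,
# not code from any of the models reviewed below.

class ModeledChild:
    def __init__(self):
        # (1) Prior knowledge / learning abilities: here, just the ability
        # to track how often each verb appears in each syntactic frame.
        self.counts = {}

    def learn_from(self, utterances):
        # (2) Realistic input: (verb, frame) pairs estimated from
        # child-directed speech.
        for verb, frame in utterances:
            # (4) Learning: update internal hypotheses from the input.
            key = (verb, frame)
            self.counts[key] = self.counts.get(key, 0) + 1

    def predict(self, verb):
        # (3) Interpretable output: does the modeled child treat this verb
        # as allowing a transitive frame? This prediction can be compared
        # against children's observed behavior.
        transitive = self.counts.get((verb, "transitive"), 0)
        total = sum(n for (v, _), n in self.counts.items() if v == verb)
        return transitive / total if total else 0.0

child = ModeledChild()
child.learn_from([("break", "transitive"), ("break", "unaccusative"),
                  ("fall", "unaccusative")])
print(child.predict("break"))  # 0.5
print(child.predict("fall"))   # 0.0
```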

Road map

I will focus on computational cognitive models of syntactic acquisition that integrate information from multiple places, including both linguistic and non-linguistic sources of information. That is, the syntactic acquisition theories implemented by these models assume that syntactic learning proceeds by children attending to information from these different sources, rather than solely syntactic sources. Why discuss this kind of model? To me, these models seem more realistic because children are surrounded by many different types of information and have many different learning goals simultaneously. That is, children do not ever only learn about syntax; instead, they learn about syntax and about who is likely to give them a hug and about how to communicate their desire for more milk, among many other things. So, non-syntactic sources of information may be particularly salient in any given moment while children are learning about syntax; if these sources of information happen to be helpful for learning about syntax, then children may very well be able to harness those sources to do so.

Moreover, children are likely impacted by non-linguistic factors during acquisition. For instance, cognitive limitations on memory, attention, and executive control can affect how children perceive the information in their input, how they update their internal hypotheses, and how they generate their observable syntactic behavior. In addition, children likely rely on non-linguistic learning mechanisms to update their internal hypotheses, such as probabilistic inference. In fact, all the models of syntactic acquisition reviewed below rely on probabilistic inference, and so already incorporate this non-linguistic component into their theories of syntactic acquisition.

Here, as mentioned, I focus on syntactic acquisition models that also integrate information from non-syntactic sources. I should note that these are selected case studies in syntactic acquisition modeling from my own work, rather than capturing the full range of computational cognitive models that implement this type of syntactic acquisition theory. I first review three case studies, whose acquisition theories incorporate conceptual information such as the animacy of an event participant, participant event roles more generally, and components of lexical meaning. Some of these theories additionally incorporate non-linguistic cognitive limitations by implementing the impact of those limitations on input perception and hypothesis updating. I note that these theories are agnostic as to the specific source of the cognitive limitations (e.g., whether the source is developing knowledge, developing learning abilities, or something else); instead, the models capture the practical impact of the cognitive limitations on the acquisition process. These case studies involve the acquisition of syntactic knowledge about linking theories, the passive, and pronoun interpretation.

I then briefly review some existing child behavioral work that we can take inspiration from when it comes to building better computational cognitive models of syntactic acquisition. I also discuss more specifically how we can think about building better models, and how we can incorporate the insights from both the behavioral work reviewed and current modeling work. I conclude with a few other ideas for building better models of syntactic acquisition in the future.

Some modeling case studies in syntactic acquisition

For each of the modeling case studies below, I first describe the syntactic knowledge children are trying to acquire. I then describe the relevant aspects of the acquisition theory implemented in the computational cognitive model, including the prior theories the implemented theory builds on, which information sources are used, the form the information sources take, and how those sources are used to update the modeled child’s hypotheses. I explicitly highlight which information sources are non-syntactic, as relevant. I also describe the input to the model, how the model’s output is evaluated against empirical data from children’s behavior, and what we learned by using modeling this way.

Linking theories

The syntactic knowledge

One type of syntactic knowledge is how to interpret a verb’s arguments in context. For instance, consider this sentence: The little girl blicked the kitten on the stairs. Even if we do not know what blick means, we still prefer to interpret this sentence as the little girl doing something (blicking) to the kitten, and that event happening on the stairs. The reason we as adults prefer this interpretation is because we have linking theories that link the thematic roles specified by a verb’s lexical semantics (e.g., agent, patient, location) to the syntactic argument positions specified by that verb’s syntactic frame (e.g., subject, direct object, object of a preposition). Moreover, our linking theories are so well-developed that they can impose these links even when we do not know a verb’s specific lexical semantics (like here with blick).

Verbs can be grouped together into classes where the verbs in a class behave the same way with respect to the links between syntactic positions and thematic roles. That is, solving the linking problem (i.e., acquiring linking theories for the verbs of the language) involves learning how to link syntactic positions and thematic roles for different verbs; verb classes are collections of verbs that behave the same way for linking. For example, verbs with “subject-raising” behavior like appear and seem allow their subject to not have a thematic role. So, in Lindy seemed/appeared to hug the kitten, Lindy is not a “seemer” or an “appearer”, but rather a kitten-hugger. As another example, verbs with “unaccusative” behavior like fall and break have a patient in the subject position. So, in The toy kitten fell/broke, falling or breaking is happening to the toy kitten. As a third example, verbs with passivizable behavior like hug and break allow their subject to be a patient in the passive construction, while verbs like appear, seem, and fall do not. That is, The toy kitten was hugged/broken by Lindy, with hugging or breaking happening to the toy kitten, is acceptable. In contrast, The toy kitten was seemed/appeared/fallen by Lindy, with seeming, appearing, or falling happening to the toy kitten, is not acceptable.

These examples demonstrate that a verb class can involve many linking behaviors. Here, one verb class involving fall might be characterized as +unaccusative and -passivizable; another verb class involving break might be characterized as +unaccusative and +passivizable; a third verb class involving seem and appear might be characterized as +subject-raising and -passivizable. To learn what verbs belong together in a class, children must implicitly develop the linking theory for that verb class. This is why acquiring verb classes can be used as a measure of linking theory development. In short, if a child (and therefore a modeled child) can cluster verbs together into classes that behave the same linking-wise, then the child (real or modeled) can be said to have developed the relevant linking theory knowledge that leads to those verb classes.

The acquisition theory implemented in the model

Pearl and Sprouse (2019) proposed that children can cluster verbs into appropriate verb classes by paying attention to several pieces of information associated with verbs in their input: argument animacy, syntactic context, and link distribution. This verb information has been proposed by prior theories as (potentially) relevant (e.g., Becker, 2009, 2014, 2015; Becker & Estigarribia, 2013; Fisher, Gertner, Scott & Yuan, 2010; Gillette, Gleitman, Gleitman & Lederer, 1999; Gleitman, 1990; Gutman, Dautriche, Crabbé & Christophe, 2015; Harrigan, Hacquard & Lidz, 2016; Hartshorne, Pogue & Snedeker, 2015b; Kirby, 2009a, 2009b; Landau & Gleitman, 1985; Levin, 1993; Scott & Fisher, 2009). To see a concrete example of each information type, consider two of the utterances involving break from our examples: the unaccusative The toy kitten broke and the passive The toy kitten was broken by Lindy. First, the animacy of the verb’s arguments matters. For instance, a child would notice that The toy kitten is inanimate. Second, the syntactic contexts that a verb appears in matter. So, a child would notice that break appeared in an unaccusative context of the form Noun-Phrase Verb and a passive context of the form Noun-Phrase was Verb Preposition Noun-Phrase. Third, the distribution of links between thematic roles and syntactic positions matters. Here, a child would notice that break has the following links in the two utterances above: two instances of Patient in subject position (from The toy kitten in both utterances) and one instance of Agent in the prepositional phrase position (from Lindy in the passive utterance).
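
For concreteness, the information a modeled child might extract from those two break utterances could be represented roughly as below. This is only an illustrative sketch; the feature names and encoding are my own shorthand, not the actual representation used in Pearl and Sprouse's model.

```python
# Illustrative representation of the three information types extracted from
# "The toy kitten broke" and "The toy kitten was broken by Lindy".
# The feature names are shorthand of my own, not the model's actual encoding.
break_observations = [
    {
        "verb": "break",
        "argument_animacy": {"subject": "inanimate"},            # the toy kitten
        "syntactic_context": "NP V",                              # unaccusative frame
        "links": [("Patient", "subject")],
    },
    {
        "verb": "break",
        "argument_animacy": {"subject": "inanimate",              # the toy kitten
                             "object_of_preposition": "animate"}, # Lindy
        "syntactic_context": "NP was V P NP",                     # passive frame
        "links": [("Patient", "subject"), ("Agent", "object_of_preposition")],
    },
]
```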

Pearl and Sprouse made the idealizing assumption that children would have enough prior knowledge and sufficient learning abilities to accurately extract this information from any particular verb use they encountered. This assumption can be relaxed in future work (i.e., we can assume that children do not accurately extract information due to immature knowledge, immature learning abilities, or cognitive limitations more generally). However, this assumption of accurate extraction provides a simple starting point for theory evaluation via computational cognitive modeling, in the absence of a particular theory about how children may inaccurately extract information.

So, with this information extracted from the input, children would then create verb classes by using Bayesian inference, a type of probabilistic learning shown to accord with a variety of developmental patterns across cognition (see Pearl, 2021 for a brief review). When using Bayesian inference, a learner updates hypotheses by balancing prior knowledge or biases against fit to the observed data. For learning verb classes, Pearl and Sprouse (2019) built in a standard type of prior knowledge for learning classes of any kind, which is that fewer classes are preferred. The fit to the observed data is about the child’s input: here, if the modeled child assumes a certain set of verb classes, is the information observed in the input about argument animacy, syntactic context, and link distribution more probable? A verb class hypothesis that causes the observed information to be more probable is a better fit than a hypothesis that causes the observed information to be less probable.

To better understand this idea of a hypothesis fitting the observed data, consider two verb class hypotheses involving seem and appear. The first hypothesis $H_1$ puts each verb in its own verb class ($H_1$: $class_1$ = {appear}, $class_2$ = {seem}); the second hypothesis $H_2$ puts both verbs together into one verb class ($H_2$: $class_1$ = {appear, seem}). Suppose the observed information the modeled child learns from comes from this utterance: Lisa appeared to be sad, but then she seemed to be happy.

In this utterance, the information from argument animacy, syntactic contexts, and link distributions is the same for appear and seem. Hypothesis $H_1$, which separates these verbs into different verb classes, views this similarity as a coincidence – similar verb behavior is not expected if verbs are in different classes. In contrast, hypothesis $H_2$, which puts these verbs into the same verb class, expects this similarity in verb behavior precisely because the verbs are in the same verb class. When a hypothesis’s expectations are met, it will find the observed information to be more probable and therefore be a better fit. So, $H_2$ will find the observed information to be more probable, and a modeled learner relying on Bayesian inference will prefer $H_2$ over $H_1$ as a better fit for the observed information.
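
The preference for $H_2$ can be seen in a toy calculation like the one below. The numbers (e.g., four possible behaviors per class, the per-class prior penalty) are made up purely for illustration and are not from Pearl and Sprouse's model.

```python
# Toy illustration of why a Bayesian learner prefers H2 (appear and seem in one
# class) over H1 (separate classes) after seeing both verbs behave identically.

CONTEXT_TYPES = 4   # toy number of distinct behaviors a verb class could show

def prior(num_classes, penalty=0.5):
    # Prior preference for fewer classes: each class costs a factor of `penalty`.
    return penalty ** num_classes

def likelihood_same_behavior(num_classes_for_pair):
    # If appear and seem share one class, identical behavior is expected:
    # probability 1/CONTEXT_TYPES. If they sit in separate classes, identical
    # behavior is a coincidence: probability (1/CONTEXT_TYPES) ** 2.
    return (1 / CONTEXT_TYPES) ** num_classes_for_pair

scores = {
    "H1: {appear}, {seem}": prior(2) * likelihood_same_behavior(2),
    "H2: {appear, seem}":   prior(1) * likelihood_same_behavior(1),
}
total = sum(scores.values())
posteriors = {h: round(score / total, 3) for h, score in scores.items()}
print(posteriors)   # H2 ends up with the higher posterior probability (~0.89)
```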

Information integrated

The acquisition theory implemented in the model involves integrating several types of information: (i) animacy (non-linguistic), (ii) syntactic contexts (syntactic), and (iii) links between thematic roles (semantic) and syntactic positions (syntactic). These information sources are combined using the non-linguistic learning mechanism of Bayesian inference.

Model input

To generate predictions about verb classes that English-learning children would have, the model learned from verb uses in English child-directed speech samples. Pearl and Sprouse estimated how many verb uses children at different ages (three, four, and five) would encounter, and implemented models that learned from these same quantities. So, for instance, the three-year-old modeled child learned from the number of verb uses a three-year-old English-learning child would encounter, distributed according to the samples of speech directed to English-learning children up to age three.

Model output and evaluation

To evaluate a modeled child, Pearl and Sprouse compared the verb classes predicted by the modeled child against verb classes that children of the appropriate age seem to have. More specifically, Pearl and Sprouse used 12 types of syntactic or interpretation behavior surveyed from a large collection of child behavioral studies in order to identify verb classes that three-, four-, and five-year-old English-learning children have. These behaviors included subject-raising, unaccusative, and passivizable, among others. From these verb behaviors at ages three to five, Pearl and Sprouse derived age-specific verb classes that a modeled child should attempt to match when it learns from the same data that three-, four-, or five-year-olds learn from. In particular, verbs in the same class are treated the same by children of that age (i.e., the verbs either have or do not have a specific syntactic or interpretation behavior, such as being passivizable). So, the modeled child of that age should cluster those verbs together if it has learned the way children of that age learn.
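
One simple way to quantify how well model-predicted verb classes match child-derived verb classes is pairwise agreement over verbs, sketched below. This is an illustrative metric of my own choosing; Pearl and Sprouse's actual evaluation procedure may differ in its details, and the verb classes here are made up.

```python
# Illustrative comparison of a modeled child's verb classes against verb classes
# derived from child behavior, using pairwise agreement (a metric chosen for
# illustration, not necessarily the one used by Pearl and Sprouse).

from itertools import combinations

def pairwise_agreement(predicted, target):
    """Fraction of verb pairs that the two clusterings treat the same way:
    either both place the pair in one class, or both keep the pair apart."""
    verbs = sorted(set(v for cls in target for v in cls))
    def same_class(clustering, a, b):
        return any(a in cls and b in cls for cls in clustering)
    pairs = list(combinations(verbs, 2))
    agree = sum(same_class(predicted, a, b) == same_class(target, a, b)
                for a, b in pairs)
    return agree / len(pairs)

target    = [{"hug", "break"}, {"seem", "appear"}, {"fall"}]    # from child behavior
predicted = [{"hug", "break"}, {"seem", "appear", "fall"}]       # from the model
print(pairwise_agreement(predicted, target))  # 0.8
```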

Pearl and Sprouse found that their modeled three-, four-, and five-year-olds were able to generate verb classes that matched English-learning children’s verb classes fairly well.

What we learned

The model’s success at matching available empirical data from children supports the acquisition theory implemented in the model, and suggests that children may indeed be learning from these different information types when developing the linking theory knowledge that leads to their observable verb classes. More specifically, the way English-learning children cluster verbs together during syntactic acquisition aligns with them learning not just from syntactic information (e.g., syntactic contexts), but also from non-syntactic information (e.g., animacy and thematic roles).

Passives

The syntactic knowledge

As mentioned above, the passive structure in English allows the subject to be a patient. For instance, in The toy kitten was broken by Lindy, The toy kitten is the one being broken. So, this sentence seems to have a structure more like The toy kitten was broken _The toy kitten by Lindy, where _The toy kitten marks the position where The toy kitten is understood (as the object of break).

Children then need to learn that this interpretation is possible, which involves understanding where the element in the subject position is understood (in this case, a position where it can serve as Patient). Importantly, not all verbs passivize: recall that The toy kitten was fallen is not acceptable to English speakers (i.e., fall doesn’t passivize). So, a key learning problem is to learn which verbs in English can passivize (i.e., which verbs allow the passive structure and related interpretation with the subject as Patient).

Interestingly, there seems to be significant variation in English for when children realize certain verbs are passivizable. Some verbs, such as hug, are recognized as passivizable as early as age three, while others, such as love, appear delayed until after age five. Moreover, verb meaning (i.e., the lexical semantics) seems to matter. For instance, hug is an observable action, and love is not; love is a “psych subject-experiencer” verb where the subject experiences the psychological state described (love), while hug is not a psychological verb at all. These and other lexical semantic features have been proposed to impact when English-learning children learn that specific verbs are passivizable (see Nguyen & Pearl, 2021 for a review of the acquisition trajectory and proposed lexical semantic features).

In addition, the syntactic feature of transitivity has been proposed as a key indicator that a verb is likely passivizable in English (Levin, 1993). A transitive syntactic context has a subject and direct object, as in Lindy broke the toy kitten, with Lindy as the subject and the toy kitten as the direct object. So, verbs that allow a transitive context, like break, are likely to be passivizable in English.

The acquisition theory implemented in the model

Nguyen and Pearl (2019) proposed that children decide whether a verb is passivizable on the basis of two things. First, children consider several of the verb’s lexical semantic features (like being observable or a psych subject-experiencer verb) and potentially the syntactic feature of transitivity, as proposed by prior acquisition theories (Liter, Huelskamp, Weerakoon & Munn, 2015; Maratsos, Fox, Becker & Chalkley, 1985; Pinker, Lebeaux & Frost, 1987; Levin, 1993; Messenger, Branigan, McLean & Sorace, 2012). Second, children consider how often verbs with those features are passivized in their input. Information about a verb’s features is integrated via Bayesian inference.

As with the Pearl and Sprouse model, Nguyen and Pearl made the idealizing assumption that children would have enough prior knowledge and sufficient learning abilities to accurately extract this information from any particular verb use they encountered. As mentioned before, this assumption of accurate extraction provides a simple starting point for theory evaluation via cognitive modeling, in the absence of a particular theory about how children may inaccurately extract information.

As before, Bayesian inference balances prior knowledge or biases against fit to the observed data. Here, the prior captures how easy (or difficult) it is for children to deploy their knowledge of the passive in the moment, which can be impacted by immature cognitive development. That is, even if a child knows a specific verb is passivizable, she might not be able to access the passive structure appropriately in the moment after hearing the verb in the passive. So, she might not use her syntactic knowledge of the passive structure for that verb instance.

The fit to the observed data is again about the child’s input. In particular, the modeled child assumes that whether a verb passivizes depends on the verb’s features. Given a hypothesis about which features matter, how probable is the observed input – that is, how often verbs with those features appear in the passive? If the observed input is more probable under a hypothesis, then that hypothesis is a better fit to the observed data.

Importantly, the modeled child can heed or ignore any given feature when deciding if a particular verb is passivizable. So, for instance, a five-year-old might ignore whether a verb is an observable action, and instead key into whether it encodes a psychological state. In this way, the model of Nguyen and Pearl explored theories of selective learning for the English passive (i.e., selectively ignoring available information when deciding if a verb is passivizable).
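
As a rough sketch of what selective attention to features could look like computationally, the toy code below estimates a verb's probability of being passivizable from how often verbs sharing its attended features appear in the passive, smoothed by a weak prior. This is my own simplification with made-up counts, not Nguyen and Pearl's actual model (whose prior, in particular, captures deployment of passive knowledge in the moment).

```python
# Toy sketch of selective learning: the modeled child attends only to a chosen
# subset of features when estimating how likely a verb is to be passivizable.
# Feature names and counts are hypothetical.

ATTENDED_FEATURES = {"transitive", "psych_subject_experiencer"}  # selectively attended

# Hypothetical input statistics: for each feature bundle (restricted to the
# attended features), how often verbs with that bundle were heard in the
# passive vs. in other frames.
input_counts = {
    frozenset({"transitive"}): {"passive": 40, "other": 160},
    frozenset({"transitive", "psych_subject_experiencer"}): {"passive": 2, "other": 198},
}

def passivizable_probability(verb_features, prior_passive=0.5, prior_strength=2):
    """Estimate P(passivizable) for a verb from how often verbs sharing its
    attended features were heard in the passive, smoothed by a weak prior."""
    bundle = frozenset(verb_features) & ATTENDED_FEATURES
    counts = input_counts.get(bundle, {"passive": 0, "other": 0})
    passive, other = counts["passive"], counts["other"]
    return (passive + prior_passive * prior_strength) / (passive + other + prior_strength)

print(passivizable_probability({"transitive", "observable_action"}))         # ~0.20
print(passivizable_probability({"transitive", "psych_subject_experiencer"}))  # ~0.015
```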

Information integrated

The information integrated via Bayesian inference is the selected features of a verb (syntactic and lexical semantic), whatever those happen to be. Notably, these features will be the ones children attend to for all the verbs of the language (rather than a feature set for each verb or type of verb). So, the acquisition theory assumes both syntactic and non-syntactic information is relevant. These information sources are then combined using the non-linguistic learning mechanism of Bayesian inference.

Model input

The model learned from verb uses in English child-directed speech samples, both passive uses like The toy kitten was broken and active uses like The toy kitten broke.

Model output and evaluation

To evaluate a modeled learner attending to some set of features, Nguyen and Pearl looked at the age when children have been observed to correctly interpret or produce the passive of a verb more than half the time in previous child behavioral experiments. They called this age the age of acquisition (AoA) for the passive of that verb, and used the AoA of 30 verbs as the model’s target. They focused on age five, and therefore split the 30 verbs into verbs whose AoA was five or younger versus verbs whose AoA was older.

The modeled learner predicts a specific verb is either passivizable or not at a certain age, on the basis of its input. So, the modeled five-year-old learned from the distribution of verb input that English-learning five-year-olds encounter and predicted which verbs would be passivizable. Nguyen and Pearl found that a modeled five-year-old who ignored many of the available features was able to match the behavior of English-learning five-year-olds, and passivize the subset of verbs whose AoA was five or younger. This modeled child instead focused on the syntactic feature of transitivity and a single lexical semantic feature.

What we learned

These modeling results suggest that English five-year-old passivization behavior can be captured if five-year-olds selectively attend to these syntactic and lexical semantic features in their input.

Pronoun interpretation

The syntactic knowledge

Consider this English sentence: Lisa sang to the triplets and then Pronoun took a nap. How we interpret Pronoun depends on several factors. One is agreement information: If the pronoun is the singular she, we look for a singular antecedent like Lisa; if the pronoun is the plural they, we look for a plural antecedent like the triplets. Another factor is our discourse-level knowledge about the lexical items that connect the two clauses together, such as and then. In languages like Spanish, the equivalent to and then biases the interpretation towards the subject Lisa rather than the object the triplets. Another factor in languages like Spanish is whether the pronoun is overt (i.e., pronounced) or not. Spanish is a language that allows the pronoun not to be pronounced; when it is not pronounced, the subject (e.g., Lisa) tends to be favored as the pronoun’s antecedent (see Pearl and Forsythe, 2022 for a brief overview of these factors in pronoun interpretation). Children need to learn how to interpret pronouns of their language in context, taking these factors (and others) into account the way adult speakers of their language do.

The acquisition theory implemented in the model

Pearl and Forsythe (Forsythe & Pearl, 2020; Pearl & Forsythe, 2022) proposed that Spanish-learning children decide how to interpret a pronoun in context by potentially considering information from their input about agreement, lexical connective items, and whether the pronoun is overt. Pearl and Forsythe based their proposal on prior theories that highlight the usefulness of this information for pronoun interpretation (e.g., Asher & Lascarides, 2003; Brandt-Kobele & Höhle, 2010; Clahsen, Aveledo & Roca, 2002; González-Gómez, Hsin, Barriere, Nazzi & Legendre, 2017; Hartshorne, Nappa & Snedeker, 2015a; Johnson, de Villiers & Seymore, 2005; Legendre et al., 2014; Pérez-Leroux, 2005; Pyykkönen, Matthews & Järvikivi, 2010; Soderstrom, 2002; Song & Fisher, 2005, 2007). In Pearl and Forsythe’s implementation, these information sources are integrated via Bayesian inference.

Pearl and Forsythe considered two options for how accurately children extract this information from their input. One option was that the modeled child has enough prior knowledge and sufficient learning abilities to accurately extract this information, similar to the two models discussed before. The other option was that the modeled child does not, and in fact would inaccurately represent this information (for whatever reason: immature knowledge, immature learning abilities, and/or cognitive limitations more generally). More specifically, the modeled child would skew the probability distributions observed in the input about these information sources (e.g., how often singular agreement information occurs when the pronoun’s antecedent is singular). In particular, a modeled child with inaccurate representations of the information in the input could flatten a distribution (e.g., turning a 30/70 distribution into a 40/60 distribution) or sharpen a distribution (e.g., turning a 30/70 distribution into a 20/80 distribution).
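
One common way to implement this kind of flattening or sharpening is to raise the observed probabilities to a power and renormalize, as in the sketch below. This is only illustrative and may differ from Pearl and Forsythe's exact skewing mechanism; the numbers simply echo the examples above.

```python
# Flattening or sharpening a probability distribution by raising each
# probability to a power and renormalizing (one possible implementation,
# not necessarily the one used by Pearl and Forsythe).
# Exponents below 1 flatten the distribution; exponents above 1 sharpen it.

def skew(distribution, exponent):
    powered = [p ** exponent for p in distribution]
    total = sum(powered)
    return [p / total for p in powered]

observed = [0.3, 0.7]        # e.g., how often a cue predicts each antecedent
print(skew(observed, 0.5))   # flattened: roughly [0.40, 0.60]
print(skew(observed, 2.0))   # sharpened: roughly [0.16, 0.84]
```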

As before, Bayesian inference balances prior knowledge or biases against fit to the observed data. Here, the prior encodes how often a pronoun preferred a particular antecedent in children’s input, irrespective of any other useful information about how to interpret that pronoun. The fit to the observed data is about how often each information type occurs in children’s input when a pronoun has a particular interpretation. If certain information (e.g., singular agreement information) almost always occurs when a pronoun’s antecedent is interpreted a certain way (e.g., a singular antecedent), then using that highly-reliable information to interpret the pronoun is a good fit.

Pearl and Forsythe also considered two options for how accurately children perform this inference in the moment of deciding a pronoun’s interpretation. One option was that the modeled child would use all the information sources when performing the Bayesian inference calculation. The other option was that the modeled child would ignore one or more information sources when performing that inference calculation (for whatever reason: immature knowledge, immature learning abilities, and/or cognitive limitations more generally).

So, to sum up, Pearl and Forsythe modeled two types of children. The first type was a modeled child without cognitive limitations, able to (i) accurately extract and represent the probability distributions from the information sources in the input, and (ii) always use those represented probabilities during the Bayesian inference calculation. The second type was a modeled child with cognitive limitations (of whatever kind) that affected (i) the accurate representation of information in the input, (ii) the use of all that information in the Bayesian inference calculation, or (iii) both. Thus, the models of Pearl and Forsythe considered theories of children’s pronoun interpretation behavior that involve cognitive limitations, where the effect of those limitations is to impact the representation of information from the input, the use of that information when deciding a pronoun’s interpretation in context, or both.
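
These two learner types can be sketched with a toy cue-combination calculation like the one below, where cues about a pronoun's antecedent are combined naive-Bayes style, and a limited learner can either work from skewed cue reliabilities (as sketched earlier) or drop cues from the calculation. The cue names and reliability numbers are made up for illustration and are not from Pearl and Forsythe's model.

```python
# Toy sketch of cue combination for pronoun interpretation. A modeled child
# with cognitive limitations could feed in skewed likelihoods and/or pass
# some cues in `ignore` so they are dropped from the inference in the moment.

def interpret(prior, cue_likelihoods, ignore=frozenset()):
    """Posterior probability that the antecedent is the subject, given cues.
    `prior` = P(subject antecedent) irrespective of any other information;
    `cue_likelihoods` maps each cue to (P(cue | subject), P(cue | object))."""
    p_subj, p_obj = prior, 1 - prior
    for cue, (like_subj, like_obj) in cue_likelihoods.items():
        if cue in ignore:
            continue
        p_subj *= like_subj
        p_obj *= like_obj
    return p_subj / (p_subj + p_obj)

cues = {
    "agreement_matches_subject": (0.9, 0.2),   # hypothetical reliabilities
    "connective_biases_subject": (0.7, 0.4),
    "pronoun_is_null":           (0.8, 0.5),
}

print(interpret(0.5, cues))                                        # uses all cues
print(interpret(0.5, cues, ignore={"connective_biases_subject"}))  # ignores one cue
```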

Information integrated

The information integrated via Bayesian inference is linguistic: agreement information (morphology), the lexical connectives between clauses (lexical), and whether the pronoun is pronounced (syntactic/phonological). These information sources are then combined using the non-linguistic learning mechanism of Bayesian inference. The way the information is combined can be mediated by non-linguistic factors arising from cognitive limitations: misrepresenting the information from the input and/or not using select information during Bayesian inference.

Model input

The modeled child learned from pronoun uses in Spanish speech samples involving children. These pronoun uses involved two clauses and had the pronoun as the subject of the second clause (e.g., [Lisa sang to the triplets]$_{clause_1}$ and then [Pronoun took a nap]$_{clause_2}$).

Model output and evaluation

Pearl and Forsythe evaluated modeled children that attended to this set of linguistic features and potentially had cognitive limitations impacting information representation and/or use. The modeled children generated predictions for how to interpret pronouns that Spanish-learning children ages three to five had interpreted in different experimental contexts involving information about agreement, lexical connectives, and whether the pronoun was pronounced.

Pearl and Forsythe found that modeled three-, four-, and five-year-olds were able to best match the interpretation preferences of actual three-, four-, and five-year-olds when cognitive limitations impacting either information representation or information use (but not both) were active. That is, children’s interpretation behavior could be captured by integrating information from agreement, lexical connectives, and whether the pronoun was pronounced as long as children either (i) always mis-perceived information from these sources in the input, leading to inaccurate information, or (ii) often ignored accurate information from these sources when deciding how to interpret a pronoun in the moment. Importantly, children’s behavior wasn’t captured as well if the modeled child had both effects (inaccurate information often ignored) or neither effect (accurate information never ignored).

What we learned

These modeling results thus offer specific explanations about how cognitive limitations (whatever their specific source happens to be) could impact children’s pronoun interpretation preferences, if children rely on these linguistic information sources.

Some experimental work to take inspiration from

I now briefly turn to some work from child behavioral experiments that can provide inspiration for other factors we might want to consider (or consider further) for syntactic acquisition. The first set of experiments involves cognitive limitations, while the second involves knowledge about pragmatics and the world more generally.

Cognitive limitations

The model of Forsythe and Pearl highlighted one effect that cognitive limitations could have on children’s acquisition (syntactic or otherwise): children have adult-like knowledge but can’t deploy it effectively in the moment. Several child behavioral experiments have been interpreted as demonstrating this effect for syntactic acquisition, including Gerard, Lidz, Zuckerman, and Pinto (2018), Ud Deen, Bondoc, Camp, Estioca, Hwang, Shin, Takahashi, Zenker, and Zhong (2018), and Liter, Grolla, and Lidz (2022).

In Gerard et al. (2018), four- and five-year-old English-learning children were asked to interpret utterances with unpronounced subject pronouns in the second clause, like Dora washed Diego before eating a red apple. An adult-like interpretation is that Dora is the one eating a red apple, so the syntactic representation is something like this: Dora washed Diego before Pronoun_Dora eating a red apple. Children were asked to interpret this kind of utterance in tasks that were either more or less cognitively-demanding. A more cognitively-demanding task might involve children having to hold additional information in mind and also evaluate whether the utterance itself is true; a less cognitively-demanding task would involve children simply indicating their interpretation by coloring a picture of the appropriate interpretation (i.e., Dora eating the apple, rather than Diego). When children had to do the more cognitively-demanding task – and so use up more cognitive resources on something besides interpreting the unpronounced pronoun – they gave more non-adult-like interpretations (e.g., Diego eating the apple). In contrast, when children did the less cognitively-demanding task – and so focused more cognitive resources on interpreting the unpronounced pronoun – they gave more adult-like interpretations (e.g., Dora eating the apple). One way to interpret these results is that four- and five-year-olds have adult-like knowledge of how to interpret these unpronounced pronouns, but cannot always use that knowledge in the moment when their cognitive resources are being used up by other things. This idea aligns broadly with the Forsythe and Pearl modeled children who cannot accurately use their information about pronoun interpretation in the moment.

Another example comes from Ud Deen et al. (2018) on children’s interpretation of the passive. English-learning four-year-olds correctly interpreted passives like Elephant was surprised by Monkey more often when the utterance was simply repeated. One interpretation of this finding is that children can adjust their mistaken expectations about the thematic role associated with the subject (i.e., that Elephant is not the surprise-causer but instead the surprise-experiencer) when they hear the sentence again because they know they made a mistake the first time. That is, children can inhibit the incorrect thematic role assignment of Elephant because they know it will not be correct. However, the first time children hear the utterance, they do not know this and so they make an incorrect assignment (e.g., of Elephant as surprise-causer), which is hard for them to adjust afterwards. In other words, children have adult-like knowledge about how to interpret the passive, but cannot use it effectively when their cognitive inhibition ability is not strong enough. So, more broadly, this child behavior was interpreted as domain-general cognitive factors like immature cognitive inhibition impacting children’s ability to use their knowledge of the passive.

A third example comes from Liter et al. (2022), and also involves immature cognitive inhibition, this time impacting children’s production of questions involving wh-words like where. More specifically, English-learning children will sometimes produce “medial wh” questions that seem to duplicate the wh-word, with an extra copy appearing in the middle, such as Where do you think where they were walking? Liter et al. (2022) found that children’s production of medial-wh questions correlated with a measure of their cognitive inhibition abilities. One way to interpret this is that children do in fact know that English does not allow medial wh, but children simply lack the cognitive control sometimes to inhibit the extra wh-word from being produced in the moment. As with the passive example above, this result highlights that acquisition theories (and therefore the computational cognitive models we build to explain children’s behavior) need to consider the non-linguistic systems controlling cognitive inhibition in children.

Pragmatics and world knowledge

Other sources of information children could harness involve knowledge about how speakers use their language (i.e., pragmatic knowledge) and knowledge about the world more generally. We already have behavioral evidence that children can rely on these information sources during syntactic acquisition, such as when learning to interpret pronouns (e.g., Hartshorne et al., 2015a; Pyykkönen et al., 2010; Song & Fisher, 2005, 2007; Wykes, 1981; among others).

As one example of pragmatic knowledge with pronouns, consider the sentence Lisa sang to Lindy and then she took a nap. The pronoun she could refer to either Lisa or Lindy, but adults know that speakers like to have clauses refer to the same topic (Asher & Lascarides, 2003). This leads to a “first-mention bias”, where the element first mentioned (e.g., the subject Lisa) is the topic and listeners prefer a subsequent pronoun to refer to that first-mentioned element (Crawley, Stevenson & Kleinman, 1990; Arnold, Eisenband, Brown-Schmidt & Trueswell, 2000; Järvikivi, van Gompel, Hyönä & Bertram, 2005). English-learning children ages three to five also seem to have this pragmatic knowledge, leading to a first-mention bias in a variety of contexts (Song & Fisher, 2005, 2007; Pyykkönen et al., 2010; Hartshorne et al., 2015a).

As one example of world knowledge with pronouns, consider this sentence pair: Jane needed Susan’s pencil. She gave it to her. Knowledge about how the world works allows listeners to pick situationally-appropriate interpretations (e.g., Hobbs, 1979; Kehler, Kertz, Rohde & Elman, 2008). Here, if Jane needs a pencil, she cannot already have one, so she cannot be the one to give a pencil away. That means that the one doing the giving (referred to by She in the second sentence) must not be Jane, and instead is probably the other mentioned person Susan. Similarly, if Jane needs a pencil, she is likely to be the one getting a pencil from someone else, i.e., the recipient of giving indicated by her. So, world knowledge allows listeners to interpret She as Susan and her as Jane. English-learning five-year-olds seem able to complete this chain of reasoning and correctly interpret the second sentence (Wykes, 1981).

These are just select examples of pragmatic and world knowledge impacting pronoun interpretation, which of course is simply one aspect of syntactic knowledge. More generally, these examples suggest that future syntactic acquisition theories (and the computational cognitive models implementing them) should consider these information sources.

Moving forward

Computational cognitive modeling is a tool that complements other techniques for investigating language development, providing insight into aspects of language acquisition that can be difficult to investigate otherwise. For instance, the models reviewed here investigated how children might learn certain syntactic knowledge from their input (verb constructions like subject-raising, unaccusatives, and passives) and why child behavior may differ from adult behavior for certain syntactic elements (pronoun interpretation).

In general, I think questions of how acquisition works and why children behave as they do are much easier to investigate with modeling. This is because the underlying factors that impact how acquisition works (and therefore why children behave as they do) can be explicitly defined and manipulated within a computational cognitive model. Such factors include how information from the input is perceived, which information is learned from, and how information is used to update internal hypotheses, as well as which hypotheses are under consideration in the first place. To me, it is not at all obvious how to control these factors (and others) with other techniques commonly used to investigate child language development, such as behavioral techniques.

With that said, informative models typically build on data collected with other techniques. Model input is based on estimates of the information children encounter in their language interactions. Model learning mechanisms are based on ideas of what abilities and learning biases children demonstrate at certain ages. Model output is based on data collected from children (or that can be collected in the future), so that the model can explain children’s observed linguistic behavior.

As we move forward, a basic goal is to build “better” models – that is, models that capture more of the relevant aspects of the acquisition process so that we can better link children’s input to their observable behavior. When we have these better models, we then have better explanations – as implemented in the models – for why acquisition (syntactic or otherwise) proceeds the way it does. So, how do we build better models?

Building better models

To build a computational cognitive model of language acquisition, we need to be very precise about the acquisition process the model is implementing. One concrete proposal for the relevant components of the acquisition process is in Figure 1, adapted from Pearl (in press). This proposal specifies components both external and internal to the child during the acquisition process, and is meant to capture the iterative process of acquisition unfolding over time.

Figure 1. Proposal for the relevant components of the acquisition process that a computational cognitive model of language acquisition should consider. External components (input and behavior) are observable. Internal components are not observable, and include perceptually encoding information from the input signal (yielding the perceptual intake), generating output from the encoded information (yielding observable behavior), and learning from the encoded information (using constraints & filters to yield the acquisitional intake, and doing inference over that intake). The developing systems and developing knowledge (both linguistic and non-linguistic) impact all internal components, while the learning component updates the developing knowledge.

External components are observable. We can observe the input signal available to children (e.g., the child language interactions they experience). For example, consider a version of our utterance from before: “Lisa sang to the triplets and then she took a power nap.” The input signal is the physical signal in the world, such as auditory components like pitch, timbre, and loudness of the utterance. The input can also include other aspects of the environment, such as who said the utterance, where they said it, when they said it, and what people or objects were in the environment at the time.

We can also observe children’s behavior at any stage of development, either through naturalistic productions and behavior or clever experimental designs that elicit productions or behavior. In the example utterance above, we can observe who the child thinks she refers to, Lisa or the triplets. One way to do this is to present the child with two pictures, one of Lisa napping and one of the triplets napping, and ask the child to point to the picture the utterance describes.

The internal components of the acquisition process involve several pieces. The first piece concerns the information the child is able to perceive in the input signal. In particular, perceptual encoding involves extracting information from the input signal to create the perceptual intake. Perceptual encoding draws on the child’s developing knowledge and systems to extract information. For instance, in our example utterance, the child may be able to perceive syllables (e.g., /li/, /sǝ/, /sεŋ/, etc.), words (e.g., Lisa, sang, etc.), syntactic structure (e.g., [$_{IP}$ Lisa [$_{VP}$ sang [$_{PP}$ to [$_{NP}$ the triplets]]]]), pronoun interpretations (she = Lisa), as well as the event participants (Lisa, the triplets) and properties of the events described (singing, napping), among many other types of information. What children can perceive depends on what they know about their language (e.g., developing linguistic knowledge: Lisa, the triplets, and she are words), what they know about the world (e.g., developing non-linguistic knowledge: who’s likely to take a power nap), and how well they can extract information of different kinds (e.g., developing linguistic systems: speech segmentation, syntactic parsing, pronoun interpretation biases; developing non-linguistic systems: memory, cognitive inhibition). Notably, extracting information from the input signal involves ignoring information present (e.g., where the utterance was spoken) and adding information not explicitly present (e.g., where the words are, how a pronoun is interpreted). What children ignore and add depends on their developing knowledge and developing systems.

The second internal piece concerns how children generate their observable behavior. For this, children rely on the information they have been able to perceptually encode (the perceptual intake) and their developing systems and knowledge. In particular, children apply their production systems to the perceptual intake in order to generate behavior like speaking (which relies on linguistic systems and non-linguistic systems involved in utterance generation). In our example utterance, a child might say “Lisa’s the one napping”. Children can also respond non-verbally (e.g., look at a picture that encodes a scene described by the utterance, which relies on non-linguistic systems like motor control, attention, and decision-making). In our example utterance, a child might look at the picture of Lisa napping.

The last internal piece concerns learning, which is how the child’s developing knowledge (both linguistic and non-linguistic) is updated over time. As with the other internal pieces, the child’s developing systems and knowledge impact this piece. In particular, learning occurs over the part of the perceptual intake the child deems relevant to learn from: this is the acquisitional intake. The acquisitional intake is typically not all of the perceptual intake. That is, it is not everything the child is able to encode. Instead, depending on what the child is trying to learn, what is relevant is likely some subset of the perceptual intake. For instance, in our example utterance, the fact that the pronoun she is singular may be in the acquisitional intake, while the fact that she is a separate word from took may not.

The child’s developing knowledge can filter the perceptual intake down to the relevant information by providing both constraints on possible hypotheses (i.e., what options are worth considering) and attentional filters (i.e., what in the information signal to pay attention to). For instance, in our pronoun interpretation example, a linguistic constraint may limit the possible hypotheses for she’s antecedent to noun phrases, and so the number feature is relevant for choosing among different noun phrases; a non-linguistic constraint may limit potential antecedents to animate participants who are capable of power napping. An attentional filter may focus the child on the pronoun’s interpretation, rather than other aspects of the utterance, because of uncertainty about how to interpret pronouns more generally at the child’s current stage of development.

Inference then operates over the acquisitional intake, and typically involves non-linguistic abilities like probabilistic inference, statistical learning, or hypothesis testing. The result of this inference can be used to update the developing knowledge – potentially both linguistic knowledge and non-linguistic knowledge. For instance, in our pronoun interpretation example, the child might update her hypotheses about how likely it is that she’s antecedent is singular (linguistic knowledge) and how likely adults like Lisa are to take power naps (non-linguistic knowledge).
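
To summarize how these internal pieces fit together, here is a schematic sketch of a single learning episode, paraphrasing Figure 1. The stub functions and data structures are placeholders of my own (output generation is omitted for brevity), not implementations from any particular model.

```python
# Schematic sketch of one learning episode, paraphrasing Figure 1.
# All functions and structures are illustrative placeholders.

def perceptually_encode(signal, systems):
    # Perceptual encoding: keep only what the child can currently perceive.
    return {k: v for k, v in signal.items() if k in systems["perceivable"]}

def apply_constraints_and_filters(perceptual_intake, knowledge):
    # Constraints & filters: keep only what is deemed relevant for the
    # current learning goal, yielding the acquisitional intake.
    return {k: v for k, v in perceptual_intake.items() if k in knowledge["relevant"]}

def infer_and_update(acquisitional_intake, knowledge):
    # Inference over the acquisitional intake updates developing knowledge.
    knowledge["learned"].append(acquisitional_intake)
    return knowledge

def learning_episode(input_signal, knowledge, systems):
    perceptual_intake = perceptually_encode(input_signal, systems)
    acquisitional_intake = apply_constraints_and_filters(perceptual_intake, knowledge)
    return infer_and_update(acquisitional_intake, knowledge)

signal = {"words": ["Lisa", "sang", "to", "the", "triplets"],
          "pronoun_number": "singular",
          "speaker_location": "kitchen"}          # present but ignored
systems = {"perceivable": {"words", "pronoun_number"}}
knowledge = {"relevant": {"pronoun_number"}, "learned": []}
print(learning_episode(signal, knowledge, systems)["learned"])
# [{'pronoun_number': 'singular'}]
```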

With this proposal in hand for relevant components of a computational cognitive model of acquisition, we can now think about some of the ideas we might want to incorporate into future models of syntactic acquisition. I briefly discuss some ideas for incorporating non-syntactic components and simultaneous acquisition of different knowledge aspects.

Incorporating non-syntactic components into acquisition models

Prior behavioral work has found that children are sensitive to animacy when learning aspects of syntax (e.g., see Becker, 2015). Pearl and Sprouse (2019) used animacy in their model of linking theory acquisition, allowing the animacy of a verb’s arguments to be part of the acquisitional intake that children learned from.

Prior behavioral work has also found that children can use both pragmatic and world knowledge to help them choose between potential interpretations of pronouns (e.g., Hartshorne et al., 2015a; Pyykkönen et al., 2010; Song & Fisher, 2005, 2007; Wykes, 1981). Some recent computational cognitive modeling work has investigated how children choose between potential interpretations of utterances like Every horse didn’t jump, which can either mean “No horses jumped” or “Not all horses jumped” (Savinelli, Scontras & Pearl, 2017, 2018; Scontras & Pearl, 2021). The modeled children in these studies incorporated both pragmatic knowledge about what speakers think the topic of conversation is and world knowledge about the event described (e.g., how likely horses are to jump) into the perceptual intake. Notably, differences in children’s ability to adjust their expectations about the pragmatics and world of the experiment – due to immature non-linguistic systems – can explain children’s observed non-adult-like behavior, according to these models.

More generally, prior behavioral work (Gerard et al., 2018; Liter et al., 2022; Ud Deen et al., 2018) has noted the impact of immature non-linguistic systems (e.g., cognitive inhibition) on children’s use of their knowledge – that is, how children generate their observed behavior in experimental contexts. So, I think it is useful for future computational cognitive models to consider the impact of these developing non-linguistic systems when accounting for children’s behavior (i.e., the output generation process).
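One simple way a model can encode this is to separate what the modeled child knows from how reliably that knowledge is deployed when generating behavior. The sketch below uses a single made-up “deployment” parameter standing in for an immature non-linguistic system (such as cognitive inhibition); the specific numbers are illustrative only.

```python
# A minimal sketch separating knowledge from deployment: the modeled child's
# underlying knowledge assigns adult-like probabilities, but an immature
# non-linguistic system (summarized by one "deployment" parameter) distorts the
# behavior the experiment actually observes. The parameter values are invented.
import random

def observed_response(knowledge_prob_correct, deployment=0.7):
    """With probability `deployment`, the child deploys their knowledge;
    otherwise they fall back on a task-driven guess (here, 50/50)."""
    if random.random() < deployment:
        return random.random() < knowledge_prob_correct
    return random.random() < 0.5

# Expected accuracy: deployment * knowledge + (1 - deployment) * 0.5
trials = [observed_response(0.95) for _ in range(10000)]
print(sum(trials) / len(trials))  # roughly 0.7 * 0.95 + 0.3 * 0.5 = 0.815
```

Under this kind of sketch, non-adult-like behavior in a task need not signal non-adult-like knowledge: the same knowledge with a lower deployment value yields lower observed accuracy.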

Moreover, these developing non-linguistic systems may also impact several other pieces of the acquisition process: (i) perceptual encoding, leading to a perceptual intake that captures immature representations of information in the input, (ii) constraints & filters, leading to an acquisitional intake that is inaccurate, and (iii) inference, leading to learning that is non-adult-like. The exact way developing non-linguistic systems impact these pieces depends on what system is developing and how that system is proposed to contribute to the acquisition process. While this is certainly non-trivial to specify for any given non-linguistic system and model piece, the more precisely we can specify it, the better we will be able to capture the acquisition process in children and link their input to their observable behavior through a concrete acquisition theory encoded in a model.

Thinking about simultaneous acquisition

Another interesting consideration is simultaneous acquisition, where multiple types of knowledge are learned at the same time. Among the case studies discussed here, the acquisition of linking theories from Pearl and Sprouse (2019) is an example of this. More specifically, when learning how to cluster verbs together into classes whose linking theories were similar, the modeled child effectively learned about many different verb constructions simultaneously (e.g., which verbs are subject-raising, which verbs are unaccusative, which verbs are passivizable, etc.). The key insight is that the modeled child’s objective was broad – to learn about verbs that “behave” similarly with respect to certain types of information in the acquisitional intake (argument animacy, syntactic contexts, links between thematic roles and syntactic positions), instead of learning about which verbs allow a specific syntactic behavior (e.g., subject-raising). In other words, the specific syntactic knowledge about which constructions any given verb allows is a by-product of trying to learn something else about that verb – namely, which other verbs it behaves similarly to (i.e., which class it belongs to) and what the behavior of that verb class is.
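As a toy illustration of this kind of broad objective (and emphatically not the actual Bayesian model of Pearl and Sprouse (2019)), the sketch below clusters verbs by feature vectors summarizing information in the acquisitional intake, so that class membership, and whatever syntactic behavior goes with it, falls out of similarity rather than being learned verb by verb. The verbs, features, and feature values are all invented.

```python
# A toy clustering sketch (not Pearl & Sprouse's actual Bayesian model): verbs are
# represented by feature vectors over the acquisitional intake (e.g., proportion of
# animate subjects, proportion of expletive "it"/"there" subjects), and verbs with
# similar vectors end up in the same class. All numbers are invented.
import numpy as np

verbs = ["seem", "appear", "want", "try"]
# columns: [animate-subject rate, expletive-subject rate]  (illustrative values)
X = np.array([[0.40, 0.50],   # seem
              [0.45, 0.45],   # appear
              [0.95, 0.00],   # want
              [0.90, 0.02]])  # try

def kmeans(X, k=2, iters=20, seed=0):
    """Plain k-means: assign each verb to its nearest class center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

for verb, cls in zip(verbs, kmeans(X)):
    print(verb, "-> class", cls)
# "seem" and "appear" pattern together (plausibly a subject-raising-like class),
# separately from "want" and "try".
```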

I think this may be a more realistic approach to syntactic acquisition (and acquisition more generally), with children trying to learn about their language more broadly and picking up specific linguistic knowledge along the way as part of that broader learning goal. What this means modeling-wise is that the modeled child’s objective – what hypotheses are being considered – would be adjusted. For instance, instead of explicitly learning if a verb is subject-raising, can children’s observable behavior about which verbs are subject-raising be captured by a modeled child learning about verb classes more generally and implicitly learning which verbs are subject-raising? This approach worked well for Pearl and Sprouse (2019).

Another example of simultaneous syntactic acquisition from my own research (Bates & Pearl, 2019; Dickson, Pearl & Futrell, 2022; Pearl & Bates, in press; Pearl & Sprouse, 2013) is the acquisition of knowledge about “syntactic islands” in children. For example, English-speaking children must learn that Who did Lily think the kitten for —who was cute? is not a good wh-question, which draws on their implicit knowledge of syntactic islands. Here, the modeled child’s objective is to learn in general how to represent wh-dependencies like those in wh-questions, rather than learning how good a specific wh-dependency is (or is not). By learning to do this, modeled children learn to have adult-like preferences about how good different wh-dependencies are (Bates & Pearl, 2019; Pearl & Bates, in press; Pearl & Sprouse, 2013), especially if the modeled children are trying to represent wh-dependencies in an “efficient” way (Dickson et al., 2022) that makes processing future wh-dependencies easier.
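Here is a schematic sketch of this idea, loosely in the spirit of Pearl and Sprouse (2013): each wh-dependency is represented as the sequence of phrasal “container” nodes between the wh-word and its gap, and a dependency’s acceptability is scored by how probable that sequence is given smoothed trigram statistics learned from the input. The node labels, toy input paths, and scoring function below are simplified illustrations, not the model’s actual probabilities.

```python
# A schematic sketch of scoring wh-dependencies via their container-node paths.
# Counts, node labels, and the scoring function are invented simplifications.
from collections import Counter

def trigrams(seq):
    padded = ["start", "start"] + seq + ["end"]
    return list(zip(padded, padded[1:], padded[2:]))

# toy "input": container-node paths for wh-dependencies the child has encoded
input_paths = [
    ["IP"],                              # Who __ laughed?
    ["IP", "VP"],                        # What did Lily see __?
    ["IP", "VP", "CP", "IP", "VP"],      # What did Lily think (that) Jack bought __?
] * 50   # pretend these path types are frequent in the input

counts = Counter(t for path in input_paths for t in trigrams(path))
total = sum(counts.values())

def score(path, alpha=0.5, vocab=20):
    """Smoothed product of trigram frequencies; lower = less acceptable.
    (A simplification of the conditional trigram probabilities in the actual model.)"""
    p = 1.0
    for t in trigrams(path):
        p *= (counts[t] + alpha) / (total + alpha * vocab)
    return p

print(score(["IP", "VP", "CP", "IP", "VP"]))        # attested-like path: higher score
print(score(["IP", "VP", "CP", "IP", "NP", "PP"]))  # island-crossing path: much lower score
```

The specific judgment that an island-crossing question sounds bad then falls out of the general representation the child has learned, rather than being learned as a fact about that particular question.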

A related approach gaining momentum in syntactic acquisition modeling involves simply learning to predict the next word, with the modeled children implicitly learning whatever knowledge is necessary to make that next word highly probable (and therefore easier to process). Along the way, several models of this type seem to implicitly learn a variety of syntactic knowledge, including knowledge about syntactic islands (e.g., Wilcox, Levy, Morita & Futrell, 2018; Futrell et al., 2019; Chaves, 2020; Warstadt et al., 2020; Wilcox, Futrell & Levy, 2021).
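The logic behind evaluating such models can be illustrated with a small surprisal calculation, following the general reasoning of Wilcox et al. (2018): if a model has implicitly linked wh-fillers with gaps, the presence of a filler should lower surprisal where a gap occurs and raise it where a gap is missing. The probabilities below are invented placeholders standing in for a trained model’s outputs.

```python
# A minimal sketch of the prediction-based logic (not any specific published model):
# a language model assigns each next word a probability, and surprisal
# (-log2 probability) at a critical word indicates whether the model "expects" a gap.
# The four probabilities are invented placeholders for a trained model's outputs.
import math

def surprisal(p):
    return -math.log2(p)

# placeholder probabilities at the post-gap region ("yesterday") in four conditions
p = {
    ("wh", "gap"):       0.20,  # "What did Lily buy __ yesterday?"
    ("wh", "no-gap"):    0.02,  # "*What did Lily buy the book yesterday?"
    ("no-wh", "gap"):    0.01,  # "*Lily bought __ yesterday."
    ("no-wh", "no-gap"): 0.15,  # "Lily bought the book yesterday."
}
s = {cond: surprisal(prob) for cond, prob in p.items()}

# licensing interaction: a wh-filler should reduce surprisal when there is a gap
# and increase it when there is not
interaction = (s[("no-wh", "gap")] - s[("wh", "gap")]) - (
    s[("no-wh", "no-gap")] - s[("wh", "no-gap")])
print(interaction)  # a positive value indicates the model has linked fillers and gaps
```

Whether such prediction-based models constitute cognitively plausible acquisition theories is a separate question, but they show how specific syntactic knowledge can emerge as a by-product of a very general objective.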

Conclusion

Here I hope to have shown how computational cognitive modeling can inform our understanding of syntactic acquisition by implementing theories of acquisition precisely enough to evaluate against empirical data from children. I reviewed some previous models that consider information from non-syntactic sources and the impact of non-linguistic cognitive development on syntactic acquisition. I also highlighted some behavioral work that notes the role of other information sources children use and specific cognitive limitations children have during syntactic acquisition. I then discussed how we might build future models that incorporate these insights and so provide better explanations of syntactic acquisition. With this information in mind, I believe we can create, evaluate, and refine better theories of syntactic acquisition through computational cognitive modeling.

Acknowledgements

I am deeply grateful to Elma Blom and Titia Benders for putting together this special issue on a topic near and dear to my heart, and inviting me to be part of it. Both they and an anonymous reviewer have provided very helpful feedback. I am also indebted to the Quantitative Approaches to Language Science (QuantLang) Collective at UC Irvine, who have shaped my thinking on modeling over the years.

Competing interest

The author declares none.

Footnotes

1 See Pearl (in press) for discussion of many other examples of syntactic acquisition models that rely on probabilistic inference, statistical learning, or otherwise “counting things”, even if those models learn only from syntactic information.

2 Pearl and Sprouse’s theory also assumed children could potentially have additional biases about how to interpret the link distribution in their input. See Pearl and Sprouse (2019) for details.

3 This lexical semantic feature was “psych object-experiencer”, where the object of the verb experiences the psychological state. An example is annoy: In The non-stop crying annoyed Lisa, the object Lisa is experiencing the psychological state of being annoyed.

4 I note that other interpretations of these specific results are of course possible.

5 I note that a task can be thought of as more cognitively demanding because it seems to require more cognitive resources of whatever kind (e.g., working memory, attention, executive control, or something else) without specifying exactly what additional resources are required and how those specific resources are drawn upon. Of course, it is more satisfying to have a precise theory of how different cognitive resources interact to produce observable behavior in any given experimental task. See Gerard et al. (2018) for discussion of some of the specific resources that may be involved for this task.

References

Arnold, J., Eisenband, J., Brown-Schmidt, S., & Trueswell, J. C. (2000). The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition, 76, B13–B26.
Asher, N., & Lascarides, A. (2003). Logics of Conversation. Cambridge: Cambridge University Press.
Bates, A., & Pearl, L. (2019). What do you think that happens? A quantitative and cognitive modeling analysis of linguistic evidence across socioeconomic status for learning syntactic islands. In Brown, M. M., & Dailey, B. (Eds.), Proceedings of the 43rd Annual Boston University Conference on Language Development (pp. 42–56). Somerville, MA: Cascadilla Press.
Becker, M. (2009). The role of NP animacy and expletives in verb learning. Language Acquisition, 16(4), 283–296.
Becker, M. (2014). The acquisition of syntactic structure: Animacy and thematic alignment (Vol. 141). Cambridge University Press.
Becker, M. (2015). Animacy and the Acquisition of Tough Adjectives. Language Acquisition, 22(1), 68–103.
Becker, M., & Estigarribia, B. (2013). Harder words: Learning abstract verbs with opaque syntax. Language Learning and Development, 9(3), 211–244.
Brandt-Kobele, O.-C., & Höhle, B. (2010). What asymmetries within comprehension reveal about asymmetries between comprehension and production: The case of verb inflection in language acquisition. Lingua, 120(8), 1910–1925.
Chaves, R. P. (2020). What Don’t RNN Language Models Learn About Filler-Gap Dependencies? Proceedings of the Society for Computation in Linguistics, 3(1), 20–30.
Clahsen, H., Aveledo, F., & Roca, I. (2002). The development of regular and irregular verb inflection in Spanish child language. Journal of Child Language, 29, 591–622.
Crawley, R. A., Stevenson, R., & Kleinman, D. (1990). The Use of Heuristic Strategies in the Interpretation of Pronouns. Journal of Psycholinguistic Research, 19(4), 245–264.
Dickson, N., Pearl, L., & Futrell, R. (2022). Learning constraints on wh-dependencies by learning how to efficiently represent wh-dependencies: A developmental modeling investigation with Fragment Grammars. Proceedings of the Society for Computation in Linguistics, 5(1), 220–224.
Fisher, C., Gertner, Y., Scott, R. M., & Yuan, S. (2010). Syntactic bootstrapping. Wiley Interdisciplinary Reviews: Cognitive Science, 1(2), 143–149.
Forsythe, H., & Pearl, L. (2020). Immature representation or immature deployment? Modeling child pronoun resolution. In Proceedings of the Society for Computation in Linguistics 3 (pp. 22–26).
Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1.
Gerard, J., Lidz, J., Zuckerman, S., & Pinto, M. (2018). The acquisition of adjunct control is colored by the task. Glossa: A Journal of General Linguistics, 3(1), 1–22. https://doi.org/10.5334/gjgl.547
Gillette, J., Gleitman, H., Gleitman, L., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73(2), 135–176.
Gleitman, L. (1990). The structural sources of verb meanings. Language Acquisition, 1(1), 3–55.
González-Gómez, N., Hsin, L., Barriere, I., Nazzi, T., & Legendre, G. (2017). Agarra, agarran: Evidence of early comprehension of subject–verb agreement in Spanish. Journal of Experimental Child Psychology, 160, 33–49.
Gutman, A., Dautriche, I., Crabbé, B., & Christophe, A. (2015). Bootstrapping the syntactic bootstrapper: Probabilistic labeling of prosodic phrases. Language Acquisition, 22(3), 285–309.
Harrigan, K., Hacquard, V., & Lidz, J. (2016). Syntactic Bootstrapping in the Acquisition of Attitude Verbs: think, want and hope. In Proceedings of WCCFL (Vol. 33).
Hartshorne, J. K., Nappa, R., & Snedeker, J. (2015a). Development of the First-Mention Bias. Journal of Child Language, 42(2).
Hartshorne, J. K., Pogue, A., & Snedeker, J. (2015b). Love is hard to understand: The relationship between transitivity and caused events in the acquisition of emotion verbs. Journal of Child Language, 42(3), 467–504.
Hobbs, J. R. (1979). Coherence and coreference. Cognitive Science, 3, 67–90.
Järvikivi, J., van Gompel, R., Hyönä, J., & Bertram, R. (2005). Ambiguous pronoun resolution: Contrasting the first-mention and subject-preference accounts. Psychological Science, 16(4), 260–264.
Johnson, V., de Villiers, J., & Seymore, H. (2005). Agreement without understanding? The case of third person singular /s/. First Language, 25(3), 317–330.
Kehler, A., Kertz, L., Rohde, H., & Elman, J. L. (2008). Coherence and coreference revisited. Journal of Semantics, 25(1), 1–44.
Kirby, S. (2009a). Do what you know: “Semantic scaffolding” in biclausal raising and control. In Annual Meeting of the Berkeley Linguistics Society (Vol. 35, pp. 190–201).
Kirby, S. (2009b). Semantic scaffolding in first language acquisition: The acquisition of raising-to-object and object control (Unpublished doctoral dissertation). University of North Carolina at Chapel Hill, Chapel Hill, NC.
Landau, B., & Gleitman, L. R. (1985). Language and experience: Evidence from the blind child. Harvard University Press.
Legendre, G., Culbertson, J., Zaroukian, E., Hsin, L., Barrière, I., & Nazzi, T. (2014). Is children’s comprehension of subject-verb agreement universally late? Comparative evidence from French, English, and Spanish. Lingua.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. University of Chicago Press.
Liter, A., Grolla, E., & Lidz, J. (2022). Cognitive inhibition explains children’s production of medial wh-phrases. Language Acquisition. https://doi.org/10.1080/10489223.2021.2023813
Liter, A., Huelskamp, T., Weerakoon, S., & Munn, A. (2015). What drives the Maratsos Effect, agentivity or eventivity? In Boston University Conference on Language Development (BUCLD). Boston University, Boston, MA.
Maratsos, M., Fox, D. E., Becker, J. A., & Chalkley, M. A. (1985). Semantic restrictions on children’s passives. Cognition, 19(2), 167–191.
Messenger, K., Branigan, H. P., McLean, J. F., & Sorace, A. (2012). Is young children’s passive syntax semantically constrained? Evidence from syntactic priming. Journal of Memory and Language, 66(4), 568–587.
Nguyen, E., & Pearl, L. (2019). Using Developmental Modeling to Specify Learning and Representation of the Passive in English Children. In Brown, M. M., & Dailey, B. (Eds.), Proceedings of the Boston University Conference on Language Development 43 (pp. 469–482). Somerville, MA: Cascadilla Press.
Nguyen, E., & Pearl, L. (2021). The link between lexical semantic features and children’s comprehension of English verbal be-passives. Language Acquisition, 28(4), 433–450.
Pearl, L. (2014). Evaluating learning strategy components: Being fair. Language, 90(3), e107–e114.
Pearl, L. (2021). Theory and predictions for the development of morphology and syntax: A Universal Grammar + statistics approach. Journal of Child Language, 48(5), 907–936.
Pearl, L. (2023). Modeling syntactic acquisition. In Sprouse, J. (Ed.), Oxford Handbook of Experimental Syntax (pp. 209–270). Oxford University Press.
Pearl, L., & Bates, A. (2022). A new way to identify if variation in children’s input could be developmentally meaningful: Using computational cognitive modeling to assess input across socio-economic status for syntactic islands. Journal of Child Language, 1–34. https://doi.org/10.1017/S0305000922000514
Pearl, L., & Forsythe, H. (2022). Inaccurate representations, inaccurate deployment, or both? Using computational cognitive modeling to investigate the development of pronoun interpretation in Spanish. lingbuzz. https://ling.auf.net/lingbuzz/006141
Pearl, L., & Sprouse, J. (2013). Syntactic islands and learning biases: Combining experimental syntax and computational modeling to investigate the language acquisition problem. Language Acquisition, 20, 19–64.
Pearl, L., & Sprouse, J. (2019). Comparing solutions to the linking problem using an integrated quantitative framework of language acquisition. Language, 95(4), 583–611. https://ling.auf.net/lingbuzz/003913
Pérez-Leroux, A. T. (2005). Number Problems in Children. In Gurski, C. (Ed.), Proceedings of the 2005 Canadian Linguistics Association Annual Conference (p. 112).
Pinker, S., Lebeaux, D. S., & Frost, L. A. (1987). Productivity and constraints in the acquisition of the passive. Cognition, 26(3), 195–267.
Pyykkönen, P., Matthews, D., & Järvikivi, J. (2010). Three-year-olds are sensitive to semantic prominence during online language comprehension: A visual world study of pronoun resolution. Language and Cognitive Processes, 25(1), 115–129.
Savinelli, K., Scontras, G., & Pearl, L. (2017). Modeling scope ambiguity resolution as pragmatic inference: Formalizing differences in child and adult behavior. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society. London, UK: Cognitive Science Society.
Savinelli, K., Scontras, G., & Pearl, L. (2018). Exactly two things to learn from modeling scope ambiguity resolution: Developmental continuity and numeral semantics. In Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018) (pp. 67–75).
Scontras, G., & Pearl, L. S. (2021). When pragmatics matters more for truth-value judgments: An investigation of quantifier scope ambiguity. Glossa: A Journal of General Linguistics, 6(1). https://doi.org/10.16995/glossa.5724
Scott, R. M., & Fisher, C. (2009). Two-year-olds use distributional cues to interpret transitivity-alternating verbs. Language and Cognitive Processes, 24(6), 777–803.
Soderstrom, M. (2002). The acquisition of inflection morphology in early perceptual knowledge of syntax (Unpublished doctoral dissertation). Johns Hopkins University, Baltimore, MD.
Song, H.-J., & Fisher, C. (2005). Who’s “She?”: Discourse prominence influences preschoolers’ comprehension of pronouns. Journal of Memory and Language, 52(1), 29–57.
Song, H.-J., & Fisher, C. (2007). Discourse prominence effects on 2.5-year-old children’s interpretation of pronouns. Lingua, 117(11), 1959–1987.
Ud Deen, K., Bondoc, I., Camp, A., Estioca, S., Hwang, H., Shin, G.-H., Takahashi, M., Zenker, F., & Zhong, J. (2018). Repetition brings success: Revealing knowledge of the passive voice. In Proceedings of the 42nd Annual Boston University Conference on Language Development (pp. 200–213). Boston, MA.
Warstadt, A., Parrish, A., Liu, H., Mohananey, A., Peng, W., Wang, S.-F., & Bowman, S. R. (2020). BLiMP: The benchmark of linguistic minimal pairs for English. Transactions of the Association for Computational Linguistics, 8, 377–392.
Wilcox, E., Futrell, R., & Levy, R. (2021). Using computational models to test syntactic learnability. https://ling.auf.net/lingbuzz/006327
Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN language models learn about filler-gap dependencies? In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics.
Wykes, T. (1981). Inference and children’s comprehension of pronouns. Journal of Experimental Child Psychology, 32, 264–278.

Figure 1. Proposal for the relevant components of the acquisition process that a computational cognitive model of language acquisition should consider. External components (input and behavior) are observable. Internal components are not observable, and include perceptually encoding information from the input signal (yielding the perceptual intake), generating output from the encoded information (yielding observable behavior), and learning from the encoded information (using constraints & filters to yield the acquisitional intake, and doing inference over that intake). The developing systems and developing knowledge (both linguistic and non-linguistic) impact all internal components, while the learning component updates the developing knowledge.