Model-theoretic syntax for the working syntactician

Published online by Cambridge University Press:  03 October 2024

Mark Steedman*
Affiliation:
University of Edinburgh, EH8 9AB, United Kingdom

Abstract

To bring linguistic theory back in touch with commonplace observations concerning the resilience of language in use to language change, language acquisition and ungrammaticality, Pullum and colleagues have argued for a ‘model-theoretic’ theory of syntax. The present paper examines the implications for linguists working in standard formal frameworks and argues that, to the extent that such theories embrace monotonicity in syntactic operations, they qualify as model-theoretic under some minor modifications to allow for the possibility of unknown words.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

1. Introduction

Pullum & Scholz (2003), Rogers (2003) and Pullum (2007, 2013a, 2020) draw a distinction between theories of syntax that are Generative-Enumerative (GES) and those that are Model-Theoretic (MTS). The former category includes classical Transformational Grammar (TG, the variously revised, extended, ‘standard’ theory, Chomsky 1965, 1977, 1981), most proposals under the ‘Minimalist Program’ (Chomsky 1995), Optimality Theoretic Syntax (OTS, Legendre et al. 2001), Tree-Adjoining Grammar (TAG, Joshi & Schabes 1992) and Combinatory Categorial Grammar (CCG, Steedman 2000). The MTS category includes (with some important caveats (Pullum 2013a: 501)) the following: Lexical-Functional Grammar (LFG, Bresnan 1982), Head-Driven Phrase-Structure Grammar (HPSG, Pollard & Sag 1994), and at least the related ‘sign-based’ version of Construction Grammar (CxG, Boas & Sag 2012), all of which are constraint-based (Pollard 1996). Pullum further argues that only the MTS theories can explain the fact that adults and children acquiring their first language can cope with such features of the input as neologism and gradient grammaticality in language use.

The present paper endorses the core MTS claim as consistent with the related principle of monotonicity in rules, which says that structure, once built, cannot be modified. This principle is embraced by many GES approaches, including CCG, TAG and (rather late in the day) Chomskyan Minimalism, as well as by constraint-grammars. It is an empirical question whether grammars of this constrained kind are more adequate linguistically when expressed in terms of constraints or generative rules. However, it does not follow that they differ with respect to the problems of unseen vocabulary or gradient grammaticality. Both problems can be handled with mechanisms that are quite standardly used by computational linguists to manage the profligate ambiguity of all natural languages.

2. Why Model Theory?

Model theory is most familiar as the device that connects the sentences of a logic to truth value, T/F, relative to a structure, or model M. For example, the models for first-order predicate logic (FOPL) are structures of (unary, binary, etc.) predicates holding over entities, such as farmer(Giles), donkey(Modestine), walks(Modestine) and owns(Giles, Modestine). The model theory is a small collection of axioms closely related to the syntax of the logic – for example, one that assigns sentences of the syntactic form ‘P & Q’ (where P and Q are themselves sentences of the logic) the value T in the model just in case both P and Q have the value T in the model, and F otherwise.
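The truth definition just described can be sketched in a few lines of Python. This is only an illustrative toy, not part of the original argument: the names `MODEL`, `holds` and `conj` are assumptions introduced here, and the model contains just the four facts mentioned above.

```python
# A toy first-order model: each predicate is mapped to its extension,
# i.e. the set of argument tuples of which it holds.
# All names here (MODEL, holds, conj) are hypothetical, for illustration only.
MODEL = {
    "farmer": {("Giles",)},
    "donkey": {("Modestine",)},
    "walks": {("Modestine",)},
    "owns": {("Giles", "Modestine")},
}


def holds(predicate, *args):
    """An atomic sentence is T iff its argument tuple is in the predicate's extension."""
    return tuple(args) in MODEL.get(predicate, set())


def conj(p, q):
    """'P & Q' is T in the model just in case both P and Q are T, and F otherwise."""
    return p and q


# farmer(Giles) & owns(Giles, Modestine)
print(conj(holds("farmer", "Giles"), holds("owns", "Giles", "Modestine")))
# walks(Giles) & donkey(Modestine)
print(conj(holds("walks", "Giles"), holds("donkey", "Modestine")))
```

The first conjunction comes out true and the second false, because walks(Giles) is not among the facts of this particular model.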

Montague (Reference Montague, Hintikka, Moravcsik and Suppes1973) showed that the semantics of natural languages like English could be defined model-theoretically in similar terms over similar structures. (To the working linguist who pays any attention to semantics, this is probably the most familiar use of model theory.)

More radically, Lewis (Reference Lewis1970), Geach (Reference Geach1970) and Montague (Reference Montague and Visentini1970) suggested that the syntax of natural language, though more complex than that of FOPL, could be analyzed as actually isomorphic to a model-theoretic semantics. This suggestion was taken to varying degrees of literality by linguists such as Partee (Reference Partee1975), Bach (Reference Bach1979), Keenan & Faltz (Reference Keenan and Faltz1985), Jacobson (Reference Jacobson, Huck and Ojeda1987), Szabolcsi (Reference Szabolcsi and Szabolcsi1997) and the present author.

In apparent contrast, Chomsky has always insisted on the autonomy of syntax from semantics and processing, and defines the business of the syntactician as understanding the relation between the strings of the language(s) and sets of syntactic structures independently accessible to the intuition of native speakers (Chomsky 1957, passim).Footnote 1

Model theories can be defined for other logics over other models. In particular, Pullum (Reference Pullum2013a) follows Rogers (Reference Rogers1998, Reference Rogers2003) in taking the tree-structures that a Chomskyan would view as associated a priori with the sentences of natural languages as models, rather than (or perhaps, as well as) structures that are interpretable in their own right in the Montagovian sense. For Pullum and Rogers, a sentence is grammatical if its tree satisfies (‘models’) a grammar defined as an unordered set of formulæ expressed in the language of weak Monadic Second-Order Logic (wMSO). A second-order logic is one in which you can quantify over predicates, as well as over individuals. It is monadic if second-order quantification is limited to unary predicates (equivalently sets or properties), such as donkey, farmer and walks. It is weak if the sets are finite.

WMSO logic is interesting because, unlike full second-order logic, it supports useful models including graphs (Hedman Reference Hedman2004: 388–392). As a consequence, it has many applications in the theory of programming languages (Libkin Reference Libkin2004). For Pullum and Rogers, it acts as a metatheory for the theory of natural languages in terms of their treesets.

It is perhaps not surprising that the treesets constituting natural language should be second-order. The trees reflect the way in which the language puts meaning representations together and are clearly second-order as languages in that sense. In particular, raising and control verbs, as well as generalized quantifiers and relative pronouns, are second-order functions specifying first-order properties (but no higher) as arguments (Chierchia 1985, Chierchia & Turner 1988). For example, ‘seem’ can be argued to denote a second-order predicate $ \lambda p\lambda y. seem\left(p,y\right) $ that combines to its right with a first-order predicate, such as ‘to be drunk’, and a subject to its left, such as ‘they’, to yield a sentence ‘They seem to be drunk’, with a meaning written $ seem\left( drunk, them\right) $. It is this level that defines the derivational treeset.
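The two combination steps in the ‘seem’ example can be spelled out as successive β-reductions. The display below is a sketch that uses only the constants named in the text; the layout and the $\Rightarrow_\beta$ notation are assumptions made here for exposition.

```latex
\begin{align*}
\textit{seem} &:\quad \lambda p\,\lambda y.\ seem(p, y) \\
\textit{seem to be drunk} &:\quad (\lambda p\,\lambda y.\ seem(p, y))\ drunk
    \;\Rightarrow_\beta\; \lambda y.\ seem(drunk, y) \\
\textit{They seem to be drunk} &:\quad (\lambda y.\ seem(drunk, y))\ them
    \;\Rightarrow_\beta\; seem(drunk, them)
\end{align*}
```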

It is possible to regard the grammars of such languages as a logic in its own right, as Categorial grammarians of the ‘Type-Logical’ persuasion (Moortgat Reference Moortgat1988, Morrill Reference Morrill1994, Kubota & Levine Reference Kubota and Levine2020) do. This is, in fact, the view in other explicitly model-theoretic syntactic frameworks such as those of Michaelis et al. (Reference Michaelis, Mönnich, Morawietz, Kamp, Rossdeutscher and Rohrer2001), Mönnich (Reference Mönnich, Rogers and Kepser2007) and Graf (Reference Graf2010), who apply model-theoretic approaches to Chomskyan Minimalism, among other GES formalisms. However, such logics are tree-logics, and their semantics is ‘tree-conditional’, specified in terms of membership of the treeset of the language, rather than truth conditional. For the most part, as in the above case, the meaning representations that natural language derivations put together can be translated as expressions in a different, first-order, truth-conditional language, with a first-order model theory, chunks of which the derivation merely glues together.

Nevertheless, sentences like the following seem both syntactically and logically to involve second-order quantification over monadic properties $ p $ :

If so, the connection between MTS and model theoretic semantics is made.Footnote 2

However, Pullum further argues that, in order to reconnect with the nature of human language in use – and, in particular, its acquisition by children and its related resilience in adult speakers to gradient grammaticality, neologism and error in the input – MTS grammars should be defined as sets of constraints on the trees of the language, rather than as productions, as is standard in GES (Pullum Reference Pullum2013b).

This paper agrees that explanatory theories of syntax should be model-theoretically formulated if they are to follow Pullum’s call to re-engage with the psychological questions of the resilience of language in use, and of evolutionarily and cognitively realistic conditions on its acquisition, but argues that they need not be constraint-based in order to achieve those ends, widening the range of syntactic theories that should be recognized as model-theoretic.

3. MTS as Metatheory

Rogers (2003) shows that wMSO logics can be used to define a logic-based equivalent of the infinite hierarchy of increasingly expressive ‘Abstract Families of Languages’ identified by Weir (1988, 1992) including the well-known context-free languages (CFL), ‘Type 2’ in the original hierarchy of Chomsky (1956, passim). These systems, with CCG and TAG languages, are properly included in the lowest trans-context-free level among the multi-level hierarchy of sub-context-sensitive Linear Context-Free Rewriting Systems (LCFRS, Vijay-Shanker et al. 1987) and the essentially equivalent Multiple Context Free Grammars (MCFG, Seki et al. 1991). These lowest levels are called LCFRS-2 or 2-MCFG. By contrast, the languages of $ \mathcal{MG} $, the version of Chomskyan Minimalist Grammars identified by Stabler (1997), reside at the highest level of full LCFRS/MCFG (Michaelis 2001), but all are much less expressive than the context-sensitive and recursively-enumerable languages (respectively identified as ‘Type 1’ and ‘Type 0’ in the original Chomsky hierarchy). Rogers generalizes results of Thatcher (1967) and Doner (1970) to show that this hierarchy corresponds to a series of wMSO logics (Seki et al. 1991), where level $ n $ corresponds to a wMSO defined over $ n $ linear precedence relations $ {<}_1 $ to $ {<}_n $, corresponding to the $ n $ dimensions of LCFRS trees, and where the set of level $ n $ languages properly includes those of all levels $ <n $.

The definition of the language hierarchy in terms of wMSO logic, in addition to the standard formalisms of productions and automata, is an exciting and elegant result, which perhaps should have been expected on the basis of the Curry-Howard-Lambek equivalence between logical proof, computation, production systems and category theory (Chomsky Reference Chomsky1956, Mac Lane Reference Mac Lane1971, Lambek Reference Lambek1968).Footnote 3

Pullum gives the following correspondence between abstract families of languages (AFL) and logics in the hierarchy (adapted from Reference Pullum2013a: 500):Footnote 4

A number of points are of interest concerning this hierarchy. First, membership at any level carries a guarantee of polynomial recognition/parsability (Vijay-Shanker & Weir Reference Vijay-Shanker and Weir1994), and hence of the applicability of practical divide-and-conquer parsing algorithms. Second, it affords a partial ranking of linguistic theories in terms of their expressive power.
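To make the polynomial guarantee concrete at the lowest (context-free) level of the hierarchy, the following is a minimal sketch of the standard CKY recognizer, a divide-and-conquer chart algorithm whose running time is cubic in the length of the string. The toy grammar, the lexicon and the function name `cky_recognize` are assumptions introduced purely for illustration; the higher levels of the hierarchy require generalizations of this scheme that are not shown here.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
    ("VP", "PP"): {"VP"},
}
LEXICON = {
    "Giles": {"NP"},
    "Modestine": {"NP"},
    "owns": {"V"},
    "walks": {"VP"},
    "to": {"P"},
}


def cky_recognize(words, start="S"):
    """Return True iff the toy grammar derives `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every category that spans words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, set()))
    # Build wider spans from pairs of narrower ones: O(n^3) span/split triples.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((left, right), set())
    return start in chart[0][n]


print(cky_recognize("Giles owns Modestine".split()))
print(cky_recognize("owns Giles".split()))
```

The three nested loops over span width, start position and split point are what give the cubic bound; each cell is filled once from already completed smaller cells.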

As Chomsky notes (Reference Chomsky1965: 31, 62), explanatory adequacy of a descriptively adequate theory of grammar is dependent in part on the degree to which it restricts the space of possible grammars to allow only possible human languages.Footnote 5

The expressive power of a theory is directly relevant to this question. CFG (and therefore GPSG, GB under the assumptions of Relativized Minimality (Rizzi Reference Rizzi1990) and Manzini’s (Reference Manzini1992) lexicalized theory of locality (Rogers Reference Rogers1998: 185), and presumably some versions of Minimalism with the related Minimal Link/Shortest Move Condition) cannot capture the non-nested dependencies that are known to characterize some natural language constructions (Shieber Reference Shieber1985) and are therefore descriptively inadequate. Provided that CCG or TAG can be shown to be descriptively adequate to capture the full variety of constructions allowed by human language, then whatever the differences between them, they may immediately be held to be more explanatory than formalisms that reside further up the hierarchy, let alone those at the level of the Type 1 or Type 0 grammars that lurk far outside it. That is to say that, under any of the three views, the hierarchy provides what Chomsky (Reference Chomsky1965: §1.7) called an evaluation measure. For example, if a descriptively adequate account of English grammar at level 3 can be devised that includes the interaction of raising verbs with relativization that Rogers (Reference Rogers2004) analyzes in terms of a level 4-wMSO grammar under TAG assumptions, then it may be preferred.Footnote 6

One might at this point ask what we might be excluding by confining ourselves to grammars that map onto logics that happen to have graphs as models. What kind of grammar corresponds to a logic that is not limited in this way? One answer might be the following: those grammars that include non-monotonic operations that alter structure, such as movement of the kind embraced by early Transformational Theories of the 1970s (Bresnan Reference Bresnan, Culicover, Wasow and Akmajian1977). Such grammars do not seem to correspond to any wMSO logic and would appear to require resource-sensitive ‘Logics of Change’ such as Linear or Dynamic Logic.Footnote 7

Linear Logic (Girard Reference Girard1987), which is well understood in terms of proof-theory, seems reluctant to yield a useful model-theoretic interpretation of the kind available for monotonic logics (cf. Girard Reference Girard, Girard, Lafont and Regnier1995). If so, one interpretation of the implication of MTS for the working syntactician might be the following: Avoid non-monotonicity in rules. A stronger one might be the following: Make sure you are weakly equivalent in generative capacity to some level within the hierarchy of productions/automata/wMSO logics, all of which are inherently monotonic. The latter is, in fact, the definition advocated in the conclusion below. However, Pullum et al. propose a stricter definition, discussed next.

4. MTS vs. Generative-Enumerative Syntax

Rogers’ observation that the levels of the Weir language hierarchy can be mapped to the levels of a hierarchy of wMSO logics, all of which can be assigned a transparent and intuitive model theory, might seem to suggest that the model-theoretic syntactic theories should be defined by this hierarchy, whether viewed in terms of production systems, automata or wMSO logic. As the table in (2) indicates, that would make virtually all formally explicit theories of syntax count as card-carrying MTS, including some versions of GB and Chomskyan Minimalism.Footnote 8

However, Rogers defines grammars as sets of wMSO constraints on treesets, which he shows can then be algorithmically converted to more standard grammars in formalisms like GB or TAG. In the latter case, this conversion amounts to generating the entire set of elementary trees that define a language-specific TAG grammar, of the kind developed by hand under the X-TAG project (Bangalore et al. Reference Bangalore, Sarkar, Doran and Hockey1998). The constraint sets can, in principle, be orders of magnitude smaller than the grammars they map onto since many of the constraints may be universal to all grammars, and relatively few be language-specific, such as that this is a VSO language. In principle, the wMSO specification could be a very efficient way of running the large grammars and treesets that are currently maintained for wide-coverage grammars like CCGbank or XTAG, which typically require labor-intensive and time-consuming global changes if the grammar changes.

Pullum (Reference Pullum2013a: 497–498) follows Rogers in placing a further condition for theories of syntax to be accredited as fully model-theoretic – namely, that systems of rules should be replaced by unordered sets of constraints that their tree-structures have to satisfy to count as well-formed. This condition supposedly distinguishes them from the Generative-Enumerative theories. This remains, notwithstanding, a deliberately Chomskyan view of the problem of grammar, according to which the tree corresponding to each sentence of the language is as much of a given as the word-string, with no theoretical role for the semantics that actually determines its form and, ultimately, these constraints. It is the job of the linguist to determine the constraints – a.k.a. significant generalizations concerning the form of such trees – in the style of ‘X-bar’ theory and ‘Relativized Minimality’.

Rogers (Reference Rogers2003) gives a worked example of the partial specification of a fragment of TAG grammar expressed via wMSO constraints. We will instead use Pullum’s (Reference Pullum2020: 7) simpler (first-order) example of a constraint here, in the form of the following formula that can be paraphrased as ‘Every PP node immediately dominates a P node which is its head’:Footnote 9

He reasonably suggests that this formula constitutes a substantive universal constraint on all natural language treesets, which can be further specified for languages like English by a parochial linear precedence rule, saying that if a P node has sister(s), then it precedes them.
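The idea that a tree ‘satisfies’ a grammar given as an unordered set of constraints can be sketched as follows. This is a toy illustration rather than Pullum’s or Rogers’ formalization: the `Node` class, its `head` pointer and the two checking functions are assumptions introduced here, encoding the universal constraint as paraphrased above and the English-style precedence rule respectively.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A hypothetical tree node: a label, ordered children, and a pointer to the head child."""
    label: str
    children: List["Node"] = field(default_factory=list)
    head: Optional["Node"] = None


def nodes(tree):
    """Enumerate every node of the tree, top down."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def pp_has_p_head(tree):
    """Every PP node immediately dominates a P node which is its head."""
    return all(
        any(c.label == "P" and n.head is c for c in n.children)
        for n in nodes(tree)
        if n.label == "PP"
    )


def p_precedes_sisters(tree):
    """Parochial rule: if a P node has sisters, it precedes them (it is the first child)."""
    return all(
        i == 0
        for n in nodes(tree)
        if len(n.children) > 1
        for i, c in enumerate(n.children)
        if c.label == "P"
    )


# The grammar is an unordered collection of constraints; a tree models it iff all hold.
GRAMMAR = [pp_has_p_head, p_precedes_sisters]


def satisfies(tree):
    return all(constraint(tree) for constraint in GRAMMAR)


p = Node("P")
good = Node("PP", [p, Node("NP")], head=p)       # P-initial, P-headed PP
bad = Node("PP", [Node("NP")])                   # PP with no P head
q = Node("P")
postposed = Node("PP", [Node("NP"), q], head=q)  # P-headed but P-final

print(satisfies(good), satisfies(bad), satisfies(postposed))
```

Note that `postposed` satisfies the universal constraint but violates the parochial one, which is the division of labour the text describes. Counting violated constraints instead of requiring all to hold would give one simple handle on degrees of ill-formedness.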

Pullum points out that such constraints on trees are not equivalent to GES rules like the following related CFPS production or categorial grammar (CG) category (4a,b):

In particular, while the production (4a) captures the fact that one way of realizing a PP is as immediately dominating P followed by NP, and while the CG category (4b) captures, in addition, the fact that the P(reposition) ‘to’ is the head of PP (since the latter corresponds to its result-type), the constraint (3) says, by contrast, that all PPs have a (less specified) realization.
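For concreteness, rules of the two kinds described in this paragraph might be written as follows. The display is a reconstruction from the prose description alone (a production realizing PP as P followed by NP, and a category for ‘to’ whose result type is PP), so the exact notation should be read as an assumption rather than as the original example.

```latex
\begin{align*}
\text{(a)}\quad & PP \rightarrow P\ \ NP \\
\text{(b)}\quad & \textit{to} := PP/NP
\end{align*}
```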

One might think that the choice between constraint-based and rule-based grammars would be an empirical one, depending on which works best in practice for the grammar or module of grammar to hand. The intrinsic algorithmic advantage of Rogers’ own tree-automaton-related constraints is, as he points out (2003: 293, 318–319), hard to realize in practice. The very abstract constraints that are required, which probably have to be identified top-down, from the most universal to the more parochial, do not seem easily compatible with the way linguists usually work, from particular observations to inductive generalization (and in practice, usually onward to exceptions to the generalization that will lead to its modification).

Nevertheless, Pullum makes constraint-based formalism part of the definition of MTS, as being crucial to issues of language use, such as neologism, error and gradient grammaticality, which have been relegated by GES linguistics to the purgatory of Performance, under an assumption of autonomy of competence grammar.

The idea of Competence Grammar is to some extent justified by the fact that there are many processing algorithms and learning mechanisms for even the simplest grammars. However, total neglect of Performance ignores the obvious fact that, however language developed in humans, grammar and processor must have come into existence as a package deal, for the simple reason that the one is of no use without the other. To that extent, it would at least be prudent for linguists to keep their theories compatible with those algorithms and mechanisms, for which the minimum requirement is recognition time polynomial in the length of the string, a property guaranteed by the wMSO hierarchy.Footnote 10

But Pullum’s criticisms of GES go considerably further in identifying a number of other characteristics of performance mechanisms that should act as constraints on linguistic theorizing. These criticisms are of three kinds.

The first is that generativists have been extremely careless over the years in their use of the term ‘infinite’ (Pullum & Scholz Reference Pullum, Scholz and Van der Hulst2010). They have frequently claimed that infinitude is an intrinsic property of the sets of sentences that constitute human languages, when what they mean is that natural languages are unbounded sets (nor is unboundedness a distinctive property of human languages as is often claimed, since some quite trivial finite-state (type 3) languages are also unbounded). Unboundedness is a completely unsurprising property of any reasonable theory of natural grammar, rather than distinguishing human language. I think Pullum is quite right on this score, but I do not think it distinguishes MTS from GES.

A second kind of criticism, focused on ‘holism’ in grammar and in language acquisition (Pullum 2020), seems to amount to an argument against non-monotonicity. Non-monotonicity means that you cannot determine whether a structure is well-formed according to the grammar by purely local application of either rules or constraints. Similarly, if during language acquisition, a datum – that is, a situation pairing a meaning representation with a string – fails to yield an analysis according to the grammar $ \mathcal{G} $ that the child or the linguist has induced so far, then non-monotonicity in the universal set of rules/constraints that they have to choose from may make it difficult to make a modular change to a more adequate grammar $ {\mathcal{G}}^{\prime } $. Culicover & Wexler (1977), Wexler & Culicover (1980) and Berwick (1985) had to invoke various ‘Freezing’ and ‘Subset’ Principles to constrain the search space in the face of non-monotonicity, causing Baker (1977, 1979) to advocate the entire elimination of non-monotonic rules from the theory of grammar, consistent with the wMSO logic-based language hierarchy, since monotonicity is, as noted earlier, guaranteed by wMSO logic. However, we have also noted that most modern theories of syntax at least pay lip service to monotonicity, as under various ‘Projection Principles’ and the ‘Inclusiveness Condition’ of Chomsky (1995, passim), suggesting that this property may also be attainable under the GES approach.

The third and most telling group of arguments made against GES relates to its all-or-none rigidity with respect to phenomena like gradient grammaticality, ‘quandaries’ (where there is a meaning for which the grammar fails to provide any fully grammatical realization) and the like. I will give some examples below, but they all come down to the claim that it is part of the definition of a GES that it incorporates a closed lexicon.

Grammatical gradience has always been recognized by linguists and has usually been accounted for in terms of the number and significance of rule violations incurred, or by reordering constraints (Keller 1998, 2001). But it is standard, when trying to prove mathematical properties such as closure under intersection for languages and grammars, to close the grammar under finite (though unbounded) sets of rules and/or lexical items, and to talk about it as generating ‘all and only’ the sentences of the relevant language(s). Similarly, when talking about the acquisition problem in mathematical terms, authors such as Chomsky (1965) have on occasion talked about language learning as the problem of instantaneously identifying a unique grammar, perhaps defined by settings of a finite set of parameters, on the basis of a finite sample of strings of the language alone. On occasion, this model has been taken rather literally as a model of acquisition – for example, by Gold (1967) and Yang (2002).

However, it does not seem necessary to think of an adult or a child who learns a new word that the context and/or rest of the sentence allows them to understand as bearing a known category such as that of a transitive verb – even if they do not know what it means, as when reading ‘Jabberwocky’ – as having changed their grammar. It seems more natural to assume that natural grammars in actual use include a ‘wild card’ for unknown words, rather like the ‘*’ matching any sequence of characters in a regular expression (RE), or perhaps some more phono/morpho-tactically-specific learned RE. The wild card simply allows matching unknown words to be treated generatively, as lexically ambiguous over all preterminal labels (or perhaps just open-class preterminals). This process is fundamental to the account discussed below of language acquisition by the child, to whom all words are initially unknown, but who has access to (noisy, ambiguous) contextually supported meanings that can be associated with originally unknown words. Seen in this light, Pullum’s problem of the Open Lexicon and the problem of Child Language Acquisition both reduce to the problem of ambiguity resolution.
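The wild-card proposal can be illustrated with a small lexical lookup that backs off, for unknown words, to every open-class preterminal. This is a minimal sketch under stated assumptions: the toy lexicon, the inventory of open-class labels and the regular expression standing in for a learned phonotactic pattern are all hypothetical.

```python
import re

# Toy closed- and open-class entries (hypothetical).
LEXICON = {"the": {"Det"}, "a": {"Det"}, "dog": {"N"}, "saw": {"V"}}

# Preterminals that a previously unseen word may bear (an assumed inventory).
OPEN_CLASS = {"N", "V", "Adj", "Adv"}

# Stand-in for a learned phono/morphotactic pattern: any run of lower-case letters.
WILD_CARD = re.compile(r"[a-z]+")


def preterminals(word):
    """Known words get their listed categories; unknown but well-formed words
    are treated as lexically ambiguous over all open-class preterminals."""
    w = word.lower()
    if w in LEXICON:
        return set(LEXICON[w])
    if WILD_CARD.fullmatch(w):
        return set(OPEN_CLASS)
    return set()


print(preterminals("dog"))
print(sorted(preterminals("wug")))
```

On this view the grammar itself is unchanged by the unknown word ‘wug’; the parser simply faces four more lexical alternatives, which the rest of the sentence and the parsing model must resolve.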

Of course, backing off to lexical wild cards as a last resort for unknown words allows some additional categorial ambiguity into the grammar. But there is already a massive amount of ambiguity in every natural language – of a kind we never allow in the artificial languages of mathematics, logic and computer programming – so a little more lexical ambiguity can hardly matter. In particular, it does not change the wMSO tree-language level of the grammar.

The question of exactly how all that proliferating syntactic ambiguity in the grammar is actually resolved is of course an important question for practical parsers. For wide-coverage computational parsers, ambiguity has to be resolved using a statistical model, usually estimated from a treebank or corpus of sentences and associated tree structures that is representative of the language in actual use, or possibly by learning a direct end-to-end transducer between strings and meaning representations. In human sentence processing, the same function is performed by some combination of distributional models at all levels, including semantics and inference about context (Altmann & Steedman Reference Altmann and Steedman1988).

It is the parsing model that does the work of limiting algorithmic search among what for realistic cases are routinely thousands and sometimes millions of alternative syntactically legal parses. Such models also work very well in guessing which of a finite number of preterminal categories is most likely for the wild card on each occasion an unknown word is encountered.Footnote 11

Statistical parsing models also provide a basis for dealing with gradient grammaticality, since actual wide-coverage parsers always give multiple analyses (including ungrammatical ones), ranking them according to their statistical similarity to the training data. For the same reason, they will usually accept alternative versions of quandaries like ‘?his and my book’ vs. ‘??him and me’s book’, for which there is no single correct realization, since both are likely to be similar in varying degrees to examples that have occurred in the training data somewhere.
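The ranking behaviour described here can be sketched with a toy scoring function in which every analysis receives a score, however poor. This is not any actual parser's model: the rule probabilities and the smoothing constant for unseen rules are invented numbers, and the names `RULE_PROB`, `UNSEEN`, `log_score` and `rank` are assumptions.

```python
import math

# Invented rule probabilities, as if estimated from a treebank (illustrative only).
RULE_PROB = {
    "S -> NP VP": 0.9,
    "NP -> Det N": 0.6,
    "VP -> V NP": 0.5,
    "NP -> N Det": 0.001,   # rare, marginal pattern
}
UNSEEN = 1e-6               # smoothed probability for rules never observed


def log_score(rules):
    """Sum of log-probabilities of the rules used in one analysis."""
    return sum(math.log(RULE_PROB.get(rule, UNSEEN)) for rule in rules)


def rank(analyses):
    """Order candidate analyses from most to least probable; none is rejected outright."""
    return sorted(analyses, key=lambda name: log_score(analyses[name]), reverse=True)


ANALYSES = {
    "canonical": ["S -> NP VP", "NP -> Det N", "VP -> V NP", "NP -> Det N"],
    "marginal": ["S -> NP VP", "NP -> Det N", "VP -> V NP", "NP -> N Det"],
    "odd": ["S -> NP VP", "NP -> Det N", "VP -> V NP", "NP -> Adv Det"],
}

print(rank(ANALYSES))
```

Because unseen rules are smoothed rather than excluded, acceptability emerges as a gradient over scores rather than as a binary partition of strings.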

As Pullum (Reference Pullum2020) points out, a statistical parsing model of this kind is central to Abend et al.’s (Reference Abend, Kwiatkowski, Smith, Goldwater and Steedman2017) CCG-based model of language acquisition, which learns a variational Bayesian model of all possible lexical categories and all possible instantiations of a few universal syntactic rules that it has ever encountered in a single incremental pass through a corpus of transcribed child-directed utterances (CDU) paired with logical forms (including irrelevant distractors) that are unaligned with words, under the regime known as ‘semantic bootstrapping’. For example, let us assume that the first CDU that the child pays attention to is ‘Nice doggies!’, meaning nice dogs, and is uttered in a situation in which, among other things, there are dogs which (the adult surmises) the child likes, and the child has access to that meaning (or whatever corresponds to it in the child’s language of mind).Footnote 12

On the assumption that the child can analyze that meaning, using a universal rule of function application, as made up of the universal predicate nice applied to the entity dogs, they still cannot immediately know whether ‘nice’ is an adjectival predicate of type $ N/N $ meaning nice or a nominal of type $ N $ meaning dogs, or part of some other contextually salient but irrelevant meaning. They will therefore consider multiple equiprobable lexical possibilities for the word ‘nice’, each pairing a different syntactic type with a different semantic concept. (‘Doggies’, of course, is similarly ambiguous between $ N: dogs $ and $ N\hskip0.3em \backslash \hskip0.3em N: nice $.) However, further exposure to paired sentences and contextually supported meanings will, even in the presence of noise and other distractions, offer many more occasions on which ‘doggies’ could be an $ N $ meaning dogs and ‘nice’ could be $ N/N $ meaning nice than ones on which they could be anything else. The probability mass associated with the wrong initial hypothetical lexical entries will accordingly rapidly be lowered in contrast to that of the correct ones. The same will apply to more complex examples involving syntactic discontinuities, such as ‘What you want is the doggie’, although the meaning representation will be correspondingly more complex, such as $ \lambda x. want\hskip0.5em x\hskip0.5em you\hskip0.5em \wedge \hskip0.5em doggie\hskip0.5em x $, and the rules for breaking them down into phrases and lexical items will be more diverse, involving rules of movement, feature passing or function composition, according to whatever the child uses for Universal Grammar. To that extent, this mechanism must work cross-linguistically, although demonstrating that fact empirically may require a firmer grasp than we have at present on the linguistic semantics that underlies those universal principles.Footnote 13

More generally, those among the child’s hypothesized pairings of words with syntactic and semantic categories that correspond to the adult grammar will gain high cumulative probability mass because examples consistent with their involvement are frequently encountered. Others that do not correspond to the adult grammar lose probability mass in proportion because such evidence is increasingly rarely encountered. As the child’s probabilistic grammar approaches that of the adult, their certainty in assigning novel words to known categories on the basis of known context increases to the point of allowing them often to do so in their first exposure, in a process referred to as ‘one-trial learning’, which is also characteristic of Abend et al.’s learner in the later stages. The ability of adults to interpret sentences including novel words similarly shows that they are all still language learners in exactly the same sense.
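The shift of probability mass toward correct word-meaning pairings can be illustrated with a deliberately crude cross-situational learner. It is a simple fractional-count stand-in, not the variational Bayesian model of Abend et al., and it ignores syntactic categories altogether; the three-utterance corpus and the function name `learn` are assumptions made for the example.

```python
from collections import defaultdict


def learn(corpus):
    """Estimate P(meaning | word) from utterances paired with unaligned candidate meanings.

    Each utterance spreads one unit of evidence per word evenly over the
    meanings available in its context; the totals are then normalized per word.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for utterance, meanings in corpus:
        for word in utterance.split():
            for meaning in meanings:
                counts[word][meaning] += 1.0 / len(meanings)
    return {
        word: {m: c / sum(by_meaning.values()) for m, c in by_meaning.items()}
        for word, by_meaning in counts.items()
    }


# Hypothetical child-directed utterances with contextually supported meanings.
CORPUS = [
    ("nice doggies", ("nice", "dogs")),
    ("nice kitties", ("nice", "cats")),
    ("big doggies", ("big", "dogs")),
]

first_only = learn(CORPUS[:1])
lexicon = learn(CORPUS)
print(first_only["nice"])   # equiprobable after a single exposure
print(lexicon["nice"])      # mass has shifted toward the meaning 'nice'
```

After the first utterance the two hypotheses for ‘nice’ are equiprobable, as in the text; after three utterances the correct pairing already carries twice the mass of either competitor, and further consistent exposure would widen the gap.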

The growing fragment of English grammar (or whatever language is being learned) that a linguist would identify as ‘the grammar’ only exists in such systems in a distributional sense. That is to say that the meaning of an English transitive verb is associated with a VO syntactic category with overwhelmingly high probability, but not as high as 1, or certainty. The CCG theory of grammar that constrains it and gives it form is generative and lexicalized, and resides firmly at level 3 of the extended hierarchy of languages defined by wMSO logic, LCFRS and recursion theory. However, such a system does not specify a set of ‘all and only’ the strings of the language it generates because of the generative lexical wild card. That stringset does not have the sharp boundaries – much less the closed lexicon and holistic acquisition properties – that Pullum (2020: 5) identifies in many current GES theories.
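One way to picture a lexicon that exists only in a distributional sense is the following sketch. The counts, the size of the reserved wild-card mass and all function names are hypothetical, and the scoring covers only a single subject-verb-object pattern rather than a full CCG parser. The point is merely that the dominant category of a known verb receives probability close to, but below, 1, and that a string containing an unknown word receives a graded score rather than being excluded outright.

```python
from collections import Counter

# Hypothetical (word, category) counts that a learner might have accumulated.
COUNTS = {
    "dogs": Counter({"NP": 40}),
    "cats": Counter({"NP": 35}),
    "chase": Counter({"(S\\NP)/NP": 48, "S\\NP": 2}),
    "sleep": Counter({"S\\NP": 30}),
}
WILDCARD_MASS = 0.01  # assumed share of probability kept for unseen pairings
TRANSITIVE = "(S\\NP)/NP"


def category_prior():
    """P(category) pooled over the whole lexicon."""
    pooled = Counter()
    for seen in COUNTS.values():
        pooled.update(seen)
    total = sum(pooled.values())
    return {category: n / total for category, n in pooled.items()}


def category_distribution(word):
    """P(category | word); unknown words fall back on the pooled prior."""
    prior = category_prior()
    seen = COUNTS.get(word)
    if seen is None:
        return prior
    n = sum(seen.values())
    return {
        category: (1 - WILDCARD_MASS) * seen[category] / n
        + WILDCARD_MASS * prior[category]
        for category in prior
    }


def svo_score(subject, verb, obj):
    """Probability that the three words are NP, transitive verb, NP."""
    return (
        category_distribution(subject)["NP"]
        * category_distribution(verb)[TRANSITIVE]
        * category_distribution(obj)["NP"]
    )
```

Under these assumed counts, `svo_score("dogs", "chase", "cats")` is higher than the score for the same frame with the novel verb ‘blick’, which is in turn higher than the score with the intransitive ‘sleep’. None of the three is zero, so the stringset has no sharp boundary.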

In the light of such systems, Pullum and colleagues’ MTS requirement can be seen as a call for linguists to bring their theories back into line with the requirements of the psychological theory of language acquisition and natural language processing, which are for syntactic-semantic type transparency, for a continuum of grammaticality, and for an open and incrementally learnable lexicon.

To reach this conclusion is not of course to say that syntacticians should themselves start working on sentence processing and language acquisition, but rather that in order to be true to the nature of language itself, they need to give those who do work on those problems theories of grammar that they can work with. My own suggestion for such a theory is, of course, CCG. But other alternatives are possible, as long as they are similarly model-theoretic syntactically.

5. Conclusion

Model-theoretic syntax is a good idea. Its elegant formulation in terms of a hierarchy of weak monadic second-order logics over trees of successively greater dimensionality – completing the unity of the language hierarchy across phrase-structure productions, automata and logic – suggests a deeper type-transparent connection of syntactic structure to the model-theoretic semantics of natural language itself, and a possible explanation for why universal grammar should be constrained in this way.

It may well also turn out that constraint-based grammars have an empirical edge for some linguistic purposes, as linguists in HPSG (Ginzburg & Sag 2000: 2) and LFG (Bresnan 2001: vii) have always argued for syntax, and as a reviewer points out may well be the case for morphology and the lexicon. However, it is less clear that a constraint basis for the theory of grammar is a necessary requirement for natural language or that it follows from anything of any linguistic relevance in the definition of MTS.

The part of that definition that is of the utmost importance is that syntactic formalisms, whether constraint-based or generative, should live at some level – the less expressive, the better – of that extended hierarchy of abstract families of languages/grammars. For the working syntactician, model-theoretic syntax carries three attractions: First, it guarantees that the theory is monotonic and semantically type-transparent, reconnecting with the requirements of processing and acquisition. Second, it guarantees that the theory is explanatory, in the sense of excluding classes of phenomena that can only be captured by more expressive theories. Third, it affords a partial evaluation metric on descriptively adequate theories, whether they are expressed in terms of constraints or rules. It also offers a rapprochement in understanding with the psychologists and computer scientists, by many of whom linguistic theory has been abandoned since the early seventies.

Acknowledgements

Thanks to Geoff Pullum for tolerating innumerable questions and emails concerning MTS. Alex Koller, Alex Lascarides, Jim Rogers and Bonnie Webber gave logical advice. The responsibility for any misinterpretation remains with the author. Thanks also to the reviewers for suggesting many improvements to the paper. The project SEMANTAX has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 742137).

Competing interest

The author has no conflicts of interest to declare.

Footnotes

1 Chomsky’s autonomy of syntax doctrine can also be understood as an essentially methodological recommendation for how linguists should work in practice, given the opacity of formal semantics to introspection. Chomsky has never denied that the function of syntax is to support semantics and use.

2 It has been suggested to me by Alexander Koller, citing Stefan Thater, that (ia) involves non-monadic second-order quantification, over the binary predicate do something stupid, applying over pairs of entities, as in (ib).

However, it is significant that the car is expressed in a prepositional phrase and therefore can be argued to be an adjunct, in which case the predicate is monadic, and (ia) actually means something more like the following:

In support of this position, the following, in which the car would be syntactically marked as a true argument, is ungrammatical:

3 Rogers also identifies an effective procedure for transforming wMSO constraints into polynomial automata, although he points out that these are not automata that one would actually use for practical parsing and recognition.

4 The languages of linguistic formalisms such as TAG and CCG identified as exemplars of AFLs should be understood as proper sub-classes of that level. For example, Rogers & Pullum (2011) and Jäger & Rogers (2012) identify a number of subregular language families that are proper sub-classes of 2-wMSO, the regular languages.

5 Since we only have a sample of around 7,000 human languages, of which around 1,500 have been analyzed in any depth by linguists (Cinque 2013), we do not know for sure what that space is. Nevertheless, we are pretty sure that that space does not, for example, include permutation-complete languages like Bach’s (1981) artificial MIX language (see Steedman 2020), which is known to be both a non-CCG/Linear-Indexed Language (Kanazawa & Salvati 2012) and in 2-MCFL/3-wMSOL (Salvati 2015).

6 Chomsky (1965: 45) regarded such an evaluation as impossible at the time and redefined explanatory adequacy in terms of relating grammatical assumptions to general principles governing child language acquisition; see below for discussion.

7 Interestingly, such logics have been associated with HPSG by Søgaard & Lange (2009), although their use there is proof-theoretic, in support of logic-programming-based parsing/recognition, rather than model-theoretic. The standard model theory for HPSG, developed over many years, is Relational Speciate Reentrant Logic (King 1999, Richter 2000, 2007; it has also been applied to LFG by Przepiórkowski 2021). However, it is not clear to me where on the wMSO hierarchy (2) RSRL resides (Søgaard 2009).

8 Since Chomsky (1981: 89), the work of moved elements such as wh-items and their traces has been done by ‘copies’, which are (via a mechanism which never seems to be made explicit) identical from the start of the derivation, in the sense that whenever one copy acquires a value, the other has it too: it is only the phonological interface that non-monotonically deletes (parts of) one and/or the other.

9 FOPL is a sub-logic of wMSO. I have slightly changed Pullum’s notation to make immediate dominance more explicit via the predicate id.

10 The point here is not the order of the polynomial itself, which is only a worst-case theoretical bound. Polynomial recognition guarantees the applicability of efficient ‘divide and conquer’ algorithms and statistical parsing models that mean the worst case can be avoided.

11 Prange et al. (2021a, b) show that we do not even need to have closed the set of preterminal categories, sometimes called ‘supertags’ in an obvious generalization of standard Part-of-Speech (PoS) tags.

12 Of course, the fact that the English words coincide with the identifiers chosen for universal semantic primitives in the example is just for the reader’s ease in understanding what is going on. The words are independent of the meanings, and could be in any language, while the meaning primitives themselves could be vector embeddings for all we know.

13 These are fairly well-understood for the example of wh-constructions used above, but a referee suggests the considerable cross-linguistic variation in comparative constructions as a case in point.

References

Abend, Omri, Kwiatkowski, Tom, Smith, Nathaniel, Goldwater, Sharon & Steedman, Mark. 2017. Bootstrapping language acquisition. Cognition. 164. 116–143.
Altmann, Gerry & Steedman, Mark. 1988. Interaction with context during human sentence processing. Cognition. 30. 191–238.
Bach, Emmon. 1979. Control in Montague Grammar. Linguistic Inquiry. 10. 513–531.
Bach, Emmon. 1981. Discontinuous constituents in generalized categorial grammar. In Proceedings of the 11th annual meeting of the Northeastern Linguistic Society, New York, 1–12.
Baker, Lee. 1977. Comments on the paper by Culicover and Wexler. In Culicover, Peter, Wasow, Thomas & Akmajian, Adrian (eds.), Formal syntax, 61–70. New York: Academic Press.
Baker, Lee. 1979. Syntactic theory and the projection problem. Linguistic Inquiry. 10(4). 533–581.
Bangalore, Srinivas, Sarkar, Anoop, Doran, Christine & Hockey, Beth Ann. 1998. Grammar and parser evaluation in the XTAG project. In Workshop on the evaluation of parsing systems.
Berwick, Robert. 1985. The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
Boas, Hans & Sag, Ivan. 2012. Sign-based construction grammar. Stanford, CA: CSLI Publications.
Bresnan, Joan. 1977. Variables in the theory of transformations part i: Bounded versus unbounded transformations. In Culicover, Peter, Wasow, Thomas & Akmajian, Adrian (eds.), Formal syntax, 157–196. New York: Academic Press.
Bresnan, Joan (ed.). 1982. The mental representation of grammatical relations. Cambridge, MA: MIT Press.
Bresnan, Joan. 2001. Lexical-functional syntax. Oxford: Blackwell.
Chierchia, Gennaro. 1985. Formal semantics and the grammar of predication. Linguistic Inquiry. 16. 417–443.
Chierchia, Gennaro & Turner, Raymond. 1988. Semantics and property theory. Linguistics and Philosophy. 11. 261–302.
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory. 2. 113–124.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1977. On wh-movement. In Culicover, Peter, Wasow, Thomas & Akmajian, Adrian (eds.), Formal syntax, 71–132. New York: Academic Press.
Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Cinque, Guglielmo. 2013. On the movement account of Greenberg’s Universal 20: Refinements and replies: Materials. Ms., University of Venice.
Culicover, Peter & Wexler, Kenneth. 1977. Some syntactic implications of a theory of language learnability. In Culicover, Peter, Wasow, Thomas & Akmajian, Adrian (eds.), Formal syntax, 7–60. New York: Academic Press.
Davidson, Donald & Harman, Gilbert (eds.). 1972. Semantics of natural language. Dordrecht: Reidel.
Doner, John. 1970. Tree acceptors and some of their applications. Journal of Computer and System Sciences. 4. 406–451.
Geach, Peter. 1970. A program for syntax. Synthèse. 22. 3–17. Reprinted as Davidson & Harman 1972: 483–497.
Ginzburg, Jonathan & Sag, Ivan. 2000. Interrogative investigations. Stanford, CA: CSLI Publications.
Girard, Jean-Yves. 1987. Linear logic. Theoretical Computer Science. 50. 1–102.
Girard, Jean-Yves. 1995. Linear logic: Its syntax and semantics. In Girard, Jean-Yves, Lafont, Yves & Regnier, Laurent (eds.), Advances in linear logic (London Mathematical Society Lecture Notes 222), 1–42. Cambridge: Cambridge University Press.
Gold, E. Mark. 1967. Language identification in the limit. Information and Control. 16. 447–474.
Graf, Thomas. 2010. Comparing incomparable frameworks: A model theoretic approach to phonology. University of Pennsylvania Working Papers in Linguistics. 16. 10.
Hedman, Shawn. 2004. A first course in logic: An introduction to model theory, proof theory, computability, and complexity. Oxford: Oxford University Press.
Jacobson, Pauline. 1987. Phrase structure, grammatical relations, and discontinuous constituents. In Huck, Geoffrey & Ojeda, Almerindo (eds.), Syntax and semantics, vol. 20: Discontinuous constituency, 27–69. Orlando, FL: Academic Press.
Jäger, Gerhard & Rogers, James. 2012. Formal language theory: Refining the Chomsky hierarchy. Philosophical Transactions of the Royal Society B: Biological Sciences. 367. 1956–1970.
Joshi, Aravind & Schabes, Yves. 1992. Tree-Adjoining Grammars and lexicalized grammars. In Nivat, Maurice & Podelski, Andreas (eds.), Definability and recognizability of sets of trees, 409–431. Princeton, NJ: Elsevier.
Kanazawa, Makoto & Salvati, Sylvain. 2012. MIX is not a Tree-Adjoining Language. In Proceedings of the 50th annual meeting of the Association for Computational Linguistics (volume 1: Long papers), 666–674. Jeju Island, Korea: ACL.
Keenan, Edward & Faltz, Leonard. 1985. Boolean semantics for natural language. Dordrecht: Reidel.
Keller, Frank. 1998. Gradient grammaticality as an effect of selective constraint re-ranking. In Papers from the 34th meeting of the Chicago Linguistic Society, vol. 2, 95–109.
Keller, Frank. 2001. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Chicago, IL: University of Edinburgh dissertation.
King, Paul. 1999. Towards truth in Head-driven Phrase Structure Grammar. In Kordoni, Valia (ed.), Tübingen studies in Head-driven Phrase Structure Grammar, vol. 2 (Arbeitspapiere des SFB 340 132), 301–352. Eberhard Karls Universität Tübingen.
Kubota, Yusuke & Levine, Robert. 2020. Type-logical syntax. Cambridge, MA: MIT Press.
Lambek, Joachim. 1968. Deductive systems and categories. Mathematical Systems Theory. 2. 287–318.
Legendre, Géraldine, Grimshaw, Jane & Vikner, Sven (eds.). 2001. Optimality-theoretic syntax. Cambridge, MA: MIT Press.
Lewis, David. 1970. General semantics. Synthèse. 22. 18–67. Reprinted as Ch.1 of Partee (1976).
Libkin, Leonid. 2004. Elements of finite model theory. Berlin: Springer.
Mac Lane, Saunders. 1971. Categories for the working mathematician. Berlin: Springer-Verlag.
Manzini, Rita. 1992. Locality: A theory and some of its empirical consequences. Cambridge, MA: MIT Press.
Michaelis, Jens. 2001. Transforming linear context-free rewriting systems into minimalist grammars. In de Groote, Phillipe, Morrill, Glyn & Retoré, Christian (eds.), Logical aspects of computational linguistics (lacl’01) (Lecture Notes in Computer Science 2099), 228–244. Berlin: Springer.
Michaelis, Jens, Mönnich, Uwe & Morawietz, Frank. 2001. On minimalist attribute grammars and macro tree transducers. In Kamp, Hans, Rossdeutscher, Antje & Rohrer, Christian (eds.), Linguistic form and its computation, 287–326. Stanford, CA: CSLI Publications.
Mönnich, Uwe. 2007. Minimalist syntax, multiple regular tree grammars, and direction-preserving tree transductions. In Rogers, James & Kepser, Stephan (eds.), Model-theoretic syntax at 10, chap. 10, 83–87. Association for Logic Language and Information, http://www.folli.org.
Montague, Richard. 1970. English as a formal language. In Visentini, Bruno (ed.), Linguaggi nella società e nella technica, 189–224. Milan: Edizioni di Communità. Reprinted as Thomason 1974: 188–221.
Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In Hintikka, Jaakko, Moravcsik, Julius & Suppes, Patrick (eds.), Approaches to natural language: Proceedings of the 1970 Stanford workshop on grammar and semantics, 221–242. Dordrecht: Reidel. Reprinted in Thomason 1974: 247–279.
Moortgat, Michael. 1988. Categorial investigations. Dordrecht: Universiteit van Amsterdam dissertation. Published by Foris, Dordrecht, 1989.
Morrill, Glyn. 1994. Type-logical grammar. Dordrecht: Kluwer.
Neeleman, Ad. 2013. Comments on Pullum. Mind & Language. 28. 522–531.
Partee, Barbara. 1975. Montague Grammar and Transformational Grammar. Linguistic Inquiry. 6. 203–300.
Partee, Barbara (ed.). 1976. Montague grammar. New York: Academic Press.
Pollard, Carl. 1996. The nature of constraint-based grammar. In Proceedings of the Pacific Asia conference on language, information, and computation. Stroudsburg, PA: Association for Computational Linguistics, 1–18.
Pollard, Carl & Sag, Ivan. 1994. Head-driven phrase structure grammar. Stanford, CA: CSLI Publications.
Prange, Jakob, Schneider, Nathan & Srikumar, Vivek. 2021a. CCG supertagging as top-down tree generation. In Proceedings of the 4th conference of the Society for Computation in Linguistics, 351–354. https:/scholarworks.calstate.edu/.
Prange, Jakob, Schneider, Nathan & Srikumar, Vivek. 2021b. Supertagging the long tail with tree-structured decoding of complex categories. Transactions of the Association for Computational Linguistics. 9. 243–260.
Przepiórkowski, Adam. 2021. LFG and HPSG. In Müller, Stefan, Abeillé, Anne, Borsley, Robert & Koenig, Jean-Pierre (eds.), The handbook of lexical functional grammar. Berlin: Language Science Press: 1861–1918.
Pullum, Geoffrey. 2007. The evolution of model-theoretic frameworks in linguistics. In Rogers, James & Kepser, Stephan (eds.), Model-theoretic syntax at 10. Association for Logic Language and Information, 1–10. http://www.folli.org.
Pullum, Geoffrey. 2013a. The central question in comparative syntactic metatheory. Mind & Language. 28. 492–521.
Pullum, Geoffrey. 2013b. Consigning phenomena to performance: A response to Neeleman. Mind & Language. 28. 532–537.
Pullum, Geoffrey. 2020. Theorizing about the syntax of human language: A radical alternative to generative formalisms. Cadernos de Linguística. 1. 1–33.
Pullum, Geoffrey & Scholz, Barbara. 2003. Contrasting applications of logic in natural language syntactic description. In Proceedings of the 12th international congress of logic, methodology and philosophy of science, Oviedo. London: KCL Publications.
Pullum, Geoffrey & Scholz, Barbara. 2010. Recursion and the infinitude claim. In Van der Hulst, Harry (ed.), Recursion and human language. Berlin: Mouton de Gruyter.
Richter, Frank. 2000. A mathematical formalism for linguistic theories with an application in Head-Driven Phrase Structure Grammar. Tübingen: Universität Tübingen dissertation.
Richter, Frank. 2007. Closer to the truth: A new model theory for HPSG. In Rogers, James & Kepser, Stephan (eds.), Model-theoretic syntax at 10, 99–108. Tübingen: Universität Tübingen.
Rizzi, Luigi. 1990. Relativized minimality. Cambridge, MA: MIT Press.
Rogers, James. 1998. A descriptive approach to language-theoretic complexity. Stanford, CA: CSLI Publications.
Rogers, James. 2003. wMSO theories as grammar formalisms. Theoretical Computer Science. 293. 291–320.
Rogers, James. 2004. Wrapping of trees. In Proceedings of the 42nd annual meeting of the Association for Computational Linguistics, 558–565.
Rogers, James & Pullum, Geoffrey. 2011. Aural pattern recognition experiments and the subregular hierarchy. Journal of Logic, Language and Information. 20. 329–342.
Salvati, Sylvain. 2015. MIX is a 2-MCFL and the word problem in $\mathbb{Z}^2$ is captured by the IO and the OI hierarchies. Journal of Computer and System Sciences. 81. 1252–1277. Circulated in 2011.
Seki, Hiroyuki, Matsumura, Takashi, Fujii, Mamoru & Kasami, Tadao. 1991. On multiple context-free grammars. Theoretical Computer Science. 88. 191–229.
Shieber, Stuart. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy. 8. 333–343.
Søgaard, Anders. 2009. Verifying context-sensitive treebanks and heuristic parses in polynomial time. In Proceedings of the 17th Nordic conference of computational linguistics (nodalida), 190–197.
Søgaard, Anders & Lange, Martin. 2009. Polyadic dynamic logics for HPSG parsing. Journal of Logic, Language and Information. 18. 159–198.
Stabler, Edward. 1997. Derivational Minimalism. In Retoré, Christian (ed.), Logical aspects of computational linguistics (lacl’96) (Lecture Notes in Computer Science), vol. 1328, 68–95. New York: Springer.
Steedman, Mark. 2000. The syntactic process. Cambridge, MA: MIT Press.
Steedman, Mark. 2020. A formal universal of natural language grammar. Language. 96. 618–660.
Szabolcsi, Anna. 1997. Strategies for scope-taking. In Szabolcsi, Anna (ed.), Ways of scope-taking, 109–154. Dordrecht: Kluwer.
Thatcher, James. 1967. Characterizing derivation trees of context-free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences. 1. 317–322.
Thomason, Richmond (ed.). 1974. Formal philosophy: Papers of Richard Montague. New Haven, CT: Yale University Press.
Vijay-Shanker, K. & Weir, David. 1994. The equivalence of four extensions of context-free grammar. Mathematical Systems Theory. 27. 511–546.
Vijay-Shanker, K., Weir, David & Joshi, Aravind. 1987. Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th annual meeting of the Association for Computational Linguistics, 104–11. ACL.
Weir, David. 1988. Characterizing mildly context-sensitive grammar formalisms. Philadelphia: University of Pennsylvania dissertation. Published as Technical Report CIS-88-74.
Weir, David. 1992. A geometric hierarchy beyond context-free languages. Theoretical Computer Science. 104. 235–261.
Wexler, Kenneth & Culicover, Peter. 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Yang, Charles. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.