A Lot of Data

Published online by Cambridge University Press: 01 January 2022

Abstract

This article encourages the use of explicit methods in linguistics by attempting to estimate the size of a linguistic data set. Such estimations are difficult because redundant data can easily pad the data set. To address this, I offer some explicit operationalizations of the data and their features. For linguistic data, negative associations do not indicate true redundancy, and yet for many measures they can be mathematically impossible to ignore. It is proven that this troublesome phenomenon has positive Lebesgue measure and is monotonically increasing and that these two features hold robustly in four different ways.

Type
Research Article
Copyright
Copyright © The Philosophy of Science Association
