Book contents
- Frontmatter
- Contents
- Series editors' preface
- Acknowledgements
- 1 Introduction to a corpus in use
- 2 The corpus as object: Design and purpose
- 3 Methods in corpus linguistics: Interpreting concordance lines
- 4 Methods in corpus linguistics: Beyond the concordance line
- 5 Applications of corpora in applied linguistics
- 6 Corpora and language teaching: Issues of language description
- 7 Corpora and language teaching: General applications
- 8 Corpora and language teaching: Specific applications
- 9 An applied linguist looks at corpora
- List of relevant web-sites
- References
- Index
2 - The corpus as object: Design and purpose
Published online by Cambridge University Press: 05 October 2012
Summary
As corpora have become larger and more diverse, and as they are more frequently used to make definitive statements about language, issues of how they are designed have become more important. Four aspects of corpus design are discussed in this chapter: size, content, representativeness and permanence. The chapter also summarises some types of corpus investigation, each of which treats the corpus as a different kind of object.
Issues in corpus design
Size
As computer technology has advanced since the 1960s, it has become feasible to store and access corpora of ever-increasing size. The LOB Corpus and Brown Corpus, at 1 million words each, seemed at the time as big as anyone would ever want; nowadays 1 million words is fairly small for a corpus. The British National Corpus is 100 million words, the Bank of English is currently about 400 million, and CANCODE is 5 million words. The feasible size of a corpus is limited not so much by the capacity of a computer to store it as by the speed and efficiency of the access software. If, for example, counting the past and present tense forms of the verb BE in a given corpus takes longer than a few minutes, the researcher may prefer a smaller corpus whose results might be considered just as reliable but on which the software works much more quickly.
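The BE example can be made concrete with a minimal sketch. It is an illustration only, not the access software the chapter has in mind: the function name `count_be_forms`, the two form lists and the sample sentence are all assumptions, and contracted forms such as 's are left out because they are ambiguous between is, has and the possessive.

```python
import re
from collections import Counter

# Assumed form lists for illustration; contracted forms ('s, 're, 'm) are
# deliberately excluded because 's is ambiguous (is / has / possessive).
PRESENT_FORMS = {"am", "is", "are"}
PAST_FORMS = {"was", "were"}


def count_be_forms(text):
    """Count present and past tense forms of BE in a plain-text corpus."""
    # Crude tokenisation: lowercase the text and keep runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in PRESENT_FORMS:
            counts["present"] += 1
        elif token in PAST_FORMS:
            counts["past"] += 1
    return counts


# Hypothetical sample text standing in for a corpus.
sample = "She was late, but they are here now. We were worried. It is fine."
result = count_be_forms(sample)
print(result["present"], result["past"])  # prints: 2 2
```

Because the sketch reads every token once, its running time grows roughly in proportion to the size of the corpus, which is one way of seeing why a query that is instant on 1 million words may become slow on several hundred million.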
- Type: Chapter
- Information: Corpora in Applied Linguistics, pp. 25-37
- Publisher: Cambridge University Press
- Print publication year: 2002