5 - Analyzing a corpus
Published online by Cambridge University Press: 03 December 2009
Summary
The process of analyzing a completed corpus is in many respects similar to the process of creating a corpus. Like the corpus compiler, the corpus analyst needs to consider such factors as whether the corpus to be analyzed is lengthy enough for the particular linguistic study being undertaken and whether the samples in the corpus are balanced and representative. The major difference between creating and analyzing a corpus, however, is that the creator of a corpus can adjust what is included to compensate for any complications that arise during its creation, whereas the corpus analyst is confronted with a fixed corpus and has to decide whether to continue with an analysis if the corpus is not entirely suitable, or to find a new corpus altogether.
This chapter describes the process of analyzing a completed corpus. It begins with a discussion of how to frame a research question so that, from the start, the analyst has a clear “hypothesis” to test in a corpus and avoids a common complaint about corpus-based analyses: that many do little more than “count” linguistic features in a corpus, paying little attention to the significance of the counts. The next sections describe the process of doing a corpus analysis: how to determine whether a given corpus is appropriate for a particular linguistic analysis, how to extract grammatical information relevant to the analysis, how to create data files for recording the grammatical information taken from the corpus, and how to determine the appropriate statistical tests for analyzing the information in the data files that have been created.
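The step from counting features to testing their significance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the coded observations, the genre and variant labels, and the choice of a chi-square test of independence on a 2x2 table. The summary itself does not say which statistical tests the chapter covers.

```python
import math
from collections import Counter

# Hypothetical coded observations, as they might be recorded in a data file:
# one (genre, variant) pair per occurrence of the feature in the corpus.
# The labels and counts are invented for illustration.
observations = (
    [("spoken", "contracted")] * 60
    + [("spoken", "full")] * 40
    + [("written", "contracted")] * 30
    + [("written", "full")] * 70
)


def chi_square_2x2(obs):
    """Chi-square test of independence for a 2x2 table (no continuity correction).

    Returns the chi-square statistic and its p-value for 1 degree of freedom.
    """
    counts = Counter(obs)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    if len(rows) != 2 or len(cols) != 2:
        raise ValueError("this sketch only handles a 2x2 table")

    total = sum(counts.values())
    chi2 = 0.0
    for row in rows:
        row_total = sum(counts[(row, col)] for col in cols)
        for col in cols:
            col_total = sum(counts[(r, col)] for r in rows)
            # Expected count if genre and variant were independent.
            expected = row_total * col_total / total
            chi2 += (counts[(row, col)] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2(observations)
print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}")
```

The raw counts alone (60 versus 30 contracted forms) say nothing about whether the difference could plausibly arise by chance; the test statistic is what turns the count into evidence for or against a hypothesis.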
- Type: Chapter
- Information: English Corpus Linguistics: An Introduction, pp. 100-137
- Publisher: Cambridge University Press
- Print publication year: 2002