Book contents
- Frontmatter
- Contents
- Preface
- Chapter 1 Algorithms on Words
- Chapter 2 Structures for Indexes
- Chapter 3 Symbolic Natural Language Processing
- Chapter 4 Statistical Natural Language Processing
- Chapter 5 Inference of Network Expressions
- Chapter 6 Statistics on Words with Applications to Biological Sequences
- Chapter 7 Analytic Approach to Pattern Matching
- Chapter 8 Periodic Structures in Words
- Chapter 9 Counting, Coding, and Sampling with Words
- Chapter 10 Words in Number Theory
- References
- General Index
Chapter 4 - Statistical Natural Language Processing
Published online by Cambridge University Press: 05 June 2013
Summary
Introduction
The application of statistical methods to natural language processing has been remarkably successful over the past two decades. The wide availability of text and speech corpora has played a critical role in this success because, like all learning techniques, these methods rely heavily on data. Many components of complex natural language processing systems, such as text normalizers, morphological or phonological analyzers, part-of-speech taggers, grammars or language models, pronunciation models, context-dependency models, and acoustic hidden Markov models (HMMs), are statistical models derived from large data sets using modern learning techniques. These models are often represented as weighted automata or weighted finite-state transducers, either directly or as the result of approximating more complex models.
Weighted automata and transducers are the finite automata and finite-state transducers described in Chapter 1, Section 1.5, with a weight added to each transition. Thus, a weighted finite-state transducer is an automaton in which each transition carries, in addition to its usual input label, an output label from a possibly different alphabet and a weight. The weights may correspond to probabilities or log-likelihoods, or they may be other costs used to rank alternatives. More generally, as we shall see in the next section, they are elements of a semiring. Transducers can be used to define a mapping between two different types of information sources, for example, word and phoneme sequences.
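As an illustration of the definition above, here is a minimal Python sketch of a weighted transducer that maps word sequences to pronunciations. It is not taken from the chapter. The names `Semiring`, `TROPICAL`, `WeightedTransducer`, and `transduce`, the toy lexicon, and its weights are all assumptions made for this example. For brevity, each transition emits a whole pronunciation string as its output label, and epsilon transitions are not supported. Weights are treated as costs in the tropical semiring, where weights along a path are added and alternative paths are combined by taking the minimum.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for the two semiring operations and their identities."""
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # combines weights along one path
    zero: float                             # identity for plus
    one: float                              # identity for times


# Tropical semiring: (min, +) over costs such as negative log-probabilities.
TROPICAL = Semiring(plus=min, times=lambda a, b: a + b,
                    zero=float("inf"), one=0.0)

# A transition is (source state, input label, output label, weight, target state).
Transition = Tuple[int, str, str, float, int]


@dataclass
class WeightedTransducer:
    initial: int
    finals: Dict[int, float]          # final state -> final weight
    transitions: List[Transition]
    semiring: Semiring = TROPICAL

    def transduce(self, inputs: List[str]) -> Dict[Tuple[str, ...], float]:
        """Map an input sequence to every accepted output sequence and its weight."""
        K = self.semiring
        # frontier: (state, outputs emitted so far) -> accumulated weight
        frontier = {(self.initial, ()): K.one}
        for symbol in inputs:
            next_frontier: Dict[Tuple[int, Tuple[str, ...]], float] = {}
            for (state, out), w in frontier.items():
                for src, i, o, tw, dst in self.transitions:
                    if src == state and i == symbol:
                        key = (dst, out + (o,))
                        next_frontier[key] = K.plus(
                            next_frontier.get(key, K.zero), K.times(w, tw))
            frontier = next_frontier
        results: Dict[Tuple[str, ...], float] = {}
        for (state, out), w in frontier.items():
            if state in self.finals:
                total = K.times(w, self.finals[state])
                results[out] = K.plus(results.get(out, K.zero), total)
        return results


# Toy pronunciation lexicon; the words, phoneme strings, and costs are invented.
lexicon = WeightedTransducer(
    initial=0,
    finals={2: 0.0},
    transitions=[
        (0, "the", "dh ax", 0.5, 1),
        (0, "the", "dh iy", 1.0, 1),
        (1, "data", "d ey t ax", 0.25, 2),
        (1, "data", "d ae t ax", 0.75, 2),
    ],
)

results = lexicon.transduce(["the", "data"])
best = min(results, key=results.get)  # lowest-cost pronunciation sequence
```

Swapping `TROPICAL` for another `Semiring` instance (for example, ordinary addition and multiplication over probabilities) changes how path weights are combined without touching the traversal code, which is the practical appeal of defining weights over a semiring.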
- Type: Chapter
- Information: Applied Combinatorics on Words, pp. 210-240. Publisher: Cambridge University Press. Print publication year: 2005