Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgements
- 1 Introduction
- 2 The data and some applications
- Part I The Tools
- 3 Basic stringology
- 4 Representing languages
- 5 Representing distributions over strings with automata and grammars
- 6 About combinatorics
- Part II What Does Learning a Language Mean?
- Part III Learning Algorithms and Techniques
- References
- Index
4 - Representing languages
from Part I - The Tools
Published online by Cambridge University Press: 05 July 2014
- Frontmatter
- Contents
- Preface
- Acknowledgements
- 1 Introduction
- 2 The data and some applications
- Part I The Tools
- 3 Basic stringology
- 4 Representing languages
- 5 Representing distributions over strings with automata and grammars
- 6 About combinatorics
- Part II What Does Learning a Language Mean?
- Part III Learning Algorithms and Techniques
- References
- Index
Summary
Structures are the weapons of the mathematician.
BourbakiIt is no coincidence that in no known language does the phrase ‘As pretty as an airport’ appear.
Douglas AdamsLearning languages requires, for the process to be of any practical value, agreement on a representation of these languages. We turn to formal language theory to provide us with such meaningful representations, and adapt these classical definitions to the particular task of grammatical inference only when needed.
Automata and finite state machines
Automata are finite state machines used to recognise strings. They correspond to a simplified and limited version of Turing machines: a string is written on the input tape, the string is then read from left to right and, at each step, the next state of the system is chosen depending only on the previous state and the letter or symbol being read. The fact that this is the only information that can be used to parse the string makes the system powerful enough to accept just a limited class of languages called regular languages. The recognition procedure can be made deterministic by allowing only one action to be possible at each step (therefore for each state and each symbol). It is usually nicer and easier to manipulate these deterministic machines (called deterministic finite automata) because parsing is then performed in a much more convenient and economic way, and also because a number of theoretical results only apply to these.
- Type
- Chapter
- Information
- Grammatical InferenceLearning Automata and Grammars, pp. 70 - 85Publisher: Cambridge University PressPrint publication year: 2010