Efficiently generating correction suggestions for garbled tokens of historical language
Published online by Cambridge University Press: 21 March 2011
Abstract
Text correction systems rely on a core mechanism that generates suitable correction suggestions for garbled input tokens. Current systems, which are designed for documents written in modern language, use some form of approximate search in a given background lexicon. Due to the large amount of spelling variation found in historical documents, special lexica for historical language can only offer restricted coverage. Hence historical language is often described in terms of a matching procedure to be applied to modern words. Given such a procedure and a base lexicon of modern words, the question arises of how to generate correction suggestions for garbled historical variants. In this paper we suggest an efficient algorithm that solves this problem. The algorithm is used for postcorrection of optical character recognition results on historical document collections.
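To make the problem statement concrete, the following is a minimal brute-force sketch, not the efficient algorithm proposed in the paper. It assumes the matching procedure can be written as a list of string rewrite patterns (modern to historical) and that approximate search means a bounded Levenshtein distance. The function names, the two patterns, and the tiny lexicon are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def historical_variants(word: str, patterns: list[tuple[str, str]]) -> set[str]:
    """All strings obtained by optionally applying each pattern at each position.

    Assumes every pattern has a non-empty left-hand side. Exponential in the
    worst case, which is acceptable only for a toy illustration.
    """
    if not word:
        return {""}
    # Option 1: copy the first character unchanged.
    results = {word[0] + rest for rest in historical_variants(word[1:], patterns)}
    # Option 2: rewrite a modern substring that starts here.
    for modern, historical in patterns:
        if word.startswith(modern):
            tails = historical_variants(word[len(modern):], patterns)
            results |= {historical + rest for rest in tails}
    return results


def suggest(token: str, lexicon: list[str], patterns: list[tuple[str, str]],
            max_dist: int = 1) -> list[tuple[int, str, str]]:
    """Return (distance, historical variant, modern word) triples within max_dist."""
    candidates = []
    for modern in lexicon:
        for variant in historical_variants(modern, patterns):
            dist = levenshtein(token, variant)
            if dist <= max_dist:
                candidates.append((dist, variant, modern))
    return sorted(candidates)


# Hypothetical patterns and lexicon, for illustration only.
PATTERNS = [("t", "th"), ("ei", "ey")]
LEXICON = ["teil", "turm"]

# "tbeil" stands in for an OCR-garbled reading of the variant "theil".
suggestions = suggest("tbeil", LEXICON, PATTERNS)
```

This sketch enumerates every variant of every lexicon word and compares each one to the input token, so its cost grows with lexicon size times the number of variants per word. That blow-up is the inefficiency the abstract alludes to when it asks how suggestions can be generated efficiently.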
- Type: Papers
- Information: Natural Language Engineering, Volume 17, Special Issue 2: Finite-State Methods and Models in Natural Language Processing, April 2011, pp. 265-282
- Copyright: © Cambridge University Press 2011