Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad
coverage grammars. In the simplest case, rules can be ‘read off’ the parse-annotations of
the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars,
however, can be very large, making subsequent parsing with the grammar computationally
expensive. In this paper, we explore ways of reducing a treebank grammar in size, or
‘compacting’ it, using two kinds of technique: (i)
thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which
has both probabilistic and non-probabilistic variants. Our results show that by a combined
use of these two techniques, a probabilistic context-free grammar can be reduced in size by
62% without any loss in parsing performance, and by 71% with a gain in recall but some
loss in precision.
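As a rough illustration of the two compaction techniques, the following sketch thresholds rules by frequency and then greedily removes rules whose right-hand sides can already be parsed by the remaining rules. The rule representation, helper names, and greedy removal order are our own assumptions for illustration, not the paper's actual algorithm:

```python
def splits(seq, k):
    """Yield all ways to cut tuple `seq` into k non-empty contiguous parts."""
    if k == 1:
        yield (seq,)
        return
    for cut in range(1, len(seq) - k + 2):
        for rest in splits(seq[cut:], k - 1):
            yield (seq[:cut],) + rest

def derives(cat, seq, grammar):
    """True if category `cat` can derive the symbol sequence `seq`
    via at least one rule application from `grammar`."""
    def helper(c, s, seen):
        if (c, s) in seen:           # guard against unary cycles
            return False
        seen = seen | {(c, s)}
        for lhs, rhs in grammar:
            if lhs != c or len(rhs) > len(s):
                continue
            for parts in splits(s, len(rhs)):
                if all(p == (x,) or helper(x, p, seen)
                       for x, p in zip(rhs, parts)):
                    return True
        return False
    return helper(cat, seq, frozenset())

def compact(rules, counts, threshold=1):
    """Compact a treebank CFG: (i) drop rules seen fewer than `threshold`
    times; (ii) greedily drop any rule whose right-hand side the
    remaining rules can already parse (non-probabilistic rule-parsing)."""
    kept = [r for r in rules if counts[r] >= threshold]
    for rule in sorted(kept, key=counts.get):   # try rare rules first
        lhs, rhs = rule
        others = [r for r in kept if r != rule]
        if derives(lhs, rhs, others):
            kept = others
    return kept

# Toy grammar: the flat rule NP -> DT JJ NN is redundant, since
# NP -> DT NOM plus NOM -> JJ NN can parse its right-hand side.
rules = [
    ("NP", ("DT", "NOM")),
    ("NOM", ("JJ", "NN")),
    ("NOM", ("NN",)),
    ("NP", ("DT", "JJ", "NN")),
]
counts = {rules[0]: 50, rules[1]: 30, rules[2]: 40, rules[3]: 10}
compacted = compact(rules, counts)
```

The probabilistic variant of rule-parsing would additionally weigh the probability of the candidate rule against that of the derivation replacing it; this sketch implements only the non-probabilistic check.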