
Full parsing approximation for information extraction via finite-state cascades

Published online by Cambridge University Press: 21 August 2002

FABIO CIRAVEGNA
Affiliation:
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK
ALBERTO LAVELLI
Affiliation:
ITC-irst Centro per la Ricerca Scientifica e Tecnologica, via Sommarive 18, 38050 Povo (TN), Italy

Abstract

This paper proposes a robust approach to parsing suitable for Information Extraction (IE) from texts using finite-state cascades. The approach is characterized by the construction of an approximation of the full parse tree that captures all the information relevant for IE purposes, leaving the other relations underspecified. Sequences of cascades of finite-state rules deterministically analyze the text, building unambiguous structures. Initially, basic chunks are analyzed; then clauses are recognized and nested; finally, modifier attachment is performed and the global parse tree is built. The parsing approach allows robust, effective and efficient analysis of real-world texts. The grammar organization simplifies changes, insertion of new rules and integration of domain-oriented rules. The approach has been tested for Italian, English, and Russian. A parser based on such an approach has been implemented as part of Pinocchio, an environment for developing and running IE applications.
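The general idea of a finite-state cascade can be illustrated with a minimal Python sketch. It is not the grammar or the implementation described in the paper: the tag set (DET, ADJ, N, V, AUX, P), the rules, and the names `CASCADE`, `apply_level`, `parse` and `labels` are all assumptions made for illustration. Each level is an ordered list of regular expressions over the labels produced by the previous level; the text is scanned left to right, the first matching rule wins, and anything that matches no rule is passed through unchanged, which is one simple way to get deterministic and robust behaviour.

```python
import re

# Hypothetical cascade: each level is an ordered list of (label, pattern).
# Patterns are regexes over space-terminated labels from the level below.
# The tag set and rules are illustrative assumptions, not the paper's grammar.
CASCADE = [
    # Level 1: basic chunks (noun phrases and verb groups).
    [("NP", r"(DET )?(ADJ )*(N )+"), ("VG", r"(AUX )*(V )")],
    # Level 2: prepositional phrases built on top of the chunks.
    [("PP", r"P NP ")],
    # Level 3: clauses; PPs are kept as flat daughters of the clause.
    [("S", r"NP (PP )*VG (NP )?(PP )*")],
]


def apply_level(nodes, rules):
    """Scan left to right; the first rule matching at a position wins."""
    out, i = [], 0
    while i < len(nodes):
        # Labels of the remaining nodes, each followed by a single space.
        rest = "".join(label + " " for label, _ in nodes[i:])
        for label, pattern in rules:
            m = re.match(pattern, rest)
            if m and m.end() > 0:
                n = m.group(0).count(" ")  # number of nodes consumed
                out.append((label, nodes[i:i + n]))
                i += n
                break
        else:
            # No rule applies: keep the node as it is and move on.
            out.append(nodes[i])
            i += 1
    return out


def parse(tagged):
    """Run every level of the cascade over a list of (word, tag) pairs."""
    nodes = [(tag, word) for word, tag in tagged]
    for rules in CASCADE:
        nodes = apply_level(nodes, rules)
    return nodes


def labels(nodes):
    """Top-level labels of a node list, convenient for inspection."""
    return [label for label, _ in nodes]


sentence = [("the", "DET"), ("company", "N"), ("acquired", "V"),
            ("a", "DET"), ("small", "ADJ"), ("bank", "N"),
            ("in", "P"), ("Italy", "N")]
tree = parse(sentence)
print(labels(tree))
```

In this sketch the prepositional phrase is simply kept as a daughter of the clause rather than attached to a specific head, which loosely mirrors the idea of leaving some relations underspecified. Input that the rules do not cover is not rejected: the uncovered tokens remain in the output as unattached nodes.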

Type
Research Article
Copyright
2002 Cambridge University Press
