Published online by Cambridge University Press: 13 May 2004
Plain lists of collocations, as provided to date by most approaches to the automatic acquisition of collocations from corpora, are useful as a resource for dictionary construction. However, their use in NLP applications such as Text Generation, Machine Translation and Text Summarization is rather limited unless they are enriched with information on the grammatical function of the collocation elements and on the semantics of the collocations as multiword units. In this article, we describe an approach to the fine-grained classification of verb-noun bigrams according to a semantically motivated typology of collocations, and illustrate it with Spanish material. The typology that underlies our classification is based on the verb-noun Lexical Functions (LFs) of Explanatory Combinatorial Lexicology. In the first stage of the approach, the program learns the semantic features of each LF from training data. In the second stage, it examines the semantic features of candidate verb-noun bigrams and compares them with the features of all the LFs taken into account. A candidate whose features are sufficiently similar to those of a specific LF is considered to be an instance of that LF. The semantic features of both the training material and the candidate bigrams are derived from the hyperonymy hierarchies provided by EuroWordNet. In the experiments carried out to validate the approach, we achieved an average $f$-score of about 70%.
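The two-stage procedure can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in the article: the `HYPERONYMS` table is a hand-made stand-in for EuroWordNet lookups, the labels `LF_A` and `LF_B` are placeholders for actual LF names, the training bigrams and their labels are invented, and both the weighted-overlap similarity and the 0.5 threshold are hypothetical choices.

```python
from collections import Counter

# Hypothetical hyperonym chains standing in for EuroWordNet lookups.
HYPERONYMS = {
    "dar": ["transfer", "act"],
    "hacer": ["create", "act"],
    "cumplir": ["complete", "act"],
    "paseo": ["movement", "activity"],
    "pregunta": ["communication", "activity"],
    "viaje": ["movement", "activity"],
    "promesa": ["commitment", "statement"],
    "amenaza": ["commitment", "statement"],
}


def features(verb, noun):
    """Collect each word and its hyperonyms, tagged by slot (V or N)."""
    feats = {f"V:{verb}"} | {f"V:{h}" for h in HYPERONYMS.get(verb, [])}
    feats |= {f"N:{noun}"} | {f"N:{h}" for h in HYPERONYMS.get(noun, [])}
    return feats


def train(examples):
    """Stage 1: for each LF, the relative frequency of every feature."""
    counts, totals = {}, Counter()
    for verb, noun, lf in examples:
        counts.setdefault(lf, Counter()).update(features(verb, noun))
        totals[lf] += 1
    return {
        lf: {feat: n / totals[lf] for feat, n in cnt.items()}
        for lf, cnt in counts.items()
    }


def score(feats, profile):
    """Assumed similarity: share of the profile's weight found in the candidate."""
    matched = sum(w for feat, w in profile.items() if feat in feats)
    return matched / sum(profile.values())


def classify(verb, noun, profiles, threshold=0.5):
    """Stage 2: return the most similar LF, or None if below the threshold."""
    feats = features(verb, noun)
    best_lf, best = max(
        ((lf, score(feats, prof)) for lf, prof in profiles.items()),
        key=lambda pair: pair[1],
    )
    return best_lf if best >= threshold else None


# Invented training material with placeholder LF labels.
TRAINING = [
    ("dar", "paseo", "LF_A"),
    ("hacer", "pregunta", "LF_A"),
    ("cumplir", "promesa", "LF_B"),
    ("cumplir", "amenaza", "LF_B"),
]
profiles = train(TRAINING)
print(classify("hacer", "viaje", profiles))
print(classify("comer", "manzana", profiles))
```

In this sketch, a candidate is accepted only if enough of an LF's learned feature weight is present in the candidate's own hyperonym features; a bigram whose words are missing from the toy table shares nothing with any profile and is left unclassified.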