Poisson mixtures

Published online by Cambridge University Press:  12 September 2008

Kenneth W. Church
Affiliation:
AT&T Bell Laboratories, Murray Hill, NJ 07974, USA.
William A. Gale
Affiliation:
AT&T Bell Laboratories, Murray Hill, NJ 07974, USA.

Abstract

Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and n-grams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed Poisson mixture captures much of this heterogeneous structure by allowing the Poisson parameter θ to vary over documents subject to a density function φ. φ is intended to capture dependencies on hidden variables such as genre, author, topic, etc. (The Negative Binomial is a well-known special case where φ is a Γ distribution.) Poisson mixtures fit the data better than standard Poissons, producing more accurate estimates of the variance over documents (σ²), entropy (H), inverse document frequency (IDF), and adaptation (Pr(x ≥ 2 | x ≥ 1)).
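
The contrast between a single Poisson and a Poisson mixture can be illustrated with a minimal sketch of the Negative Binomial special case mentioned above. It is not the authors' code. The function names, the `mean` and `shape` parameterization of the Γ mixing density, and the numeric values are assumptions chosen for illustration, and IDF is assumed to be −log2 of the fraction of documents containing the word at least once.

```python
import math


def poisson_pmf(k, theta):
    """Pr(x = k) for a single Poisson with rate theta > 0."""
    return math.exp(-theta + k * math.log(theta) - math.lgamma(k + 1))


def negbin_pmf(k, mean, shape):
    """Pr(x = k) when theta is drawn from a Gamma density (a Poisson mixture).

    `mean` is the average rate over documents and `shape` is the Gamma shape
    parameter; both names are illustrative assumptions.
    """
    p = shape / (shape + mean)
    log_pmf = (
        math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
        + shape * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def predicted_idf(p0):
    """IDF implied by a model: -log2 of the predicted fraction of documents with x >= 1."""
    return -math.log2(1.0 - p0)


def adaptation(p0, p1):
    """Pr(x >= 2 | x >= 1) computed from Pr(x = 0) and Pr(x = 1)."""
    return (1.0 - p0 - p1) / (1.0 - p0)


# Hypothetical word with an average of 0.1 occurrences per document.
mean, shape = 0.1, 0.1
models = {
    "Poisson": lambda k: poisson_pmf(k, mean),
    "Negative Binomial": lambda k: negbin_pmf(k, mean, shape),
}
for name, pmf in models.items():
    p0, p1 = pmf(0), pmf(1)
    print(f"{name}: IDF = {predicted_idf(p0):.2f}, "
          f"Pr(x >= 2 | x >= 1) = {adaptation(p0, p1):.3f}")

# Variance over documents: equal to the mean for a Poisson, larger for the mixture.
print("Poisson variance:", mean)
print("Negative Binomial variance:", mean + mean ** 2 / shape)
```

With the same mean rate, the mixture assigns more probability both to documents with no occurrences and to documents with several, so in this sketch it predicts a larger variance, a higher IDF, and a higher adaptation probability than the single Poisson.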

Type
Articles
Copyright
Copyright © Cambridge University Press 1995

References

Bell, T., Cleary, J., and Witten, I. (1990) Text Compression. Prentice Hall.
Bookstein, A. (1982) Explanation and generalization of vector models in information. In Conference on Research and Development in Information Retrieval (SIGIR). Pp. 118–132.
Bookstein, A., and Swanson, D. (1974) Probabilistic models for automatic indexing. Journal of the American Society for Information Science. 25(5): 312–318.
Cover, T., and Thomas, J. (1991) Elements of Information Theory. John Wiley.
Francis, W., and Kucera, H. (1982) Frequency Analysis of English Usage. Houghton Mifflin Co.
Gale, W., Church, K., and Yarowsky, D. (1993) A method for disambiguating word senses in a large corpus. Computers and Humanities. 415–439.
Harter, S. (1975) A probabilistic approach to automatic keyword indexing: Part I. On the distribution of speciality words in a technical literature. Journal of the American Society for Information Science. 26(4): 197–206.
Johnson, N., and Kotz, S. (1969) Discrete Distributions. Houghton Mifflin Co.
Lau, R., Rosenfeld, R., and Roukos, S. (1993) Adaptive Language Modeling using the Maximum Entropy Principle. ARPA sponsored workshop on Human Language Technology. Morgan Kaufmann. Pp. 108–113.
Mosteller, F., and Wallace, D. (1964) Inference and Disputed Authorship: The Federalist. Addison-Wesley.
Robertson, S., and Walker, S. (1994) Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR. Pp. 232–241.
Salton, G. (1989) Automatic Text Processing. Addison-Wesley.
Shannon, C. (1948) The mathematical theory of communication. Bell System Technical Journal.
Sparck Jones, K. (1972) A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation. 28(1): 11–21.
van Rijsbergen, C. (1979) Information Retrieval. Second Edition. Butterworth.
Yarowsky, D. (1992) Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora. Coling.