Book contents
- Frontmatter
- Contents
- List of contributors
- Preface
- Frequently used notations and symbols
- 1 Algebraic and geometric methods in statistics
- Part I Contingency tables
- 2 Maximum likelihood estimation in latent class models for contingency table data
- 3 Algebraic geometry of 2×2 contingency tables
- 4 Model selection for contingency tables with algebraic statistics
- 5 Markov chains, quotient ideals and connectivity with positive margins
- 6 Algebraic modelling of category distinguishability
- 7 The algebraic complexity of maximum likelihood estimation for bivariate missing data
- 8 The generalised shuttle algorithm
- Part II Designed experiments
- Part III Information geometry
- Part IV Information geometry and algebraic statistics
- Part V On-line supplements
4 - Model selection for contingency tables with algebraic statistics
from Part I - Contingency tables
Published online by Cambridge University Press: 27 May 2010
Summary
Abstract
Goodness-of-fit tests based on chi-square approximations are commonly used in the analysis of contingency tables. Results from algebraic statistics combined with MCMC methods provide alternatives to the chi-square approximation. However, a model selection procedure usually considers a large number of models, so extensive simulations would be necessary. We show how the simulation effort can be reduced by an appropriate analysis of the Gröbner bases involved.
Introduction
Categorical data occur in many different areas of statistical applications. The analysis usually concentrates on detecting the dependence structure between the random variables involved. Log-linear models are adopted to describe such association patterns (Bishop et al. 1995, Agresti 2002), and model selection methods are used to find the model from this class that fits the data best in a given sense. Often, goodness-of-fit tests for log-linear models are applied, which involve chi-square approximations for the distribution of the test statistic. If the table is sparse, such an approximation might fail. By combining methods from computational commutative algebra and from statistics, Diaconis and Sturmfels (1998) provide the background for alternative tests. They use the MCMC approach to obtain a sample from the conditional distribution of a discrete exponential family given the value of its sufficient statistic. In particular, Gröbner bases are used to construct the Markov chain. This approach has been applied to a number of tests for the analysis of contingency tables (Rapallo 2003, Rapallo 2005, Krampe and Kuhnt 2007). Such tests have turned out to be a valuable addition to traditional exact and asymptotic tests.
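To make the MCMC idea concrete, the following is a minimal Python sketch for the simplest case only: the independence model for a two-way table, where the sufficient statistic is the pair of row and column margins. It is not the chapter's procedure and computes no Gröbner basis. It assumes the standard fact that the basic moves (+1, -1 / -1, +1 on a 2×2 subtable) form a Markov basis for this model, and it assumes all margins are positive. The function names `chi_square`, `metropolis_step` and `mcmc_p_value`, and the defaults for `steps`, `burn_in` and `seed`, are illustrative choices.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for independence (margins must be positive)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def metropolis_step(t, rng):
    """One Metropolis step in place; row and column sums never change."""
    # Pick an ordered pair of rows and of columns: this selects a basic move
    # together with its sign, so the proposal is symmetric.
    i1, i2 = rng.sample(range(len(t)), 2)
    j1, j2 = rng.sample(range(len(t[0])), 2)
    a, d = t[i1][j1], t[i2][j2]  # cells that would increase by 1
    b, c = t[i1][j2], t[i2][j1]  # cells that would decrease by 1
    # Ratio of hypergeometric weights 1 / prod(n_ij!) for new vs. current table.
    # It is 0 when b or c is 0, so moves to negative entries are never accepted.
    ratio = (b * c) / ((a + 1) * (d + 1))
    if rng.random() < ratio:
        t[i1][j1] += 1
        t[i2][j2] += 1
        t[i1][j2] -= 1
        t[i2][j1] -= 1


def mcmc_p_value(table, steps=20000, burn_in=2000, seed=0):
    """Estimate the conditional p-value of the chi-square test by MCMC."""
    rng = random.Random(seed)
    t = [list(row) for row in table]  # work on a copy
    observed = chi_square(t)
    hits = 0
    for step in range(burn_in + steps):
        metropolis_step(t, rng)
        if step >= burn_in and chi_square(t) >= observed - 1e-12:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    sparse_table = [[3, 0, 1], [0, 2, 0], [1, 0, 4]]  # made-up illustrative data
    print(mcmc_p_value(sparse_table))
```

For models other than independence, the basic moves are generally not sufficient to connect all tables with the same sufficient statistic, which is where a Markov basis obtained from a Gröbner basis computation comes in. Repeating such a simulation for every candidate model is the cost that the abstract refers to.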
- Type: Chapter
- Information: Algebraic and Geometric Methods in Statistics, pp. 83-98. Publisher: Cambridge University Press. Print publication year: 2009