Maximum Augmented Empirical Likelihood Estimation of Categorical Marginal Models for Large Sparse Contingency Tables

L. Andries van der Ark; Wicher P. Bergsma; Letty Koopman

doi:10.1007/s11336-023-09932-7

Maximum Augmented Empirical Likelihood Estimation of Categorical Marginal Models for Large Sparse Contingency Tables

Published online by Cambridge University Press: 01 January 2025

L. Andries van der Ark

Wicher P. Bergsma

and

Letty Koopman

Show author details

L. Andries van der Ark*: Affiliation:
University of Amsterdam
Wicher P. Bergsma: Affiliation:
The London School of Economics and Political Science
Letty Koopman: Affiliation:
University of Amsterdam
*: Correspondence should bemade to L. Andries van der Ark, Research Institute of Child Development and Education, University of Amsterdam, P.O. Box 15776, 1001, NG Amsterdam, The Netherlands. Email: [email protected]

Article contents

Abstract
CMMs
Estimation of CMMs
Comparing ML, MEL, and MAEL
Discussion
Footnotes
References

Rights & Permissions

Abstract

Categorical marginal models (CMMs) are flexible tools for modelling dependent or clustered categorical data, when the dependencies themselves are not of interest. A major limitation of maximum likelihood (ML) estimation of CMMs is that the size of the contingency table increases exponentially with the number of variables, so even for a moderate number of variables, say between 10 and 20, ML estimation can become computationally infeasible. An alternative method, which retains the optimal asymptotic efficiency of ML, is maximum empirical likelihood (MEL) estimation. However, we show that MEL tends to break down for large, sparse contingency tables. As a solution, we propose a new method, which we call maximum augmented empirical likelihood (MAEL) estimation and which involves augmentation of the empirical likelihood support with a number of well-chosen cells. Simulation results show good finite sample performance for very large contingency tables.

Keywords

categorical marginal model Cronbach’s alpha large categorical data sets marginal homogeneity maximum empirical likelihood estimation maximum likelihood estimation scalability coefficients

Type: Theory and Methods
Information: Psychometrika , Volume 88 , Issue 4 , December 2023 , pp. 1228 - 1248

DOI: https://doi.org/10.1007/s11336-023-09932-7 [Opens in a new window]
Creative Commons: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright: Copyright © 2023 The Author(s)

Categorical marginal models (CMMs; Bergsma et al., Reference Bergsma, Croon and Hagenaars2009; also see, e.g., Bergsma, Reference Bergsma1997; Bergsma & Rudas, Reference Bergsma and Rudas2002; Bartolucci et al., Reference Bartolucci, Colombi and Forcina2007; Colombi & Forcina, Reference Colombi and Forcina2001; Evans & Forcina, Reference Evans and Forcina2013; Lang & Agresti, Reference Lang and Agresti1994; Lang, Reference Lang1996; Molenberghs & Lesaffre, Reference Molenberghs and Lesaffre1999; Rudas & Bergsma, Reference Rudas, Bergsma, Kateri and Moustaki2023) are flexible tools to model location, spread, and association in dependent or clustered categorical data, when the dependence itself is not of interest. CMMs require data in a table format for input; that is, for a dataset with N respondents and J categorical variables, CMMs require a (vectorized) J-variate contingency table, where each cell corresponds to a response pattern, and the frequencies within the cells represent the observed frequencies of each response pattern. The only assumption of the CMMs under consideration is that the cell frequencies in the contingency table follow a multinomial distribution, rendering a very flexible method.

CMMs can be a valuable psychometric tool since they allow for null-hypothesis significance testing (NHST) of complex coefficients without the need to specify a parametric model or impose additional assumptions. In Psychometrics, NHTS often occurs under the assumption of a parametric model. For example, testing measurement invariance across several groups is typically done under a structural equation model (e.g., Cheung & Rensvold, Reference Cheung and Rensvold2002). However, rather than testing $H_{0}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0$$\end{document} (the null-hypothesis of interest), we implicitly test $H_{0}^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0^*$$\end{document} : $H_{0}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0$$\end{document} plus the assumption that the structural equation model fits the data. Rejecting $H_{0}^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0^*$$\end{document} does not provide information about $H_{0}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0$$\end{document} because $H_{0}^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0^*$$\end{document} should be rejected either when $H_{0}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0$$\end{document} is false or when the structural equation model does not fit the data (cf. Jorgensen et al., Reference Jorgensen, Kite, Chen, van der Ark, Wiberg, Culpepper, Douglas and Wang2017). In other fields of psychometrics (e.g., nonparametric modeling, classical test theory) and applied statistics, there is no comprehensive parametric modeling framework. In such situations, it becomes particularly valuable if the assumptions required for NHST are easily satisfied, ensuring that the null hypothesis of interest is not excessively confounded by data failing to meet the assumptions, thus maintaining a close approximation between $H_{0}^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0^*$$\end{document} and $H_{0}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_0$$\end{document} . The CMM assumption that cell frequencies follow a multinomial distribution is very lenient, implying that every response pattern should, in principle, be observable.

The process of relaxing assumptions for NHST can be a time-consuming endeavor spanning several years. For instance, in the case of NHST for Cronbach’s alpha, there exists a history of research papers progressively relaxing the required assumptions: Feldt derived tests for three types of null-hypothesis on Cronbach’s alpha: alpha equals some criterion value (Feldt, Reference Feldt1969), alpha is equal across groups (Feldt, Reference Feldt1965), and alpha is equal across different measurements (Feldt Reference Feldt1980). Feldt assumed that alpha asymptotically follows an F distribution. This assumptions was subsequently relaxed by Van Zyl et al. (Reference Van Zyl, Neudecker and Nel2000), who derived a distribution without restricting the covariances, Maydeu-Olivares et al. (Reference Maydeu-Olivares, Coffman and Hartmann2007) who relaxed the assumptions of Feldt’s first hypothesis by deriving asymptotically distribution-free interval estimates for alpha, Maydeu-Olivares et al. (Reference Maydeu-Olivares, Coffman, García-Forero and Gallardo-Pujol2010) who proposed testing Feldt’s hypotheses in a structural equation modeling framework, and ultimately, Kuijpers et al. (Reference Kuijpers, Van der Ark and Croon2013), who proposed using CMMs for testing Feldt’s hypotheses. Each successive paper demonstrated significant enhancements in the properties of NHST for Cronbach’s alpha when compared to its predecessors.

In some cases, no hypothesis tests are available leaving CMMs as a possible option to derive hypothesis tests. For example, Van der Ark et al. (Reference Van der Ark, Croon and Sijtsma2008) used CMMs for developing NHST for Mokken’s (Reference Mokken1971) scalability coefficients, which allows testing scalability coefficients for item pairs, individual items, and scales across groups and across measurement occasions. Finally, we would like to note that CMMs can be used in conjunction with latent variables models, although this needs further development. We refer to Bergsma et al. (Reference Bergsma, Croon and Hagenaars2009), for other applications of CMMs, and Bergsma et al. (Reference Bergsma, Croon and Hagenaars2009, Reference Bergsma, Croon and Hagenaars2013) who introduced CMMs with latent variables.

CMMs can be estimated using the maximum likelihood (ML) method, which has many favorable properties, including asymptotic efficiency. A serious limitation of the ML method is that for large contingency tables estimation is infeasible, as ML requires the computation of an expected frequency for each cell in the contingency table. This curse of dimensionality may be an important reason why CMMs have failed to become popular in psychometrics. Most psychological and educational tests consist of many variables (usually referred to as items) yielding an extremely large number of possible response patterns and, therefore, extremely large contingency tables. For example, Raven’s Advanced Progressive Matrices (Raven et al., Reference Raven, Raven and Court2003), measuring general intelligence, consists of 48 binary items, which yields a contingency table of $2^{48} \approx 2.81 \times 10^{14}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2^{48} \approx 2.81 \times 10^{14}$$\end{document} cells; and the personality inventory NEO-PI-R (Costa & McCrae Reference Costa, McCrae, Boyle, Matthews and Saklofske2008), measuring five personality traits, consists of 48 five-category items per trait, which yields a contingency table of $5^{5 \times 48} \approx 5.66 \times 10^{167}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$5^{5 \times 48} \approx 5.66 \times 10^{167}$$\end{document} cells. Lloyd (Reference Lloyd2000) estimated that if every particle in the universe could be used as part of a huge computer, it could store approximately $10^{90}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^{90}$$\end{document} bits. Hence, for contingency tables based on psychological and educational tests, the required computer capacity easily exceeds the ultimate physical limits of computation, whereas The ML estimation procedure to estimate CMMs implemented in the R-package cmm (Bergsma & Van der Ark, Reference Bergsma and Van der Ark2023) cannot handle more than a few million cells.

In this paper, we give a new adaptation to the ML estimation procedure to solve the above problem. Although there are alternative estimation procedures that may be used to estimate CMMs, we preferred to stay within a ML-framework as ML guarantees asymptotic efficiency, whereas alternatives estimation methods for contingency tables, such as generalizing estimation equations (GEE’s, e.g., Qaqish & Liang, Reference Qaqish and Liang1992), and composite likelihood (e.g., Varin et al., Reference Varin, Reid and Firth2011) are not, and weighted least squares (Grizzle et al., Reference Grizzle, Starmer and Koch1969; a.k.a the GSK-method) is sensitive to sparsity in the marginal distribution (cf. Rudas & Bergsma, Reference Rudas, Bergsma, Kateri and Moustaki2023). In addition, an adaptation of the ML approach is easy to fit in the existing software.

Initially, we considered the empirical likelihood method (Owen, Reference Owen2001, Qin & Lawless, Reference Qin and Lawless1994), a data-driven, nonparametric estimation method. The core idea behind the empirical likelihood method is to construct a likelihood function directly from the observed data, without assuming any specific underlying probability distribution; that is, given vector valued data $x_{1}, \dots, x_{N}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_1,\ldots ,\textbf{x}_N$$\end{document} , an empirical likelihood is the likelihood of a probability distribution with support ${x_{1}, \dots, x_{N}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_1,\ldots ,\textbf{x}_N\}$$\end{document} (Owen, Reference Owen2001). In the context of CMMs, the empirical likelihood method involves constructing the likelihood solely from cells with nonzero frequencies, while regarding cells with zero frequency as structural zeroes and setting their estimated probability to zero. Given that the number of cells with nonzero frequencies cannot exceed the sample size, and in the case of psychological and educational test data, the sample size rarely exceeds 10,000, the empirical likelihood method serves as a computationally feasible alternative to ML. We abbreviate the method of maximizing the empirical likelihood subject to model constraints by MEL.

Unfortunately, the support ${x_{1}, \dots, x_{N}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_1,\ldots ,\textbf{x}_N\}$$\end{document} belonging to the empirical likelihood may be too small (i) to estimate the parameters of a CMM, or, even if this can be done, (ii) to estimate the asymptotic covariance matrix of the ML estimators of the parameters of the CMM. We will refer to these two problems as the first- and second-order estimation problems, respectively (see Appendix A for more details). The first problem has also been called the empty set problem (Grendár & Judge, Reference Grendár and Judge2009). As far as we are aware, the second problem has not yet been described in the literature. The solution to these problems which we propose in this paper is to augment the empirical likelihood support with a number of well-chosen points, and we will refer to the method of maximizing the resulting empirical likelihood as maximum augmented empirical likelihood (MAEL). Note that as the sample size goes to infinity, assuming no structural zeroes, the probability that all cells in a contingency table will have a positive count will go to 1, so for categorical data MEL, MAEL and ML are asymptotically equivalent.

The reason why MEL and MAEL estimators work asymptotically (as $N \to \infty$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N \rightarrow \infty $$\end{document} ) is because they are with probability tending to 1 equivalent to ML estimator. That justifies testing goodness of fit and making inferences for parameters in same ways as we would do with ML. Two related methods, called adjusted empirical likelihood, Chen et al. (Reference Chen, Variyath and Abraham2008) and balanced augmented empirical likelihood (Emerson & Owen, 2009; also see Nguyen et al., Reference Nguyen, Phelps and Ng2015, Xia & Liu, Reference Xia and Liu2019) have been considered for continuous data. These methods augment the data set with one or two additional observations. In contrast, our methodology consists of only augmenting the support of distributions corresponding to the empirical likelihood with additional points, but without adding any observations to the data.

The remainder of the paper is organized as follows. In Sect. 1, we give a brief overview of and notation for CMMs. In Sect. 2, we describe ML and MEL estimation for CMMs and introduce MAEL estimation. In Sect. 3, we present two simulation studies. Study 1 compares the convergence rate and computation time of ML, MEL, and MAEL estimation for small contingency tables, and Study 2 investigates the Type I error rate of CMMs using MAEL estimation for small and large contingency tables, and bias and variance of the model parameters. In Sect. 4, we briefly discuss the advantages and disadvantages of MAEL estimation in relation to other, non-likelihood-based estimation procedures. In Appendix A, we describe the first- and second-order estimation problems in some generality, whereas Appendix B gives details of the estimation algorithm used.

1. CMMs

Consider the categorical variables $X_{1}, \dots, X_{j}, \dots, X_{J}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_1, \dots , X_j, \dots , X_J$$\end{document} with $X_{j} \in {0, \dots g_{j}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_j \in \{0, \dots g_j\}$$\end{document} . Let $x_{1}, \dots, x_{i},$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_1,\ldots ,\textbf{x}_i,$$\end{document} $\dots, x_{N}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ldots ,\textbf{x}_N$$\end{document} be i.i.d. data points, where each $x_{i} = (x_{i 1}, \dots, x_{iJ})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_i=(x_{i1},\ldots ,x_{iJ})$$\end{document} consists of the scores of the ith respondent on the variables $X_{1}, \dots, X_{J}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_1, \dots , X_J$$\end{document} . The data can be collected in a J-way contingency table of observed frequencies with $L = \prod_{j = 1}^{J} g_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = \prod _{j=1}^J g_j$$\end{document} cells. The observed frequency of the response pattern $(x_{1}, \dots, x_{J})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(x_1, \ldots , x_J)$$\end{document} on variables $(X_{1}, \dots, X_{J})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(X_1, \ldots , X_J)$$\end{document} is denoted by $n_{x_{1},}^{X_{1},}_{\dots,}^{\dots,}_{x_{J}}^{X_{J}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n\hspace{-.5pt}{{^{X_1,}_{x_1,}}}\hspace{-.5pt}{{^{\dots ,}_{\dots ,}}}\hspace{-.5pt}{{^{X_J}_{x_J}}}$$\end{document} . The observed frequencies in the contingency table are collected in an $L \times 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L\times 1$$\end{document} vector $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} , arranged in lexicographical order; that is, the digit in the last row of the corresponding response pattern changes fastest and the digit in the first row changes slowest. As an example, Eq. 1 shows the vector $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} containing the observed frequencies of the response patterns pertaining to the scores of $N = 130$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N = 130$$\end{document} respondents on $J = 3$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 3$$\end{document} binary variables, a, b, and c:

(1)

\begin{matrix} n = (\begin{matrix} n_{000}^{abc} \\ n_{001}^{abc} \\ n_{010}^{abc} \\ n_{011}^{abc} \\ n_{100}^{abc} \\ n_{101}^{abc} \\ n_{110}^{abc} \\ n_{111}^{abc} \end{matrix}) = (\begin{matrix} 20 \\ 15 \\ 10 \\ 15 \\ 0 \\ 15 \\ 25 \\ 30 \end{matrix}) . \end{matrix}

If it is clear which variables are involved, then the superscript may be omitted. Marginal frequencies are denoted by removing the appropriate variable(s) from the subscript and score(s) from the superscript. In some formulas, the subscript i in $n_{i}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n_i$$\end{document} is used as an index. For example, $\sum_{i} n_{i}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sum _i n_i$$\end{document} means the sum over all elements of $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} .

The probability that a randomly drawn respondent has response pattern $x_{1}, \dots, x_{J}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_1, \dots , x_J$$\end{document} given that the CMM of interest is true, is denoted by $π_{x_{1},}^{X_{1},}_{\dots,}^{\dots,}_{x_{J}}^{X_{J}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi \hspace{-.5pt}{{^{X_1,}_{x_1,}}}\hspace{-.5pt}{{^{\dots ,}_{\dots ,}}}\hspace{-.5pt}{{^{X_J}_{x_J}}}$$\end{document} . Assuming a fixed sample size N, let $m_{x_{1},}^{X_{1},}_{\dots,}^{\dots,}_{x_{J}}^{X_{J}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\hspace{-.5pt}{{^{X_1,}_{x_1,}}}\hspace{-.5pt}{{^{\dots ,}_{\dots ,}}}\hspace{-.5pt}{{^{X_J}_{x_J}}}$$\end{document} be the expected frequency satisfying $m_{x_{1},}^{X_{1},}_{\dots,}^{\dots,}_{x_{J}}^{X_{J}} = N \times π_{x_{1},}^{X_{1},}_{\dots,}^{\dots,}_{x_{J}}^{X_{J}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\hspace{-.5pt}{{^{X_1,}_{x_1,}}}\hspace{-.5pt}{{^{\dots ,}_{\dots ,}}}\hspace{-.5pt}{{^{X_J}_{x_J}}} = N \times \pi \hspace{-.5pt}{{^{X_1,}_{x_1,}}}\hspace{-.5pt}{{^{\dots ,}_{\dots ,}}}\hspace{-.5pt}{{^{X_J}_{x_J}}}$$\end{document} . The expected frequencies and probabilities are collected in vectors $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} , and $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} , respectively, in the same manner as the observed frequencies were collected in $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} . ML estimates of $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} and $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} are denoted by $\hat{m}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}$$\end{document} and $\hat{π}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{{{\varvec{\uppi }}}}$$\end{document} , respectively. Without any constraints imposed upon the data, $\hat{m} = n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}} = \textbf{n}$$\end{document} and $\hat{π} = n / N$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{{{\varvec{\uppi }}}} = \textbf{n}/N$$\end{document} .

Let $A$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{A}$$\end{document} be a matrix of zeroes and ones, so that $A^{T} m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{A}^\textrm{T}\textbf{m}$$\end{document} consists of the relevant marginals from the contingency table. A CMM is defined by constraints of the form

(2)

\begin{matrix} f (A^{T} m) = Z β, \end{matrix}

where $f$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{f}$$\end{document} is an appropriate function, $Z$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}$$\end{document} is a design matrix of full column rank, and $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} is a vector of parameters. For estimation purposes, parameter $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} is eliminated from the equation as follows. Let $B$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}$$\end{document} be the orthogonal complement of the column space spanned by the columns of $Z$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}$$\end{document} (i.e., $B^{T} Z = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}^\textrm{T}\textbf{Z} = \textbf{0}$$\end{document} and the concatenated matrix $(B Z)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\textbf{B}\,\,\, \textbf{Z})$$\end{document} is square and non-singular). By pre-multiplying both sides of Eq. 2 by $B^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}^\textrm{T}$$\end{document} , the CMM is written as a set of constraints:

(3)

\begin{matrix} B^{T} f (A^{T} m) = B^{T} Z β = 0 . \end{matrix}

Note that parameter $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} can be obtained from Eq. 2 by

(4)

\begin{matrix} β = {(Z^{T} Z)}^{- 1} Z^{T} f (A^{T} m) . \end{matrix}

The constraint formulation $B^{T} f (A^{T} m) = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}^\textrm{T} \textbf{f}(\textbf{A}^\textrm{T}\textbf{m}) = \textbf{0}$$\end{document} (cf. Eq. 3) is computationally convenient since it allows the Lagrange multiplier technique to be used, and asymptotic theory has been developed using this formulation (Aitchison & Silvey, Reference Aitchison and Silvey1958, Lang, Reference Lang2005). In addition, the parameter formulation $f (A^{T} m) = Z β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{f}(\textbf{A}^\textrm{T}\textbf{m}) = \textbf{Z} {\varvec{\upbeta }}$$\end{document} (Eq. 2) is not possible if $B^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}^\textrm{T}$$\end{document} is of full column rank because $Z$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}$$\end{document} , the orthogonal complement of $B$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}$$\end{document} , does not exist. Therefore, the parameter formulation of CMMs will be disregarded from here on.

For notational convenience, we can replace $B^{T} f (A^{T} m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}^\textrm{T} \textbf{f}(\textbf{A}^\textrm{T}\textbf{m})$$\end{document} by $g (m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}(\textbf{m})$$\end{document} . So, the shortest notation for a CMM is

(5)

\begin{matrix} g (m) = 0 . \end{matrix}

Let D be the number of constraints in Eq. 5; that is, the length of vector $g (m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}(\textbf{m})$$\end{document} . The fit of the CMM can be investigated by comparing $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} and the ML estimate under the model, $\hat{m}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}$$\end{document} , using a likelihood ratio test statistic ( $G^{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2$$\end{document} ) or Pearson’s Chi-square test statistic ( $X^{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X^2$$\end{document} ), which have an asymptotic Chi-square distribution with D degrees of freedom if the model is true. Example 1 shows a simple CMM following the build up in Eqs. 2, 3, 4, and 5, whereas Example 2 shows a CMM that has been used in psychometrics.

Example 1

Consider $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} in Eq. 1. Suppose that we want to fit the CMM that prescribes marginal homogeneity: $m_{1}^{a} = m_{1}^{b} = m_{1}^{c}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m^{a}_1 = m^{b}_1 = m^{c}_1$$\end{document} (and consequently, $m_{0}^{a} = m_{0}^{b} = m_{0}^{c}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m^{a}_0 = m^{b}_0 = m^{c}_0$$\end{document} ). First, pre-multiplying $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} by design matrix $A^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{A}^\textrm{T}$$\end{document} (Eq. 2) yields the required margins; that is,

(6)

\begin{matrix} A^{T} m = (\begin{matrix} 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{matrix}) \cdot (\begin{matrix} m_{000}^{abc} \\ m_{001}^{abc} \\ m_{010}^{abc} \\ m_{011}^{abc} \\ m_{100}^{abc} \\ m_{101}^{abc} \\ m_{110}^{abc} \\ m_{111}^{abc} \end{matrix}) = (\begin{matrix} m_{1}^{a} \\ m_{1}^{b} \\ m_{1}^{c} \end{matrix}) . \end{matrix}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textbf{A}^\textrm{T}\textbf{m} = \left( \begin{array}{rrrrrrrr} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 1 &{} 1 &{} 1\\ 0 &{} 0 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1 &{} 1\\ 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1\\ \end{array} \right) \cdot \left( \begin{array}{r} m^{abc}_{000}\\ m^{abc}_{001}\\ m^{abc}_{010}\\ m^{abc}_{011}\\ m^{abc}_{100}\\ m^{abc}_{101}\\ m^{abc}_{110}\\ m^{abc}_{111}\\ \end{array} \right) = \left( \begin{array}{r} m^{a}_{1}\\ m^{b}_{1}\\ m^{c}_{1} \end{array} \right) . \end{aligned}$$\end{document}

Function $f$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{f}$$\end{document} (Eq. 2) is the identity function, so $f (A^{T} m) = A^{T} m = {(m_{1}^{a} m_{1}^{b} m_{1}^{c})}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{f}(\textbf{A}^\textrm{T}\textbf{m}) = \textbf{A}^\textrm{T}\textbf{m} = (m_1^a~m_1^b~m_1^c)^\textrm{T}$$\end{document} . To write the CMM as a set of constraints, $f (A^{T} m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{f}(\textbf{A}^\textrm{T}\textbf{m})$$\end{document} is pre-multiplied by constraint matrix $B^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}^\textrm{T}$$\end{document} (cf. Eq. 3, left-hand side), and set to zero, yielding

(7)

\begin{matrix} B^{T} f (A^{T} m) = (\begin{matrix} 1 & - 1 & 0 \\ 0 & 1 & - 1 \end{matrix}) \cdot (\begin{matrix} m_{1}^{a} \\ m_{1}^{b} \\ m_{1}^{c} \end{matrix}) = (\begin{matrix} m_{1}^{a} - m_{1}^{b} \\ m_{1}^{b} - m_{1}^{c} \end{matrix}) = (\begin{matrix} 0 \\ 0 \end{matrix}) . \end{matrix}

As the $3 \times 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3\times 1$$\end{document} column vector $Z = (\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z} = \left( \frac{1}{\sqrt{3}}~\frac{1}{\sqrt{3}}~ \frac{1}{\sqrt{3}}\right) $$\end{document} is the orthogonal complement of $B$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{B}$$\end{document} , with ${(Z^{T} Z)}^{- 1} = 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\textbf{Z}^\textrm{T}\textbf{Z})^{-1} = 1$$\end{document} , parameter $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} (which in this case is 1-dimensional) can be obtained by Eq. 4; that is,

(8)

\begin{matrix} β = {(Z^{T} Z)}^{- 1} Z^{T} f (A^{T} m) = 1 \cdot (\begin{matrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{matrix}) (\begin{matrix} m_{1}^{a} \\ m_{1}^{b} \\ m_{1}^{c} \end{matrix}) = \frac{m_{1}^{a} + m_{1}^{b} + m_{1}^{c}}{\sqrt{3}} . \end{matrix}

Conventional short notation $g (m) = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}(\textbf{m}) =\textbf{0}$$\end{document} (Eq. 5) is obtained by letting $g (m) = B^{T} f (A^{T} m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}(\textbf{m}) = \textbf{B}^\textrm{T}\textbf{f}(\textbf{A}^\textrm{T} \textbf{m})$$\end{document} ; that is,

(9)

\begin{matrix} g (m) = (\begin{matrix} 1 & - 1 & 0 \\ 0 & 1 & - 1 \end{matrix}) \cdot (\begin{matrix} 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{matrix}) \cdot & m \\ = (\begin{matrix} 0 & 0 & - 1 & - 1 & 1 & 1 & 0 & 0 \\ 0 & - 1 & 0 & - 1 & 1 & 0 & 1 & 0 \end{matrix}) & (\begin{matrix} m_{000}^{abc} \\ m_{001}^{abc} \\ m_{010}^{abc} \\ m_{011}^{abc} \\ m_{100}^{abc} \\ m_{101}^{abc} \\ m_{110}^{abc} \\ m_{111}^{abc} \end{matrix}) = (\begin{matrix} 0 \\ 0 \end{matrix}) . \end{matrix}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textbf{g}(\textbf{m}) = \left( \begin{array}{rrr} 1 &{} -1 &{} 0 \\ 0 &{} 1 &{} -1 \end{array} \right) \cdot \left( \begin{array}{rrrrrrrr} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 1 &{} 1 &{} 1\\ 0 &{} 0 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1 &{} 1\\ 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1\\ \end{array} \right) \cdot&\textbf{m} \nonumber \\ = \left( \begin{array}{rrrrrrrr} 0 &{} 0 &{}-1 &{}-1 &{} 1 &{} 1 &{} 0 &{} 0\\ 0 &{}-1 &{} 0 &{}-1 &{} 1 &{} 0 &{} 1 &{} 0\\ \end{array} \right)&\left( \begin{array}{r} m^{abc}_{000}\\ m^{abc}_{001}\\ m^{abc}_{010}\\ m^{abc}_{011}\\ m^{abc}_{100}\\ m^{abc}_{101}\\ m^{abc}_{110}\\ m^{abc}_{111}\\ \end{array} \right) = \left( \begin{array}{r} 0\\ 0\\ \end{array} \right) . \end{aligned}$$\end{document}

The vector of expected frequencies that is closest (in an ML sense) to $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} (Eq. 1) and meets the requirement of Eq. 9 is

(10)

\begin{matrix} \hat{m} = (\begin{matrix} 20.000 \\ 14.397 \\ 8.060 \\ 11.695 \\ 0.000 \\ 19.755 \\ 26.092 \\ 30.000 \end{matrix}) . \end{matrix}

Comparing $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}{} $$\end{document} given in Eq. 1 and $\hat{m}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}{} $$\end{document} given in Eq. 10 yields $G^{2} = 2.6107$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2 = 2.6107$$\end{document} (df = 2, $p = . 2711) . UsinganominalTypeIerrorrateof α = . 05$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p=.2711).{ UsinganominalTypeIerrorrateof}\alpha =.05$$\end{document} , the hypothesis of marginal homogeneity should not be rejected.

Example 2.

Item-scalability coefficient $H_{j} (j = 1, \dots, J$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j (j=1,\dots ,J$$\end{document} ) is used in Mokken scale analysis (e.g., Mokken, Reference Mokken1971; Sijtsma & Van der Ark, Reference Sijtsma and Van der Ark2017) and expresses the strength of the relationship between item $j$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$j$$\end{document} and the other items in the test, comparable with a regression coefficient in a regression model. One of the criteria of a Mokken scale is that all coefficients H_j are greater than some lower bound c. The lower bound that is used as a default is c = 0.30 (Sijtsma & Molenaar Reference Sijtsma and Molenaar2002). Hence, a relevant question is whether all H_j > 0.30. Coefficients H_j are not independent from each other, and CMMs can be used to control for this nuisance dependence and test all coefficients simultaneously.

Under the assumption that the items are numbered in an ascending order of their probability of answering the item correctly (i.e., item 1 is the least popular or most difficult item, item J the most popular or least difficult item), item-scalability coefficients $H_{j}, j = 1, \dots, J$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j,\,j = 1, \dots , J$$\end{document} for dichotomous items (Mokken, Reference Mokken1971, p. 151) are defined as

(11)

\begin{matrix} H_{j} = 1 - \frac{N (\sum_{i = 1}^{j - 1} m_{01}^{ij} + \sum_{i = j + 1}^{J} m_{01}^{ji})}{\sum_{i = 1}^{j - 1} m_{0}^{i} m_{1}^{j} + \sum_{i = j + 1}^{J} m_{0}^{j} m_{1}^{i}} . \end{matrix}

Consider the observed frequencies in Eq. 1. Let $H = (H_{a}, H_{b}, H_{c})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{H} = (H_a, H_b, H_c)$$\end{document} be a vector containing the item-scalability coefficients of items $a, b, and c$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a,\,b,\,{ and}c$$\end{document} . Equation 11 shows that H is a function of m. The constraints defines a CMM (Eq. 5); we refer to Van der Ark et al. (Reference Van der Ark, Croon and Sijtsma2008) for computational details.

The sample values of $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} for the vector of observed frequencies in Eq. 1 are ${\hat{H}}_{a} = 0.231, {\hat{H}}_{b} = 0.164, and {\hat{H}}_{c} = 0.055$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{H}_a =0.231,\,\widehat{H}_b =0.164,\,{ and}\widehat{H}_c =0.055$$\end{document} . Fitting the CMM that all item-scalability coefficients equal 0.3 to the data in Eq. 1 yields . Using a nominal Type I error rate of α = 0.05, the hypothesis H = (0.3, 0.3, 0.3)^T should be rejected.

2. Estimation of CMMs

2.1. ML and MEL Estimation

Assuming that the frequency vector $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}{} $$\end{document} follows a multinomial distribution, the likelihood function is

(12)

\begin{matrix} L (m | n) = \frac{N!}{\prod_{i = 1}^{L} n_{i}!} \prod_{i = 1}^{L} {(\frac{m_{i}}{N})}^{n_{i}} \propto \prod_{i = 1}^{L} m_{i}^{n_{i}} . \end{matrix}

The maximum likelihood estimate maximizes L (m|n) subject to the model constraint

(13)

\begin{matrix} g (m) = 0 \end{matrix}

and the multinomial constraint

(14)

\begin{matrix} \sum_{i} m_{i} = N = \sum_{i} n_{i} . \end{matrix}

In Appendix B, an algorithm for finding is given.

For multinomial distributions, MEL estimation is similar to ML estimation with the difference that all cells for which ni = 0 are treated as structural zeros. The MEL estimate of m maximizes L (m|n) subject to Eqs. 13 and 14 and the structural-zero constraint

(15)

\begin{matrix} m_{i} = 0 if n_{i} = 0 . \end{matrix}

MEL estimation can be done using the same algorithm as ML estimation because the cells i for ni = 0 can simply be left out of the estimation procedure. For MEL estimation, fewer cells need to be estimated, which makes the procedure faster and more suitable f or large contingency tables than ML estimation.

In general, a superscripted asterisk indicates that the cells i for which ni = 0 are left out; that is L^∗ is the number of cells for which ni > 0, n^∗ is the vector of length L^∗ of nonzero observed frequencies (i.e., n^∗ is the vector containing those ni that are greater than zero). The corresponding expected frequencies and expected probabilities are denoted m^∗ and π^∗, respectively, and g^∗ (m^∗) equals g(m) with the elements of m corresponding to zero observed cells set to zero. Example 3 shows an illustration of MEL estimation.

Example 3

This example illustrates MEL estimation of the CMM in Example 1. For the vector of observed frequencies in Eq. 1,

(16)

\begin{matrix} n^{*} = (\begin{matrix} n_{000}^{abc} \\ n_{001}^{abc} \\ n_{010}^{abc} \\ n_{011}^{abc} \\ n_{101}^{abc} \\ n_{110}^{abc} \\ n_{111}^{abc} \end{matrix}) = (\begin{matrix} 20 \\ 15 \\ 10 \\ 15 \\ 15 \\ 25 \\ 30 \end{matrix}) . \end{matrix}

In Eq. 16, has been omitted, which implies that is fixed to zero, and not considered in the estimation procedure. The CMM in Eq. 9 under MEL reduces to

(17)

\begin{matrix} g^{*} (m^{*}) = (\begin{matrix} 0 & 0 & - 1 & - 1 & 1 & 0 & 0 \\ 0 & - 1 & 0 & - 1 & 0 & 1 & 0 \end{matrix}) (\begin{matrix} m_{000}^{abc} \\ m_{001}^{abc} \\ m_{010}^{abc} \\ m_{011}^{abc} \\ m_{101}^{abc} \\ m_{110}^{abc} \\ m_{111}^{abc} \end{matrix}) = (\begin{matrix} 0 \\ 0 \end{matrix}) . \end{matrix}

Comparing $n^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^*$$\end{document} given in Eq. 16 and

\begin{matrix} {\hat{m}}^{*} = (\begin{matrix} 20.000 \\ 14.397 \\ 8.060 \\ 11.695 \\ 19.755 \\ 26.092 \\ 30.000 \end{matrix}) . \end{matrix}

yields $G^{2} = 2.611 (d f = 2, p = 0.271$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2 = 2.611 (df = 2, p=0.271$$\end{document} ). In this case, ML estimation (see Example 1) and MEL estimation provide identical expected frequencies and model fit, but this is not true in general.

2.2. The First- and Second-Order Estimation Problems for CMMs

Unfortunately, the support ${x_{1}, \dots, x_{N}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_1,\ldots ,\textbf{x}_N\}$$\end{document} belonging to the empirical likelihood may be too small for the CMM to be estimated and to do inference. We identify two problems, which are described more formally and in some more generality in Appendix A. We say that the first-order estimation problem occurs if the equation $g^{*} (m^{*}) = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}^*(\textbf{m}^*)=\textbf{0}$$\end{document} does not have any solutions. This is also known as the empty set problem (Grendár & Judge, Reference Grendár and Judge2009). The second-order estimation problem occurs if the empirical likelihood support is too small to be able to estimate the covariance matrix of the estimated marginal parameters. Occurrence of the first-order problem implies occurrence of the second-order problem, and absence of the second-order problem implies absence of the first-order problem. If the second-order problem occurs, inference for the model is problematic. The first- and second-order estimation problems can occur for MEL estimation with sparse observed contingency tables, as illustrated next.

Example 4

Consider a $2 \times 2$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2\times 2$$\end{document} contingency table and let

\begin{matrix} g (m) = (m_{0 +} - m_{1 +}) - (m_{+ 0} - m_{+ 1}) = (\begin{matrix} 0 & 1 & - 1 & 0 \end{matrix}) (\begin{matrix} m_{00} \\ m_{01} \\ m_{10} \\ m_{11} \end{matrix}) = 0 . \end{matrix}

Suppose we observe

\begin{matrix} (\begin{matrix} n_{00} \\ n_{01} \\ n_{10} \\ n_{11} \end{matrix}) = (\begin{matrix} 0 \\ 1 \\ 0 \\ 0 \end{matrix}) . \end{matrix}

Then, it can be verified that $g^{*} (m^{*}) = m_{01} \times 1 = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}^*(\textbf{m}^*)= m_{01} \times 1 = 0$$\end{document} does not have any solutions; that is, the first-order estimation problem (or empty set problem) occurs, and hence so does the second-order one. If, on the other hand, we observed

\begin{matrix} (\begin{matrix} n_{00} \\ n_{01} \\ n_{10} \\ n_{11} \end{matrix}) = (\begin{matrix} 1 \\ 0 \\ 0 \\ 1 \end{matrix}), \end{matrix}

then the first-order problem does not occur. Assuming Poisson sampling for simplicity, we have $var (g (n) - g (m)) = 4 m_{01} + 4 m_{10}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${ \text{ var }}(\textbf{g}(\textbf{n})-\textbf{g}(\textbf{m}))=4m_{01}+4m_{10}$$\end{document} . Under empirical likelihood, this is zero; that is, the variance of the marginal parameter cannot be estimated, and the second-order problem occurs.

Example 5

Consider dichotomous variables $X_{1}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_1$$\end{document} and $X_{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_2$$\end{document} , and let the CMM be $H_{1} = 0.3$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_1 =0.3$$\end{document} . Let $n = {(n_{00}, n_{01}, n_{10}, n_{11})}^{T} = {(30, 0, 30, 30)}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n} = (n_{00}, n_{01}, n_{10}, n_{11})^\textrm{T} = (30,0,30,30)^\textrm{T}$$\end{document} , hence $n^{*} = {(n_{00}, n_{10}, n_{11})}^{T} = {(30, 30, 30)}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^* = (n_{00}, n_{10}, n_{11})^\textrm{T} = (30,30,30)^\textrm{T}$$\end{document} . It follows that ${\bar{X}}_{1} = \frac{2}{3}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\overline{X}_1 = \frac{2}{3}$$\end{document} and ${\bar{X}}_{2} = \frac{1}{3}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\overline{X}_2 = \frac{1}{3}$$\end{document} . Under the assumption that $E (X_{1}) > E (X_{2})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E(X_1) > E(X_2)$$\end{document} , Eq. 11 reduces to

(18)

\begin{matrix} H_{1} = H_{2} = 1 - \frac{N \times m_{01}}{m_{0} \times m_{1}} . \end{matrix}

Frequency $n_{01}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n_{01}$$\end{document} is not observed, so due to the structural-zero constraint (Eq. 15), MEL estimation produces ${\hat{m}}_{01} = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{m}_{01} = 0$$\end{document} by definition. As a result, the ratio on the right-hand side of Eq. 18 equals zero, and $H_{1} = H_{2} = 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_1 = H_2 = 1$$\end{document} . Hence, there exists no $m^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}^*$$\end{document} satisfying $H_{1} = 0.3$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_1 =0.3$$\end{document} .

2.3. MAEL Estimation

A solution to the first- and second-order estimation problems is obtained by augmenting the empirical likelihood support with a number of support cells, which we call maximum augmented empirical likelihood (MAEL) estimation. The question arises which cells to add. For CMMs, there is a fairly natural choice, in particular, suppose the order k marginal distributions are of interest for a particular CMM. Then clearly, to avoid the first-order estimation problem, the support must contain for every marginal cell at least one cell in the contingency table contributing to it. Hence, this is the least augmentation that should be done for the empirical likelihood support. To avoid the second-order estimation problem, note that the covariance between observed marginals is a function of higher-order marginals, for example,

\begin{matrix} cov (n_{i + +}, n_{+ j +}) = m_{i j +} - m_{i + +} m_{+ j +} / N, \\ cov (n_{i j + +}, n_{+ k l +}) = δ_{jk} m_{i j l +} - m_{i j + +} m_{+ k l +} / N \end{matrix}

\begin{matrix} cov (n_{+ i j + + +}, n_{+ + + + k l}) = m_{+ i j + k l} - m_{+ i j + + +} m_{+ + + + k l} / N \end{matrix}

where a plus in the subscript denotes summation over that subscript. If the relevant higher-order marginals are estimable, the second-order estimation problem can typically be avoided.

If the second-order estimation problem occurs, it can be resolved by augmenting the empirical likelihood support so that each of the relevant higher-order marginals has one or more cells contributing to it. We found that the methodology is not affected much by which cells were chosen. In practice, we randomly added cells, which gave good results.

The notation is as follows. For ML estimation, all L cells of $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} are considered, and for MEL estimation, only the $L^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^*$$\end{document} cells with a positive observed count, collected in $n^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^*$$\end{document} , are considered. MAEL can be regarded as an intermediate estimation method, considering the $L^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^*$$\end{document} cells with a positive observed count plus a number of cells with zero observed count to avoid the first- and second-order estimation problems. Let $L^{†}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^\dagger $$\end{document} be such that $L^{*} \leq L^{†} \leq L$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^*\le L^\dagger \le L$$\end{document} , and let $n^{†}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^\dagger $$\end{document} , $m^{†}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}^\dagger $$\end{document} , and $π^{†}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}^\dagger $$\end{document} denote the augmented vector of observed frequencies, expected frequencies, and probabilities, respectively.

Example 6 explores some possibilities to augment the empirical likelihood support for a small example, illustrating that the fit of a CMM decreases dramatically when too few cells are added to $n^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^*$$\end{document} .

Example 6

Suppose that

(19)

\begin{matrix} n = (\begin{matrix} n_{000}^{abc} \\ n_{001}^{abc} \\ n_{010}^{abc} \\ n_{011}^{abc} \\ n_{100}^{abc} \\ n_{101}^{abc} \\ n_{110}^{abc} \\ n_{111}^{abc} \end{matrix}) = (\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \\ 65 \\ 0 \\ 65 \\ 0 \end{matrix}), \end{matrix}

and suppose the marginal homogeneity CMM in Eq. 9 is the CMM of interest. The ML estimate is $\hat{m} = {(0, 32.5, 0, 32.5, 32.5, 0, 32.5, 0)}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}} = (0, 32.5, 0, 32.5, 32.5, 0, 32.5, 0)^\textrm{T}$$\end{document} with $G^{2} = 180.22$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2 = 180.22$$\end{document} (df = 2). For MEL estimation, the second-order estimation problems occur. Because

(20)

\begin{matrix} n^{*} = (\begin{matrix} n_{100}^{abc} \\ n_{110}^{abc} \end{matrix}) = (\begin{matrix} 65 \\ 65 \end{matrix}), \end{matrix}

Eq. 9 reduces to

(21)

\begin{matrix} g^{*} (m^{*}) = (\begin{matrix} 1 & 0 \\ 1 & 1 \end{matrix}) (\begin{matrix} m_{100}^{abc} \\ m_{110}^{abc} \end{matrix}) = (\begin{matrix} 0 \\ 0 \end{matrix}) . \end{matrix}

The rows of the design matrix in Eq. 21 contain only nonnegative elements, and the constraints imply that $m_{100}^{abc} = m_{110}^{abc} = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m^{abc}_{100} = m^{abc}_{110} = 0$$\end{document} . But since $n_{100}^{abc} > 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{abc}_{100} > 0$$\end{document} and $n_{110}^{abc} > 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{abc}_{110} > 0$$\end{document} , the likelihood function is zero whenever Eq. 21 holds; that is, $G^{2} = \infty$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2=\infty $$\end{document} .

The problem of a zero likelihood can be circumvented by adding $n_{011}^{abc}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{abc}_{011}$$\end{document} to $n^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^*$$\end{document} . Then we obtain

(22)

\begin{matrix} \begin{matrix} n^{†} = (\begin{matrix} n_{011}^{abc} \\ n_{100}^{abc} \\ n_{110}^{abc} \end{matrix}) = (\begin{matrix} 0 \\ 65 \\ 65 \end{matrix}) & and g^{†} (m^{†}) = (\begin{matrix} - 1 & 1 & 0 \\ - 1 & 1 & 1 \end{matrix}) (\begin{matrix} m_{011}^{abc} \\ m_{100}^{abc} \\ m_{110}^{abc} \end{matrix}) = 0; \end{matrix} \end{matrix}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \begin{array}{ccc} \textbf{n}^\dagger = \left( \begin{array}{r} n^{abc}_{011}\\ n^{abc}_{100}\\ n^{abc}_{110}\\ \end{array} \right) = \left( \begin{array}{r} 0\\ 65\\ 65\\ \end{array} \right) &{} \text{ and } \textbf{g}^\dagger (\textbf{m}^\dagger ) = \left( \begin{array}{rrr} -1 &{} 1 &{} 0 \\ -1 &{} 1 &{} 1 \\ \end{array} \right) \left( \begin{array}{c} m^{abc}_{011}\\ m^{abc}_{100}\\ m^{abc}_{110}\\ \end{array} \right) = 0; \end{array} \end{aligned}$$\end{document}

yielding ${\hat{m}}^{†} = {(65, 65, 0)}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}^\dagger = (65, 65, 0)^\textrm{T}$$\end{document} with $G^{2} = 1906.93$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2 = 1906.93$$\end{document} $(d f = 2)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(df = 2)$$\end{document} . Neither ML nor MAEL fit the data well but $G^{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2$$\end{document} is almost 10 times larger for MAEL than for ML. Including more cells may decrease the difference in global fit between MAEL and ML. The second-order estimation problem can be circumvented if $n_{000}^{abc}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{abc}_{000}$$\end{document} , $n_{011}^{abc}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{abc}_{011}$$\end{document} , and $n_{101}^{abc}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{abc}_{101}$$\end{document} are added to $n^{*}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^*$$\end{document} . In this way, $n^{†}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^\dagger $$\end{document} includes all bivariate margins:

(23)

\begin{matrix} \begin{matrix} n^{†} = (\begin{matrix} n_{000}^{abc} \\ n_{011}^{abc} \\ n_{100}^{abc} \\ n_{101}^{abc} \\ n_{110}^{abc} \end{matrix}) = (\begin{matrix} 0 \\ 0 \\ 65 \\ 0 \\ 65 \end{matrix}), & and & g^{†} (m^{†}) = (\begin{matrix} 0 & - 1 & 1 & 1 & 0 \\ 0 & - 1 & 1 & 0 & 1 \end{matrix}) (\begin{matrix} m_{000}^{abc} \\ m_{011}^{abc} \\ m_{100}^{abc} \\ m_{101}^{abc} \\ m_{110}^{abc} \end{matrix}) = 0; \end{matrix} \end{matrix}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \begin{array}{ccc} \textbf{n}^{\dagger } = \left( \begin{array}{c} n^{abc}_{000}\\ n^{abc}_{011}\\ n^{abc}_{100}\\ n^{abc}_{101}\\ n^{abc}_{110}\\ \end{array} \right) = \left( \begin{array}{r} 0\\ 0\\ 65\\ 0\\ 65\\ \end{array} \right) , &{} \text{ and } &{} \textbf{g}^\dagger (\textbf{m}^\dagger ) = \left( \begin{array}{rrrrr} 0 &{} -1 &{} 1 &{} 1 &{} 0 \\ 0 &{} -1 &{} 1 &{} 0 &{} 1 \\ \end{array} \right) \left( \begin{array}{c} m^{abc}_{000}\\ m^{abc}_{011}\\ m^{abc}_{100}\\ m^{abc}_{101}\\ m^{abc}_{110}\\ \end{array} \right) = 0; \end{array} \end{aligned}$$\end{document}

yielding ${\hat{m}}^{†} = {(0, 54.167, 32.5, 21.67, 21.67)}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}^\dagger = (0, 54.167, 32.5, 21.67, 21.67)^\textrm{T}$$\end{document} with $G^{2} = 232.92$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2 = 232.92$$\end{document} (df = 2). $G^{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2$$\end{document} is now much closer to $G^{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^2$$\end{document} of the ML solution.

3. Comparing ML, MEL, and MAEL

Two studies compared the ML, MEL, and MAEL estimation procedures for three CMMs relevant for psychology and educational sciences:

1. Model “Alpha". Kuijpers et al. (Reference Kuijpers, Van der Ark and Croon2013) showed that testing whether Cronbach’s alpha ( $α$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document} ) equals a certain benchmark can be done using a CMM with 1 degree of freedom. Model “Alpha" is $α = . 8$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha =.8$$\end{document} , because .8 is an arbitrary but commonly used benchmark to assess the quality of the test-score reliability (see, e.g., Nunnally, Reference Nunnally1978).
2. Model“ $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ". For a set of J items, Van der Ark et al. (Reference Van der Ark, Croon and Sijtsma2008) showed that testing whether each item-scalability coefficients $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ( $j = 1, . . ., J$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$j = 1,..., J$$\end{document} ) equals the researcher-specified lower-bound values c can be done using a CMM with J degree of freedom. Let $H = {(H_{1}, . . ., H_{J})}^{T}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{H} = (H_1,..., H_J)^T$$\end{document} . Model “ $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” is $H = . 3 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{H} =.3 \, \textbf{1}$$\end{document} , as.3 is the default value of for lower bound c provided by software programs for Mokken scale analysis.
3. Model “Mean”. Bergsma et al. (Reference Bergsma, Croon and Hagenaars2009, pp. 185–188) showed that testing equality of means of J variables can be done using a CMM with $J - 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J-1$$\end{document} degrees of freedom. Investigating equality of means may be useful when investigating whether a set of items are parallel (e.g., Lord & Novick, Reference Lord and Novick1968, pp. 47–50)

Study 1 is an exploratory simulation study to investigate the convergence rate and computation time under various settings. The tables are small to allow ML estimation. In Study 2, we investigated the Type I error rate of CMMs estimated with MAEL for realistic numbers of items in psychological and educational test data. We considered tables ranging from small (16 cells) to enormous ( $1.1 \times 10^{12}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1.1 \times 10^{12}$$\end{document} ). In addition, we investigated bias and variance of parameter $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} . ML estimation was not considered because it is feasible only for small tables, and MEL estimation was not considered because in most cases the algorithm runs into singularity problems and, consequently, does not converge.

3.1. Population Models and Estimation

Both Study 1 and Study 2 required population models (i.e., the vector of probabilities, $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} ) that comply with the constraints of the CMM under consideration (i.e., “Model Alpha”, “Model $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ”, or “Model Mean” for J items). The population models were constructed as follows. First, we constructed a two-parameter logistic model (2PLM), a popular item response theory model (Birnbaum, Reference Birnbaum, Lord and Novick1968), for which the location and discrimination parameters were selected (by trial and error) such that data generated from that 2PLM were close, in a loose sense, to the requirements of the CMM under consideration. Next, we generated 1000 response patterns from the 2PLM. Then, using ML (Study 1) or MAEL (Study 2), the CMM under consideration was estimated for the generated data, and the resulting estimated probabilities were used as the probabilities $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} of the data generating model. Finally, N observations were sampled from $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} . This data-generating procedure yields expected frequencies $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} that meet the constraints of the CMM of interest and have a relatively close fit to the 2PLM.

In Study 1, a certain percentage of the probabilities from the population model was deliberately set to zero, so as to create conditions with many zero cells. The cells in $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} that were set to zero were randomly selected, and afterwards $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} was rescaled. Note that setting random cells to zero is useful to investigate convergence, but makes investigation of Type I error and bias impossible.

The CMMs under consideration were estimated using the generated data as input, employing the R package cmm (Bergsma & Van der Ark, Reference Bergsma and Van der Ark2023), which offers MAEL estimation starting from version 1.0. All CMMs received uniform starting values and a maximum of 1,000 iterations. The code is available on the Open Science Framework at https://osf.io/yz8rm/).

3.2. Study 1: convergence Rates and Computation Times

For $N = 50$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N = 50$$\end{document} , we investigated the effect of four independent variables on convergence rate and computation time. Estimation Procedure had three levels: ML, MEL, and MAEL. Type of CMM had three levels: “Model Alpha”, “Model $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ”, and “Model Mean”. For “Model Alpha” the criterion value was set to the sample value plus. 2; and for “Model $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” the criterion value was set to the average of the sample $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} values. For convenience, the criterion values depend on the sample values. Because Study 1 investigated only computation time and convergence rate, sample-dependent criterion values are not a problem. Minimum Percentage Cells with Zero Observed Frequency (U) had three levels: 0% (none), 25% (small percentage), and 75% (large percentage). Number of items (J) had two levels: 4 dichotomous items, yielding $L = 16$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = 16$$\end{document} possible response patterns, and 8 items, yielding $L = 256$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = 256$$\end{document} response patterns. The number of items was kept small to allow for ML estimation. Hence, we had a 3 (Estimation Method) $\times$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} 3 (CMM) $\times$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} 3 (U) $\times$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} 2 (J) experimental design with a total of 54 cells. Each cell in the experimental design was replicated 1,000 times. For a small extra design (100 replications), we estimated CMMs with 10 ( $L = 1024$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = 1024$$\end{document} ) items to demonstrate the sharp increase in computation time.

Table 1 shows that for the smallest tables ( $J = 4$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 4$$\end{document} and $J = 8$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 8$$\end{document} ), both ML and MAEL almost always converged, whereas MEL often broke down for models “ $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” and “Mean”. For $J = 10$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 10$$\end{document} , ML ran into memory problems for models “ $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” and “Mean”, whereas MEL almost always broke down. For Model “Alpha”, convergence results were satisfactory for all three estimation methods.

Table 1. Convergence rates (percentage) and median computation times in seconds for ML, MEL, and MAEL, for three different CMMs, two numbers of items (J), and three percentages of unobservable response patterns (U) based on $1, 000$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1,\!000$$\end{document} ( $J = 4, 8$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 4, 8$$\end{document} ) and 100 ( $J = 10$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${ J} = 10$$\end{document} ) replications.

The distribution of the computation time was positively skewed. Therefore, we reported the median rather than the mean computation time. Naturally, MAEL and MEL were at least as fast as ML: Ranging from just as fast to more than 200 times faster. As the number of items increased, the computation time increased dramatically (Table 1, columns 4–6). This was especially true for ML estimation. For 4 and 8 items ( $L = 256$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L=256$$\end{document} ), but the computation time was still reasonable in all sample (never longer than 100 s), but for 10 items ( $L = 1024$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L=1024$$\end{document} ) some runs took up to 30 min for Model “Alpha”.

The results show that even for moderately large tables, ML may run into memory problems. Moreover, the results show that the first- and second-order estimation problems are omnipresent so that MEL often breaks down. This leaves MAEL as the viable candidate for estimating CMMs for large sparse contingency tables.

3.3. Study 2: Type I Error Rate

For MAEL estimation, we investigated the effect of the type of CMM, the number of items, and sample size on the Type I error rate and the bias and standard deviation of model parameter $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} . (Eq. 2). As in Study 1, Type of CMMs had three levels: “Model Alpha” (the criterion value was set to 0.8), “Model $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” (the criterion value was set to 0.3), and “Model Mean”. For “Model Alpha” and “Model $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” parameter $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} is fixed to $β = 0.8$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta =0.8$$\end{document} and $β = 1_{J} \cdot 0.3$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}=\textbf{1}_J \cdot 0.3$$\end{document} , respectively. Hence, bias and standard deviation of $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} were investigated only for Model “Mean”, where $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} equals the overall mean item score. Moreover, we studied four levels of number of items: 4 ( $L = 16$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = 16$$\end{document} ), 8 ( $L = 256$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = 256$$\end{document} ), 20 ( $L = 1, 048, 576$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L = 1,048,576$$\end{document} ), and 40 ( $L \approx 1.1 \times 10^{12}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L \approx 1.1 \times 10^{12}$$\end{document} ); and three levels of sample size ( $N = 250$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N = 250$$\end{document} , $N = 500$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N = 500$$\end{document} , and $N = 1000$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N=1000$$\end{document} ). Hence, we had a $3 (CMM) \times 4 (J) \times 3 (N)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3 \text{(CMM) } \times 4~(J)~\times 3~(N)~$$\end{document} experimental design with a total of 36 cells. Each cell in the experimental design was replicated 10,000 times for $J = 4$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 4$$\end{document} and $J = 8$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 8$$\end{document} items and 1000 times for $J = 20$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 20$$\end{document} and $J = 40$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J = 40$$\end{document} items. The empirical Type I error rate over the replications was compared to the nominal Type I error rate of 0.05, the mean value of $\hat{β} - β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\beta } - \beta $$\end{document} over replications was used to estimate the bias, and the standard deviation of $\hat{β}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\beta }$$\end{document} over replications was used as an estimate of the standard error of $\hat{β}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\beta }$$\end{document} .

Table 2 shows the Type I error rates for all cells in the design. In most cells, the Type I error rates are close to the nominal Type I error rate. For models with many degrees of freedom estimated using a relatively small sample size, the models are too liberal. For 40 items, models “ $H_{j}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_j$$\end{document} ” and “Mean” have 40 and 39 degrees of freedom, respectively. For $N = 250$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N=250$$\end{document} , this results in approximately 6 observations per degree of freedom. Hence, the poor performance is not so much due to the large table as due to the increase in degrees of freedom. Results are satisfactory if the sample size per degree of freedom exceeds 25 (see Fig. 1).

For Model “Mean”, the bias of $β$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\upbeta }}$$\end{document} (not tabulated) was negligible in all cases, and the estimated standard error (Table 3) behaved as expected; that is, if N doubles, the estimated standard error decreased approximately by a factor $\sqrt{2}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{2}$$\end{document} .

Table 2. Type I error rate for MAEL estimation of three different CMMs, four different numbers of items (J), and three different sample sizes (N), based on 1000 replications.

Note: A 95% confidence interval for the Type I error rate equals [0.036;0.064]. Values outside the 95% confidence interval are printed in boldface.

Figure 1. Type I error rates by the ratio of sample size and degrees of freedom in Study 2. Dashed lines are the limits of the 95% confidence interval of the Type I error rate due to Monte Carlo error.

Table 3. Estimated standard error of CMM-parameter estimate $\hat{β}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\beta }$$\end{document} for Model “Mean”, for four different numbers of items (J), and three different sample sizes(N), based on 1000 ( $J = 20$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J= 20$$\end{document} and $J = 40$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J=40$$\end{document} ) and 10, 000 ( $J = 4$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J= 4$$\end{document} and $J = 8$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J=8$$\end{document} ) replications.

4. Discussion

CMMs have potential for application to psychological data, but an important reason that this potential has so far not been realized may be that up to now ML estimation of CMMs could only be applied to contingency tables for a limited number of categorical variables (up to, say, 10–20 variables, depending on the number of categories per variable). The present paper shows that this limitation can be resolved by the newly introduced maximum augmented empirical likelihood (MAEL) estimation method, a procedure that considers all nonzero cells in the table (i.e., cells with at least one observation) and some well-chosen zero cells in the table (i.e., cells with no observations). MAEL can be thought of as lying in between maximum empirical likelihood (MEL) estimation, which considers only nonzero cells in the table and subsequently suffers from the first-order and second-order estimation problems, and maximum likelihood (ML), which considers all cells in the table and runs into memory problems if the table is large.

The asymptotic distribution of the ML estimators of marginal parameters is known (Lang Reference Lang2005), and depends only on the covariance matrix of the sample marginal distributions. In contrast to MEL, due to the augmentation step MAEL allows this covariance matrix to be estimated. Simulation study 2 shows this estimation is done sufficiently well in a number of practical settings, in particular, the asymptotic distribution of the ML estimators also provide a good approximation of the distribution of the MAEL estimators. The asymptotic distributions of ML and MAEL estimators are identical.

MAEL estimation has advantages compared to alternative methods which can be used to estimate CMMs for large contingency table, namely the weighted least squares method (Grizzle et al., Reference Grizzle, Starmer and Koch1969, a.k.a. the GSK-method), generalized estimating equations (GEEs, e.g., Qaqish & Liang, Reference Qaqish and Liang1992), and composite likelihood (e.g., Varin et al., Reference Varin, Reid and Firth2011). A comparison of GSK and GEE with ML estimation is given in Rudas and Bergsma (Reference Rudas, Bergsma, Kateri and Moustaki2023). All these four methods can be used to estimate CMMs for almost arbitrarily large contingency tables, but the only methods with guaranteed optimal asymptotic efficiency are MAEL and GSK. Unlike MAEL, however, GSK is sensitive to sparsity of the marginal distributions (Bergsma et al., Reference Bergsma, Croon and Hagenaars2013, see also the discussion of Berkson, Reference Berkson1980).

Like GEE and GSK, MAEL estimation is computationally fast, and like ML but unlike GEE, it is asymptotically efficient. Furthermore, MAEL is less sensitive to sparsity of the marginal distributions than GSK. Thus, MAEL seems to be the preferred method for estimating CMMs. Researchers should take heed that if the ratio of the sample size and degrees of freedom becomes too small (say less than 25), the Type I error rates may be too liberal. This is not a feature of MAEL per se, but for all models that are too complex for the number of observations. Composite likelihood estimation is a possibly attractive alternative for estimating CMMs, which was not considered in this study because the estimation procedures are not yet available for CMMs, whereas MAEL fits nicely in the ML framework and software that is already available for CMMs. In addition, composite likelihood is a quasi-likelihood method, and hence asymptotic efficiency is lost, whereas ML, and hence MAEL and MEL, are asymptotically efficient (Aitchison & Silvey Reference Aitchison and Silvey1958, Lang, Reference Lang2005).

Appendix A First- and Second-Order Estimation Problems

With $X$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}$$\end{document} a random variable, MEL can be used to make inferences on a Euclidean parameter $θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}$$\end{document} of the distribution of $X$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}$$\end{document} , where $θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}$$\end{document} is defined by an estimating equation of the form

(24)

\begin{matrix} E ψ (X, θ) = 0 \end{matrix}

for some function $ψ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppsi }}}$$\end{document} . For example, if $ψ (x, θ) = x - θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppsi }}}(\textbf{x},{{\varvec{\uptheta }}})=\textbf{x}-{{\varvec{\uptheta }}}$$\end{document} , then (24) implies that $θ = E X$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}=E\textbf{X}$$\end{document} . Denote the population value of $θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}$$\end{document} by $θ_{0}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}_0$$\end{document} . Suppose we have observed $x_{1}, \dots, x_{N}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_1,\ldots ,\textbf{x}_N$$\end{document} , which are i.i.d. and distributed as $X$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}$$\end{document} . The MEL estimator $\hat{θ}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{{{\varvec{\uptheta }}}}}$$\end{document} of $θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}$$\end{document} defined by (24) solves the constrained optimization problem

\begin{matrix} max_{θ, π} \prod_{i = 1}^{n} π_{i} \end{matrix}

subject to

(25)

\begin{matrix} π_{i} \geq 0, \sum π_{i} = 1, \sum π_{i} ψ (x_{i}, θ) = 0 \end{matrix}

The first-order estimation problem occurs if (25) does not have a solution (This problem is also known as the empty set problem, see Grendár & Judge, Reference Grendár and Judge2009). The best known example is the case that $θ = E X$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uptheta }}}=E\textbf{X}$$\end{document} while the population mean lies outside the convex hull of ${x_{1}, \dots, x_{N}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_1,\ldots ,\textbf{x}_N\}$$\end{document} (Qin and Lawless, Reference Qin and Lawless1994).

Let F be the distribution function of $X$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}$$\end{document} . Under some conditions on $ψ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppsi }}}$$\end{document} and F, $\hat{θ}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{{{\varvec{\uptheta }}}}}$$\end{document} has an asymptotic multivariate normal distribution, in particular,

\begin{matrix} \sqrt{n} (\hat{θ} - θ_{0}) \sim MVN (0, V_{F} (θ_{0})) \end{matrix}

where

\begin{matrix} V_{F} (θ) = Ψ_{F} {(θ)}^{- 1} W_{F} (θ) Ψ_{F} {(θ)}^{- 1} \end{matrix}

and

\begin{matrix} Ψ_{F} (θ) = \int (\frac{d ψ (x, θ)}{d θ^{T}}) d F (x) W_{F} (θ) = \int ψ (x, θ) ψ {(x, θ)}^{T} d F (x) \end{matrix}

The second-order estimation problem occurs if there does not exist a distribution function G with empirical support ${x_{1}, \dots, x_{N}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_1,\ldots ,\textbf{x}_N\}$$\end{document} such that $V_{F} (θ_{0}) = V_{G} (θ_{0})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{V}_F({{\varvec{\uptheta }}}_0)=\textbf{V}_G({{\varvec{\uptheta }}}_0)$$\end{document} .

Example 7

To illustrate the second-order estimation problem, consider a $2 \times 2$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2\times 2$$\end{document} contingency table with cell probabilities $π_{ij} > 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi _{ij}>0$$\end{document} ( $i, j \in {0, 1}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i,j\in \{0,1\}$$\end{document} ), and let $θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta $$\end{document} be the log ratio of marginal odds; that is,

\begin{matrix} θ = log \frac{π_{1 +} / π_{2 +}}{π_{+ 1} / π_{+ 2}} \end{matrix}

(For details on how to define $ψ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppsi }}}$$\end{document} such that this $θ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta $$\end{document} is the solution of (24), see Owen, Reference Owen2001). If one observed marginal count is zero, then $θ = \pm \infty$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta =\pm \infty $$\end{document} under empirical likelihood; that is, the first-order estimation problem occurs since $- \infty < θ_{0} < \infty$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-\infty<\theta _0<\infty $$\end{document} . If the two observed off-diagonal cell counts are zero, then $θ = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta =0$$\end{document} under empirical likelihood and as a consequence $V_{G} (θ) = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_G(\theta )=0$$\end{document} for any distribution G with support the two diagonal cells. However, assuming no structural zeroes in the table, $V_{F} (θ_{0}) > 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_F(\theta _0)>0$$\end{document} , and therefore the second-order estimation problem occurs. In this case, the first-order estimation problem occurs in addition to the second-order one if $θ_{0} \neq 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta _0\ne 0$$\end{document} .

A special case of the second-order estimation problem has been identified earlier by Bergsma et al. (Reference Bergsma, Croon and Van der Ark2012), who called it the zero-likelihood problem. It occurs if the empirical likelihood is zero for all solutions of (25). In this case, for any distribution G with support ${x_{1}, \dots, x_{N}}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_1,\ldots ,\textbf{x}_N\}$$\end{document} , $V_{G} (θ)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{V}_G({{\varvec{\uptheta }}})$$\end{document} is a matrix of zeroes, unequal to $V_{F} (θ_{0})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{V}_F({{\varvec{\uptheta }}}_0)$$\end{document} ; hence, the second-order estimation problem occurs.

We propose a solution for both estimation problems by augmentation of the support of the empirical likelihood, resulting in an estimation procedure lying in a spectrum with ML at one extreme and MEL at the other, which we call maximum augmented empirical likelihood (MAEL) estimation.

B Algorithm for Maximum Likelihood Estimation

Under some regularity conditions, the maximum likelihood estimates under model (5) are a saddle point of the Lagrangian log-likelihood

(26)

\begin{matrix} L (m, μ, λ) = n^{T} log (m) - μ (1^{T} m - N) + λ^{T} g (m) . \end{matrix}

where $μ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} and $λ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\lambda }}}$$\end{document} are Lagrange multipliers. In Eq. 26, $n^{T} log (m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^\textrm{T} \log (\textbf{m})$$\end{document} is the unconstrained kernel of the log-likelihood, and the Lagrangian terms are added to satisfy the multinomial sampling constraint $\sum_{i} m_{i} = N$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sum _i m_i = N$$\end{document} (Eq. 14) and the model constraint (Eq. 13). Bergsma (Reference Bergsma1997, pp. 89–95) developed a Fisher scoring algorithm to find the ML estimates of the constrained expected frequencies in $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} (Eq. 5) or, equivalently, the constrained cell probabilities $π$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\uppi }}}$$\end{document} . This algorithm is a modification of Lagrangian algorithms by Aitchison and Silvey (Reference Aitchison and Silvey1958) and Lang and Agresti (Reference Lang and Agresti1994).

It can be shown that $μ = 1$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu = 1$$\end{document} , so Eq. 26 can be simplified to

(27)

\begin{matrix} L (m, λ) = n^{T} log (m) + λ^{T} B^{T} g (A^{T} m) . \end{matrix}

The ML estimates of the $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} and $λ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\lambda }}}$$\end{document} are obtained by means of an iterative procedure that determines a saddle point of this Lagrangian.

We take the derivative of $L (m, λ)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L(\textbf{m},{{\varvec{\lambda }}})$$\end{document} with respect to $log m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\log \textbf{m}$$\end{document} rather than $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} because they yield simpler expressions. Note that $\partial L (m, λ) / \partial log m = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\partial L(\textbf{m},{{\varvec{\lambda }}})/ \partial \log \textbf{m} = 0$$\end{document} iff $\partial L (m, λ) / \partial m = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\partial L(\textbf{m},{{\varvec{\lambda }}})/ \partial \textbf{m} = 0$$\end{document} . Let $G = G (m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{G} = \textbf{G}(\textbf{m})$$\end{document} be the Jacobian of $g (m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{g}(\textbf{m})$$\end{document} with respect to $log m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\log \textbf{m}$$\end{document} . Differentiating $L (m, λ)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L(\textbf{m},{{\varvec{\lambda }}})$$\end{document} with respect to $log (m)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\log (\textbf{m})$$\end{document} yields

\begin{matrix} l (m, λ) = n - m + G λ . \end{matrix}

Under suitable regularity conditions, the ML estimator $\hat{m}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}$$\end{document} is a vector $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} for which there is a Lagrange multiplier vector $λ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\lambda }}}$$\end{document} such that the simultaneous equations

\begin{matrix} l (m, λ) = & 0 \end{matrix}

and

\begin{matrix} g (m) = & 0 \end{matrix}

are satisfied. Then, the expected value of the derivative matrix of the vector $(l (m, λ), g (m))$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\textbf{l}(\textbf{m},{{\varvec{\lambda }}}),\textbf{g}(\textbf{m}))$$\end{document} with respect to $(log m, λ)$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$( \log \textbf{m},{{\varvec{\lambda }}})$$\end{document} is

\begin{matrix} V (m) = (\begin{matrix} E (\frac{\partial l (m, λ)}{\partial log m^{T}}) & E (\frac{\partial g (m)}{\partial log m^{T}}) \\ E (\frac{\partial l (m, λ)}{\partial λ^{T}}) & E (\frac{\partial g (m)}{\partial λ^{T}}) \end{matrix}) = (\begin{matrix} - D (m) & G \\ G^{T} & 0 \end{matrix}) . \end{matrix}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textbf{V}(\textbf{m}) = \left( \begin{array}{cc} E\left( \frac{\displaystyle \partial \textbf{l}(\textbf{m},{{\varvec{\lambda }}})}{\displaystyle \partial \log \textbf{m}^\textrm{T} } \right) &{} E\left( \frac{\displaystyle \partial \textbf{g}(\textbf{m})}{\displaystyle \partial \log \textbf{m}^\textrm{T} } \right) \\ E\left( \frac{\displaystyle \partial \textbf{l}(\textbf{m},{{\varvec{\lambda }}})}{\displaystyle \partial {{\varvec{\lambda }}}^\textrm{T} } \right) &{} E\left( \frac{\displaystyle \partial \textbf{g}(\textbf{m})}{\displaystyle \partial {{\varvec{\lambda }}}^\textrm{T} } \right) \end{array} \right) = \left( \begin{array}{cc} -\textbf{D}(\textbf{m}) &{} \textbf{G} \\ \textbf{G}^\textrm{T} &{} 0 \end{array} \right) . \end{aligned}$$\end{document}

Let $n^{+}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}^+$$\end{document} be equal to the vector $n$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{n}$$\end{document} with zeroes replaced by a small positive constant (say, $10^{- 10}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^{-10}$$\end{document} ), and define the Fisher scoring starting values

\begin{matrix} (\begin{matrix} log m^{(0)} \\ λ^{(0)} \end{matrix}) = & (\begin{matrix} log n^{+} \\ 0 \end{matrix}) \end{matrix}

and, for $k = 0, 1, \dots$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k=0,1,\ldots $$\end{document} ,

\begin{matrix} (\begin{matrix} log m^{(k + 1)} \\ λ^{(k + 1)}) \end{matrix}) = & (\begin{matrix} log m^{(k)} \\ λ^{(k)}) \end{matrix}) - V {(m^{(k)})}^{- 1} \cdot (\begin{matrix} l (m^{(k)}, λ^{(k)}) \\ g (m^{(k)}) \end{matrix}) . \end{matrix}

Then, as $k \to \infty$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k\rightarrow \infty $$\end{document} , $m^{(k)}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}^{(k)}$$\end{document} should go to $\hat{m}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{\textbf{m}}$$\end{document} . Tedious but straightforward matrix algebra yields the simplified form

\begin{matrix} log m^{(k + 1)} = & log m^{(k)} + D {(m^{(k)})}^{- 1} l (m^{(k)}, λ^{(k + 1)}) \\ λ^{(k + 1)} = & - {(G^{T} D (m^{(k)}) G)}^{- 1} (G^{T} D {(m^{(k)})}^{- 1} (n - m^{(k)}) + g (m^{(k)}) . \end{matrix}

This algorithm does not always converge, and it can be helpful to introduce a step size ${step}^{(k)} \in ⟨ 0, 1]$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text{ step}^{(k)}\in \langle 0,1]$$\end{document} as follows:

(28)

\begin{matrix} log m^{(k + 1)} = log m^{(k)} + {step}^{(k)} D {(m^{(k)})}^{- 1} l (m^{(k)}, λ^{(k + 1)}) \end{matrix}

Note that the update of $λ$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\varvec{\lambda }}}$$\end{document} is left unchanged.

The step size should be chosen so that the new estimate $m^{(k + 1)}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}^{(k+1)}$$\end{document} is better than the old estimate $m^{(k)}$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}^{(k)}$$\end{document} . A criterion for deciding this is obtained by defining the following quadratic form measuring the distance from convergence:

\begin{matrix} δ (m^{(k)}) = l (m^{(k)}, λ^{(k + 1)}) D {(m^{(k)})}^{- 1} l (m^{(k)}, λ^{(k + 1)}) . \end{matrix}

Convergence is reached at $m$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{m}$$\end{document} if and only if $δ (m) = 0$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta (\textbf{m})=0$$\end{document} and therefore, if possible, the step size should be chosen so that $δ (m^{(k + 1)}) < δ (m^{(k)})$ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta (\textbf{m}^{(k+1)})< \delta (\textbf{m}^{(k)})$$\end{document} for all k. This is possible if the tentative solution is sufficiently close to the ML estimate. Otherwise, a recommendation which seems to work very well in practice is to jump to another region by taking a step size equal to one.

Footnotes

Letty Koopman received a research Grant from the Dutch Research Council (NWO): Research Talent Grant 406.16.554. The other authors declare no conflict of interest.

Letty Koopman is now at the University of Groningen.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Aitchison, J., & Silvey, S. D. (1958). Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics, 29(3), 813–828. https://doi.org/10.1214/aoms/1177706538 CrossRef Google Scholar

Bartolucci, F., Colombi, R., & Forcina, A. (2007). An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica, 17(2), 691–711.Google Scholar

Bergsma, W. P. (1997). Marginal models for categorical data. Tilburg: Tilburg University Press. Retrieved from http://stats.lse.ac.uk/bergsma/pdf/bergsma_phdthesis.pdf Google Scholar

Bergsma, W. P., Croon, M. A., & Hagenaars, J. A. (2009). Marginal models: For dependent, clustered, and longitudinal categorical data. Springer. https://doi.org/10.1007/b12532 Google Scholar

Bergsma, W. P., Croon, M. A., & Hagenaars, J. A. (2013). Advancements in marginal modelling for categorical data. Sociological Methodology, 43(1), 1–41. https://doi.org/10.1177/0081175013488999 CrossRef Google Scholar

Bergsma, W. P., Croon, M. A., & Van der Ark, L. A. (2012). The empty-set and zero-likelihood problems in maximum empirical likelihood estimation. Electronic Journal of Statistics, 6(1), 2356–2361. https://doi.org/10.1214/12-EJS750 CrossRef Google Scholar

Bergsma, W. P., & Rudas, T. (2002). Marginal models for categorical data. The Annals of Statistics, 30(1), 140–159. https://doi.org/10.1214/aos/1015362188 CrossRef Google Scholar

Bergsma, W. P., & Van der Ark, L. A. (2023). cmm: Categorical marginal models. R package version 1.0. [Computer software] http://cran.r-project.org/web/packages/cmm/ Google Scholar

Berkson, J. (1980). Minimum chi-square, not maximum likelihood!. The Annals of Statistics, 8(3), 457–487. https://doi.org/10.1214/aos/1176345003 CrossRef Google Scholar

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinees ability. In Lord, F. M. & Novick, M. R. (Eds.), Statistical theories of mental test scores (pp. 395–480). Addison-Wesley.Google Scholar

Chen, J., Variyath, A. M., & Abraham, B. (2008). Adjusted empirical likelihood and its properties. Journal of Computational and Graphical Statistics, 17(2), 426–443. https://doi.org/10.1198/106186008X321068 CrossRef Google Scholar

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5 CrossRef Google Scholar

Colombi, R., & Forcina, A. (2001). Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika, 88(4), 1007–1019. https://doi.org/10.1093/biomet/88.4.1007 CrossRef Google Scholar

Costa, P. T., & McCrae, R. R. (2008). The Revised NEO Personality Inventory (NEO-PI-R). In Boyle, G. J., Matthews, G., & Saklofske, H. (Eds.), The SAGE handbook of personality theory and assessment (Vol. 2, pp. 179–198). Sage.Google Scholar

Emerson, S. C., & Owen, A. B. (2009). Calibration of the empirical likelihood method for a vector mean. Electronic Journal of Statistics, 3(1), 1161–1192. https://doi.org/10.1214/09-EJS518 CrossRef Google Scholar

Evans, R. J., & Forcina, A. (2013). Two algorithms for fitting constrained marginal models. Computational Statistics & Data Analysis, 66(1), 1–7. https://doi.org/10.1016/j.csda.2013.02.001 CrossRef Google Scholar PubMed

Feldt, L. S. (1965). The approximate sampling distribution of Kuder–Richardson reliability coefficient twenty. Psychometrika, 30, 357–370. https://doi.org/10.1007/BF02289499 CrossRef Google Scholar PubMed

Feldt, L. S. (1969). A test of the hypothesis that Cronbach’s alpha or Kuder–Richardson coefficient twenty is the same for two tests. Psychometrika, 34, 363–373. https://doi.org/10.1007/BF02289364 CrossRef Google Scholar

Feldt, L. S. (1980). A test of the hypothesis that Cronbach’s alpha reliability coefficient is the same for two tests administered to the same sample. Psychometrika, 45, 99–105. https://doi.org/10.1007/BF02293600 CrossRef Google Scholar

Grendár, M., & Judge, G. (2009). Empirical set problem of maximum empirical likelihood methods. Electronic Journal of Statistics, 3(1), 1542–1555. https://doi.org/10.1214/09-EJS528 CrossRef Google Scholar

Grizzle, J. E., Starmer, C. F., & Koch, G. G. (1969). Analysis of categorical data by linear models. Biometrics, 25(3), 489–504. https://doi.org/10.2307/2528901 CrossRef Google Scholar PubMed

Jorgensen, T. D., Kite, B. A., & Chen, P.-Y. (2017). Finally! A valid test of configural invariance using permutation in multigroup CFA. In van der Ark, L. A., Wiberg, M., Culpepper, S. A., Douglas, J. A., & Wang, W.-C. (Eds.), Quantitative psychology. The 81st Annual Meeting of the Psychometric Society , Asheville, North Carolina, 2016., Springer. https://doi.org/10.1007/978-3-319-56294-0_9 CrossRef Google Scholar

Kuijpers, R. E., Van der Ark, L. A., & Croon, M. A. (2013). Testing hypotheses involving Cronbach’s alpha using marginal models. British Journal of Mathematical and Statistical Psychology, 66(3), 503–520. https://doi.org/10.1111/bmsp.12010 CrossRef Google Scholar PubMed

Lang, J. B. (1996). Maximum likelihood methods for a generalized class of log-linear models. The Annals of Statistics, 24(2), 726–752. https://doi.org/10.1214/aos/1032894462 CrossRef Google Scholar

Lang, J. B. (2005). Homogeneous linear predictor models for contingency tables. Journal of the American Statistical Association, 100(469), 121–134. https://doi.org/10.1198/016214504000001042 CrossRef Google Scholar

Lang, J. B., & Agresti, A. (1994). Simultaneously modeling the joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association, 89(426), 625–632. https://doi.org/10.1080/01621459.1994.10476787 CrossRef Google Scholar

Molenberghs, G., & Lesaffre, E. (1999). Marginal modelling of multivariate categorical data. Statistics in Medicine, 18(17–18), 2237–2255. https://doi.org/10.1002/(SICI)1097-0258(19990915/30)18:17/1<2237::AID-SIM252>3.0.CO;2-R 3.0.CO;2-R>CrossRef Google Scholar PubMed

Lloyd, S. (2000). Ultimate physical limits to computation. Nature, 406(1), 1047–1054. https://doi.org/10.1038/35023282 CrossRef Google Scholar PubMed

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.Google Scholar

Maydeu-Olivares, A., Coffman, D. L., García-Forero, C., & Gallardo-Pujol, D. (2010). Hypothesis testing for coefficient alpha: An SEM approach. Behavior Research Methods, 42, 618–625. https://doi.org/10.3758/BRM.42.2.618 CrossRef Google Scholar PubMed

Maydeu-Olivares, A., Coffman, D. L., & Hartmann, W. M. (2007). Asymptotically distribution-free (ADF) interval estimation of coefficient alpha. Psychological Methods, 12, 157–176. https://doi.org/10.1037/1082-989X.12.2.157 CrossRef Google Scholar PubMed

Mokken, R. J. (1971). A theory and procedure of scale analysis. De Gruyter.CrossRef Google Scholar

Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.Google Scholar

Nguyen, M. K., Phelps, S., & Ng, W. L. (2015). Simulation based calibration using extended balanced augmented empirical likelihood. Statistics and Computing, 25(6), 1093–1112. https://doi.org/10.1007/s11222-014-9506-9 CrossRef Google Scholar

Owen, A. B. (2001). Empirical likelihood. Chapman & Hall/CRC. https://doi.org/10.1201/9781420036152 Google Scholar

Qaqish, B. F., & Liang, K. Y. (1992). Marginal models for correlated binary responses with multiple classes and multiple levels of nesting. Biometrics, 48(3), 939–950. https://doi.org/10.2307/2532359 CrossRef Google Scholar PubMed

Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1), 300–325. https://doi.org/10.1214/aos/1176325370 CrossRef Google Scholar

Raven, J., Raven, J. C., & Court, J. H. (2003). Manual for Raven’s Progressive Matrices and Vocabulary Scales. Section 1: General Overview. New York: Harcourt Assessment.Google Scholar

Rudas, T., & Bergsma, W. P. (2023). Marginal models: An overview. In Kateri, M., & Moustaki, I. (Eds.), Trends and challenges in categorical data analysis: Statistical modelling and interpretation. Berlin: Springer. https://doi.org/10.1007/978-3-031-31186-4_3 Google Scholar

Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks: Sage. https://doi.org/10.4135/9781412984676 CrossRef Google Scholar

Sijtsma, K., & Van der Ark, L. A. (2017). A tutorial on how to do a Mokken scale analysis on your test and questionnaire data. British Journal of Mathematical and Statistical Psychology, 70(3), 137–158. https://doi.org/10.1111/bmsp.12078 CrossRef Google Scholar

Van der Ark, L. A., Croon, M. A., & Sijtsma, K. (2008). Mokken scale analysis for dichotomous items using marginal models. Psychometrika, 73, 183–208. https://doi.org/10.1007/s11336-007-9034-z CrossRef Google Scholar PubMed

Van Zyl, J. M., Neudecker, H., & Nel, D. G. (2000). On the distribution of the maximum likelihood estimator of Cronbach’s alpha. Psychometrika, 65, 271–280. https://doi.org/10.1007/BF02296146 CrossRef Google Scholar

Varin, C., Reid, N., & Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 12(1), 5–42.Google Scholar

Xia, X., & Liu, Z. (2019). Balanced augmented empirical likelihood for regression models. Journal of the Korean Statistical Society, 48(2), 233–247. https://doi.org/10.1016/j.jkss.2018.10.006 CrossRef Google Scholar

Bergsma, W. P. (1997). Marginal models for categorical data. Tilburg: Tilburg University Press. Retrieved from http://stats.lse.ac.uk/bergsma/pdf/bergsma_phdthesis.pdf Google Scholar

Bergsma, W. P., Croon, M. A., & Hagenaars, J. A. (2009). Marginal models: For dependent, clustered, and longitudinal categorical data. Springer. https://doi.org/10.1007/b12532 Google Scholar

Bergsma, W. P., & Rudas, T. (2002). Marginal models for categorical data. The Annals of Statistics, 30(1), 140–159. https://doi.org/10.1214/aos/1015362188 CrossRef Google Scholar

Bergsma, W. P., & Van der Ark, L. A. (2023). cmm: Categorical marginal models. R package version 1.0. [Computer software] http://cran.r-project.org/web/packages/cmm/ Google Scholar

Berkson, J. (1980). Minimum chi-square, not maximum likelihood!. The Annals of Statistics, 8(3), 457–487. https://doi.org/10.1214/aos/1176345003 CrossRef Google Scholar

Grizzle, J. E., Starmer, C. F., & Koch, G. G. (1969). Analysis of categorical data by linear models. Biometrics, 25(3), 489–504. https://doi.org/10.2307/2528901 CrossRef Google Scholar PubMed

Lang, J. B. (1996). Maximum likelihood methods for a generalized class of log-linear models. The Annals of Statistics, 24(2), 726–752. https://doi.org/10.1214/aos/1032894462 CrossRef Google Scholar

Lloyd, S. (2000). Ultimate physical limits to computation. Nature, 406(1), 1047–1054. https://doi.org/10.1038/35023282 CrossRef Google Scholar PubMed

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.Google Scholar

Mokken, R. J. (1971). A theory and procedure of scale analysis. De Gruyter.CrossRef Google Scholar

Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.Google Scholar

Owen, A. B. (2001). Empirical likelihood. Chapman & Hall/CRC. https://doi.org/10.1201/9781420036152 Google Scholar

Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1), 300–325. https://doi.org/10.1214/aos/1176325370 CrossRef Google Scholar

Raven, J., Raven, J. C., & Court, J. H. (2003). Manual for Raven’s Progressive Matrices and Vocabulary Scales. Section 1: General Overview. New York: Harcourt Assessment.Google Scholar

Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks: Sage. https://doi.org/10.4135/9781412984676 CrossRef Google Scholar

Varin, C., Reid, N., & Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 12(1), 5–42.Google Scholar

Table 1. Convergence rates (percentage) and median computation times in seconds for ML, MEL, and MAEL, for three different CMMs, two numbers of items (J), and three percentages of unobservable response patterns (U) based on 1,000\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$1,\!000$$\end{document} (J=4,8\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$J = 4, 8$$\end{document}) and 100 (J=10\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$${ J} = 10$$\end{document}) replications.

Table 2. Type I error rate for MAEL estimation of three different CMMs, four different numbers of items (J), and three different sample sizes (N), based on 1000 replications.

Table 3. Estimated standard error of CMM-parameter estimate β^\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\hat{\beta }$$\end{document} for Model “Mean”, for four different numbers of items (J), and three different sample sizes(N), based on 1000 (J=20\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$J= 20$$\end{document} and J=40\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$J=40$$\end{document}) and 10, 000 (J=4\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$J= 4$$\end{document} and J=8\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$J=8$$\end{document}) replications.

Article contents

Maximum Augmented Empirical Likelihood Estimation of Categorical Marginal Models for Large Sparse Contingency Tables

Abstract

Keywords

1. CMMs

Example 1

Example 2.

2. Estimation of CMMs

2.1. ML and MEL Estimation

Example 3

2.2. The First- and Second-Order Estimation Problems for CMMs

Example 4

Example 5

2.3. MAEL Estimation

Example 6

3. Comparing ML, MEL, and MAEL

3.1. Population Models and Estimation

3.2. Study 1: convergence Rates and Computation Times

3.3. Study 2: Type I Error Rate

4. Discussion

Appendix A First- and Second-Order Estimation Problems

Example 7

B Algorithm for Maximum Likelihood Estimation

Footnotes

References

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests