There is a wide range of psychometric models for analyzing data from educational and psychological surveys. Models with discrete and continuous latent factors have received great attention due to repeated empirical evidence of adequate model fit and of interpretations that align with substantive theory. Among them, a variety of multidimensional item response theory (MIRT) models (Reckase, 2009) have been proposed to account for various multidimensional structures of the latent constructs. Within the family of MIRT models, one of the most studied is the multidimensional two-parameter logistic model (M2PL) (Reckase, 2009) for dichotomous responses, along with the multidimensional three-parameter logistic model (M3PL) and the multidimensional four-parameter logistic model (M4PL), which are often used when students can guess the answer to a test item correctly (yielding a lower asymptote in educational measurement) or when the chance of answering an item correctly does not approach 1 (yielding an upper asymptote in psychopathology measurement). Moreover, the multidimensional graded response model (Cai, 2010) and the multidimensional (generalized) partial credit model (Yao & Schwarz, 2006) have been proposed to handle polytomous items. These models extend dichotomous-response IRT models in various ways to characterize latent cognitive structures in more complex datasets.
In the IRT literature, the marginal maximum likelihood estimator (MMLE) is considered a consistent and efficient approach to parameter estimation; it maximizes the marginal log-likelihood of the item parameters (Bock & Aitkin, 1981; Bock et al., 1988). In particular, this estimator is efficient in the asymptotic regime where the test length is small and the number of examinees is rather large, which is often the case in applications. However, integrating out the latent abilities to evaluate the marginal likelihood is notoriously time-consuming, involving multidimensional integrals whose computational complexity grows exponentially with the number of latent dimensions. This creates great difficulties, especially when the latent structure is complex or the latent space is high-dimensional. Several methods have been proposed to deal with this computational challenge, such as Gaussian quadrature methods (Bock & Aitkin, 1981; Schilling & Bock, 2005), Laplace approximation methods (Lindstrom & Bates, 1988; Wolfinger & O'Connell, 1993; Andersson & Xin, 2021), Metropolis-Hastings Robbins-Monro algorithms (Cai, 2010), stochastic expectation-maximization (EM) algorithms (von Davier & Sinharay, 2010; Zhang et al., 2020), and variational approximation methods (Rijmen & Jeon, 2013; Cho et al., 2021, 2022).
Among these, Gauss-Hermite quadrature does not scale well to high-dimensional settings, and the Laplace approximation, which is closely related to the variational method (Opper & Archambeau, 2009), may suffer from numerical inaccuracies when the dimension is high or the likelihood is skewed. Additionally, many variants of the Laplace approximation, though they overcome some of these deficiencies, are inflexible or hard to implement (Ormerod & Wand, 2010). Methods based on Monte Carlo simulation, such as the Metropolis-Hastings Robbins-Monro and stochastic EM algorithms, are more robust but can be computationally inefficient. Notably, in addition to being numerically accurate and computationally efficient, the variational method provides good interpretability, and the variational distributions themselves carry additional useful information (Blei et al., 2017).
Despite the extensive literature on estimation methods for dichotomous-response models (Cai, 2010; Chen et al., 2019; Cho et al., 2021; Feuerstahler & Waller, 2014; Meng et al., 2020), little attention has been given to the efficient and robust estimation of MIRT models for polytomous responses (Bock et al., 1988; Kim & Wilson, 2020). In this paper, we propose a Gaussian variational expectation-maximization algorithm for the multidimensional generalized partial credit model (MGPCM) that is both computationally efficient and numerically stable. The variational method, first proposed in computer science and statistical learning, has become an efficient approach for large-scale computation in fields such as pattern recognition and document retrieval (Titterington, 2004; Blei & Jordan, 2006). Its application to statistical models has also received wide attention (Blei et al., 2017; Ormerod & Wand, 2010). For instance, Ormerod and Wand (2012) adopted a Gaussian variational approximation for the estimation of generalized linear mixed effects models. In psychometrics and educational measurement, variational methods have been used for efficient estimation of multidimensional 1PL models (Rijmen & Jeon, 2013) and multidimensional 2/3PL models (Cho et al., 2021), as well as for the analysis of cognitive diagnosis models (Yamaguchi & Okada, 2020; Oka & Okada, 2023). Motivated by Cho et al. (2021), we generalize their method to the estimation of the MGPCM. This generalization is nontrivial: the MGPCM uses a generalized logit link function that requires a completely new derivation of the variational lower bound to enable fully closed-form parameter updates in the EM algorithm.
The rest of the paper is organized as follows. Section 1 provides a brief introduction to the model. In Section 2, we introduce the polytomous Gaussian variational expectation-maximization (pGVEM) algorithm and give its derivation. Section 3 evaluates the performance of the pGVEM algorithm against the traditional EM algorithm and other existing methods through a comprehensive simulation study. In Section 4, we apply the pGVEM algorithm to data from an international educational assessment database and a Big-Five personality assessment. Finally, in Section 5, we conclude the paper and suggest directions for future research.
1. Model Setting
The generalized partial credit model (GPCM) (Muraki, 1992; Embretson & Reise, 2013), also named the compensatory multidimensional two-parameter partial credit model (M-2PPC) (Yao & Schwarz, 2006), is a popular IRT model for polytomous responses. It allows for the assessment of partial scores for constructed-response items and of intermediate steps that students have accomplished on the path toward solving an item completely. Suppose we have N examinees and J test items, with the random variable $Y_{ij}$ denoting the partial credit of person i's response to item j. For each item j, $a_j$ is a discrimination parameter that reflects the strength of association between the latent trait and item responses, and $\beta _{jk}$ is a threshold parameter that separates two adjacent response categories. The partial credit model (Masters, 1982) is a special case of the GPCM in which the discrimination parameter is fixed to be the same across items. The item response function of the GPCM, which characterizes the probability of a specific response, is given by
where $k=0,1,\ldots ,K_j-1$ and $K_j$ is the number of possible partial credit scores for the $j$th item.
The multidimensional generalized partial credit model (MGPCM) is a natural multidimensional extension of the GPCM. The main idea is to replace the unidimensional latent ability with a D-dimensional vector, each dimension representing a facet of the multidimensional construct (e.g., science and literacy). Similarly, the discrimination parameters ${\varvec{a}}_j$ become D-dimensional vectors that reflect the discrimination power of item j with respect to each facet (i.e., dimension) of the multidimensional construct $\varvec{\theta }$. The threshold parameters stay the same as in the unidimensional model. Equation (1.1) is therefore updated as follows:
Here ${\varvec{a}}_j^\prime {\varvec{\theta }}_i$ denotes the inner product of ${\varvec{a}}_j$ and ${\varvec{\theta }}_i$, i.e., ${\varvec{a}}_j^\prime {\varvec{\theta }}_i=\sum _{d=1}^D a_{jd}\theta _{id}$, where $a_{jd}$ and $\theta _{id}$ are the dth components of ${\varvec{a}}_j$ and ${\varvec{\theta }}_i$, respectively. With a slight re-parameterization, we obtain the following item response function for the MGPCM, which we use throughout the paper:
In Equation (1.3), $b_{jk}$ replaces $\sum _{r=0}^k\beta _{jr}$ in Equation (1.2) for each $k=0,1,\ldots ,K_j-1$. Note that for model identification only $K_j-1$ threshold parameters are estimable, and hence we fix $b_{j0}=0$.
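To make the reparameterized response function concrete, the following is a minimal numerical sketch of the category probabilities implied by Equation (1.3), assuming logits of the form $k\,{\varvec{a}}_j^\prime {\varvec{\theta }}_i-b_{jk}$ with the constraint $b_{j0}=0$; the item and person values below are hypothetical.

```python
import numpy as np

def mgpcm_probs(a_j, b_j, theta_i):
    """Category probabilities for one item under the reparameterized MGPCM:
    P(Y_ij = k) proportional to exp(k * a_j'theta_i - b_jk), k = 0..K_j-1,
    with the identification constraint b_j0 = 0 (assumed form)."""
    K_j = len(b_j)                      # number of score categories
    k = np.arange(K_j)
    logits = k * (a_j @ theta_i) - b_j  # k * a_j'theta_i - b_jk
    logits -= logits.max()              # stabilize the softmax numerically
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical 2-dimensional item with three score categories (0, 1, 2)
a_j = np.array([1.2, 0.8])
b_j = np.array([0.0, -0.5, 1.0])        # b_j0 fixed at 0
theta_i = np.array([0.3, -0.1])
probs = mgpcm_probs(a_j, b_j, theta_i)
```

The probabilities are positive and sum to 1 by construction, mirroring the softmax structure of the item response function.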
2. Gaussian Variational Approximation
2.1. Derivation of Algorithm
In this section, we describe the derivation of the pGVEM algorithm. In the following, we denote the collection of item parameters to be estimated by $M_p=\{{\varvec{a}}_j\in {\mathbb {R}}^D,\, b_{jk}\in {\mathbb {R}}:j=1,\ldots ,J,\ k=1,\ldots ,K_j-1\}$ for the MGPCM. As discussed above, the parameter $b_{j0}$ is fixed at 0 for all j.
Following common convention, we assume the latent vector $\varvec{\theta }$ follows a multivariate normal distribution with mean ${\varvec{0}}$ and covariance ${\varvec{\Sigma }}_{\theta }$, with density function denoted by $\phi (\cdot )$. The marginal probability of the response vector ${\varvec{Y}}_i=(Y_{i1},\ldots ,Y_{iJ})^\prime $ for person i is defined as follows
where $I_{(Y_{ij}=k)}$ is an indicator function equal to 1 if $Y_{ij}=k$ and zero otherwise; the marginal log-likelihood function for all responses from the examinees is therefore given by
where ${\varvec{Y}}=({\varvec{Y}}_1,\ldots ,{\varvec{Y}}_{N})^\prime $ is the $N\times J$ matrix of realized categorical responses.
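To illustrate why evaluating this marginal likelihood is expensive, the D-dimensional integral can be approximated by simple Monte Carlo over draws of ${\varvec{\theta }}_i$ from the $N({\varvec{0}},{\varvec{\Sigma }}_\theta )$ prior. The sketch below uses hypothetical item parameters and the same assumed logits $k\,{\varvec{a}}_j^\prime {\varvec{\theta }}_i-b_{jk}$; it is for intuition only and is exactly the kind of computation the variational approach avoids.

```python
import numpy as np

rng = np.random.default_rng(0)

def mgpcm_prob(a_j, b_j, theta):
    # Category probabilities with logits k * a_j'theta - b_jk (assumed form)
    logits = np.arange(len(b_j)) * (a_j @ theta) - b_j
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def marginal_loglik_mc(y_i, A, B, Sigma, n_draws=2000):
    # log P(Y_i) ~ log of the average over prior draws of
    # prod_j P(Y_ij = y_ij | theta), i.e., crude Monte Carlo integration
    D = Sigma.shape[0]
    thetas = rng.multivariate_normal(np.zeros(D), Sigma, size=n_draws)
    lik = np.array([
        np.prod([mgpcm_prob(A[j], B[j], th)[y_i[j]] for j in range(len(y_i))])
        for th in thetas
    ])
    return np.log(lik.mean())

# Hypothetical example: 3 items, 2 latent dimensions, 3 categories each
A = np.array([[1.0, 0.2], [0.3, 1.1], [0.8, 0.8]])
B = np.array([[0.0, -0.4, 0.9], [0.0, 0.2, 1.3], [0.0, -0.1, 0.5]])
Sigma = np.eye(2)
ll = marginal_loglik_mc(np.array([2, 0, 1]), A, B, Sigma)
```

The Monte Carlo error decays only as the square root of the number of draws, and the cost multiplies across examinees, which motivates the deterministic variational bound derived next.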
Following the variational estimation literature (Blei et al., 2017; Cho et al., 2021), we first derive a variational lower bound for the MGPCM. We write $KL\{p(\cdot )\Vert q(\cdot )\}$ for the Kullback-Leibler divergence between probability distributions p and q. For an arbitrary probability density function $q(\cdot )$, the marginal log-likelihood in Equation (2.1) has the following lower bound (Blei et al., 2017):
The last inequality holds with equality if and only if the KL divergence between the variational distribution $q_i(\cdot )$ and the posterior distribution $P(\cdot |{\varvec{Y}}_i,M_p)$ is 0, that is, $q_i({\varvec{\theta }}_i)=P({\varvec{\theta }}_i|{\varvec{Y}}_i,M_p)$. In the variational inference literature, the right-hand side of Equation (2.2) is called the evidence lower bound (Blei et al., 2017); up to a constant that does not depend on $q(\cdot )$, it equals the negative of the KL divergence between the assumed variational distribution $q(\cdot )$ and the conditional density of the latent variables given the observations.
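In our notation, the decomposition behind Equation (2.2) can be written compactly as the following standard identity from variational inference (a sketch; the symbols match those above):

```latex
\log P({\varvec{Y}}_i|M_p)
  = \underbrace{\mathbb{E}_{q_i}\!\left[\log \frac{P({\varvec{Y}}_i,{\varvec{\theta }}_i|M_p)}{q_i({\varvec{\theta }}_i)}\right]}_{\text{evidence lower bound}}
  + \underbrace{KL\{q_i({\varvec{\theta }}_i)\,\Vert\, P({\varvec{\theta }}_i|{\varvec{Y}}_i,M_p)\}}_{\ge 0}.
```

Since the KL term is nonnegative, the first term is a lower bound on the marginal log-likelihood, with equality exactly when $q_i(\cdot )$ coincides with the posterior.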
In the following, we construct an approximation to the marginal maximum likelihood estimator from the evidence lower bound. The primary objective is to identify a suitable distribution that approximates the posterior $P({\varvec{\theta }}_i|{\varvec{Y}}_i,M_p)$. Motivated by this insight, we propose an EM-type algorithm to compute the marginal maximum likelihood estimator. In the E-step, we evaluate the expectation of the complete-data log-likelihood, where the expectation is taken with respect to the latent variables ${\varvec{\theta }}_i$ under their variational probability density function $q_i(\cdot )$:
Here the density function $q_i({\varvec{\theta }}_i)$ is chosen to minimize the KL divergence $KL\{q_i({\varvec{\theta }}_i)\Vert P({\varvec{\theta }}_i|{\varvec{Y}}_i,M_p)\}$, yielding the best approximation to the posterior distribution. The second term in the evidence lower bound is left out since it does not involve the item parameters. However, minimizing this KL divergence is difficult because there is no explicit formula for the posterior distribution of ${\varvec{\theta }}_i$ given the previously estimated item parameters ${\hat{M}}_p$, as it involves computing D-dimensional integrals.
Numerical methods, such as Gauss-Hermite approximation, Monte Carlo expectation-maximization, and stochastic expectation-maximization, are often used to provide fast approximations. Here we adopt the Gaussian variational inference method. It is widely accepted that the posterior distribution of the latent ability, $P({\varvec{\theta }}_i|{\varvec{Y}}_i,M_p)$, can be approximated by a Gaussian distribution (Chang & Stout, 1993; Wang, 2015), and hence we seek an optimal $q_i({\varvec{\theta }}_i)$ within the family of Gaussian distributions that minimizes the KL divergence between $q_i({\varvec{\theta }}_i)$ and $P({\varvec{\theta }}_i|{\varvec{Y}}_i,M_p)$. Since the posterior distribution can be expressed as
we only need to evaluate $P({\varvec{Y}}_i,{\varvec{\theta }}_i|M_p)$ to find a proper $q_i(\cdot )$ as
Under the MGPCM, the logarithm of the joint distribution of ${\varvec{\theta }}_i$ and ${\varvec{Y}}_i$ is
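Writing $x_{ijv}=v\,{\varvec{a}}_j^\prime {\varvec{\theta }}_i-b_{jv}$ for the category-$v$ logit, the log joint density takes the following form (a sketch consistent with the item response function in Equation (1.3) and the normal prior above):

```latex
\log P({\varvec{Y}}_i,{\varvec{\theta }}_i|M_p)
 = \sum_{j=1}^{J}\sum_{k=0}^{K_j-1} I_{(Y_{ij}=k)}
   \left[x_{ijk}-\log\!\sum_{v=0}^{K_j-1}\exp (x_{ijv})\right]
 + \log \phi ({\varvec{\theta }}_i).
```

The log-sum-exp term inside the brackets is the log of the softmax normalizer, which is what renders the integral over ${\varvec{\theta }}_i$ intractable.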
The nonlinear softmax function, defined by $f_v({\varvec{x}})=\exp (x_v)/[\sum _{k=1}^n\exp (x_k)]$ for an n-dimensional vector ${\varvec{x}}$, is the main cause of the intractability of the integral. To overcome this problem, we propose a variational lower bound based on an approximation to the softmax function: by augmenting Equation (2.3) with variational parameters, the evidence lower bound can be computed explicitly without resorting to numerical integration. Among the many approximations to the softmax function, we adopt the One-Versus-Each bound (Titsias, 2016), which approximates the softmax function well and captures the model's features. We start with the following inequality:
Denote by $k_{ij}$ the realized partial credit score of the ith person's response to the jth item. Then, applying bound (2.4) to (2.3), we have
Here we write $x_{ijk}=k{\varvec{a}}_j^\prime {\varvec{\theta }}_i-b_{jk}$ for short. We wish to draw attention to our choice of the One-Versus-Each bound. For (2.4), the inequality is strict in all cases except in the limiting circumstance where $x_v-x_k\rightarrow \infty $ for all $k\ne v$ with at most one exception. Additionally, the approximation is tightest when $x_v$ is among the largest of all $x_k$.
The principle of maximum likelihood suggests that, when the partial credit score $Y_{ij}$ is recorded as $k_{ij}$, the quantity $k_{ij}{\varvec{a}}_j^\prime {\varvec{\theta }}_i-b_{jk_{ij}}$ is most likely the largest among all $v{\varvec{a}}_j^\prime {\varvec{\theta }}_i-b_{jv}$ for $v=0,1,\ldots ,K_j-1$. Hence this feature of the One-Versus-Each bound makes it well suited as an approximation for the marginal maximum likelihood estimator.
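The One-Versus-Each bound can be checked numerically. Assuming its standard form $f_v({\varvec{x}})\ge \prod _{k\ne v}\sigma (x_v-x_k)$, where $\sigma $ is the logistic sigmoid, a quick sketch:

```python
import numpy as np

def softmax_component(x, v):
    # f_v(x) = exp(x_v) / sum_k exp(x_k), computed stably
    e = np.exp(x - x.max())
    return e[v] / e.sum()

def ove_bound(x, v):
    # One-Versus-Each lower bound (assumed standard form):
    # product over k != v of sigmoid(x_v - x_k)
    diffs = x[v] - np.delete(x, v)
    return np.prod(1.0 / (1.0 + np.exp(-diffs)))

rng = np.random.default_rng(1)
x = rng.normal(size=5)
# The gap is nonnegative for every component v
gaps = [softmax_component(x, v) - ove_bound(x, v) for v in range(5)]
```

The inequality follows from expanding the product: $\prod _{k\ne v}(1+e^{x_k-x_v})\ge 1+\sum _{k\ne v}e^{x_k-x_v}$, whose reciprocals give the bound, and the gap shrinks as $x_v$ dominates the remaining components.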
The logistic sigmoid function in (2.4) can be further approximated by a local variational approach:
where $\boldsymbol{\xi}=\{\xi_{ijk}\}_{i,j,k}$ are called variational parameters, which will be updated iteratively together with the item parameters in the M-step. Here the function $\eta(x)$ is defined as $(e^x-1)/[4x(e^x+1)]$ (Jaakkola & Jordan, 2000). We use this local variational approximation so that a suitable expectation of $\log P(\boldsymbol{Y}_i,\boldsymbol{\theta}_i|M_p)$ (given below in Equation (3.5)) can be written as a quadratic form with respect to $\boldsymbol{\theta}_i$.
This will facilitate the selection of $q_i(\cdot)$ in the family of Gaussian distributions.
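The Jaakkola–Jordan bound above can be sketched numerically as follows; on the log scale it reads $\log\sigma(x)\ge \log\sigma(\xi)+(x-\xi)/2-\eta(\xi)(x^2-\xi^2)$, with equality at $x=\pm\xi$, which is what makes the bound quadratic in $x$. Function names are ours.

```python
import math

def eta(x):
    """eta(x) = (e^x - 1) / (4x(e^x + 1)); eta(0) = 1/8 by continuity."""
    if abs(x) < 1e-6:
        return 0.125
    return (math.exp(x) - 1.0) / (4.0 * x * (math.exp(x) + 1.0))

def log_sigmoid(x):
    return -math.log1p(math.exp(-x))

def jj_lower_bound(x, xi):
    """Local variational lower bound on log sigma(x), quadratic in x."""
    return log_sigmoid(xi) + 0.5 * (x - xi) - eta(xi) * (x * x - xi * xi)
```

The bound is tight at $x=\xi$, so optimizing $\xi_{ijk}$ in the M-step keeps the approximation close where it matters.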
Next we substitute $x_{ijk}$ by $k\boldsymbol{a}_j^\prime\boldsymbol{\theta}_i-b_{jk}$ and write the above lower bound of the joint distribution function as
and therefore the expectation of the log-likelihood, which needs to be computed in the E-step, takes the following form:
For a minimized KL divergence, $q_i(\boldsymbol{\theta}_i)$ is selected as follows:
As the choice of $q_i(\cdot)$ has been confined to the Gaussian family, it suffices to give the updates for the mean and covariance matrix:
In each iteration, the item parameters and variational parameters $\xi_{ijk},\boldsymbol{a}_j,b_{jk}$ are taken from the previous M-step, or from the initialization in the first iteration.
For the M-step, the item parameters are chosen to maximize the above expectation of the lower bound, obtained by plugging Equations (2.6) and (2.7) into Equation (2.5):
By maximizing the lower bound $\underline{E}(M_p,\boldsymbol{\xi})$, we derive a new set of item parameters that can be expected to increase the left-hand side. Updating the variational parameters keeps the bound tight, preventing the iteration from decreasing the target expectation when the right-hand side is maximized. The efficiency of this majorization-maximization approach depends on the goodness of fit of the adopted softmax bound.
To maximize the lower bound on the expectation with respect to the item parameters $M_p$ and variational parameters $\boldsymbol{\xi}$, we employ a Gauss-Seidel scheme to handle the terms that are nonlinear in the parameters. Each iterative update uses the most recently updated copies of the parameters. The updates are given as follows.
For each $j=1,\ldots,J$,
with $\xi_{ijk}$, $b_{jk}$ from the last iteration (or initialization) and $\boldsymbol{\mu}_i$, $\boldsymbol{\Sigma}_i$ from the E-step. Next, for the threshold parameters, for each $j=1,\ldots,J$, $k=1,\ldots,K_j-1$,
where
Here $\boldsymbol{a}_j$ are from the previous step and $\boldsymbol{\mu}_i$, $\boldsymbol{\Sigma}_i$ are from the previous E-step. Finally, for the variational parameters $\xi_{ijk}$, for each $i=1,\ldots,I$, $j=1,\ldots,J$, $k=0,\ldots,K_j-1$, the update is
with all other parameters obtained from the latest updates.
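The Gauss-Seidel principle used here (each coordinate update immediately reuses the freshest values of the other coordinates) can be illustrated on a toy concave objective. The toy function and its closed-form coordinate maximizers below are illustrative only; they are not the pGVEM updates for $\boldsymbol{a}_j$, $b_{jk}$, and $\xi_{ijk}$ given above.

```python
def gauss_seidel_max():
    """Maximize f(x, y) = -(x^2 + y^2 + x*y) + x + 2*y by cyclic exact
    coordinate updates, each using the most recent value of the other
    coordinate. The unique maximizer is (x, y) = (0, 1)."""
    x, y = 0.5, 0.5
    for _ in range(100):
        x_new = (1.0 - y) / 2.0      # argmax over x with y held fixed
        y_new = (2.0 - x_new) / 2.0  # argmax over y using the fresh x
        if abs(x_new - x) < 1e-10 and abs(y_new - y) < 1e-10:
            break
        x, y = x_new, y_new
    return x, y
```

The same stopping rule (change in all parameters below a threshold) mirrors the convergence criterion used for the full algorithm.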
In exploratory analysis, we do not have prior information on the item factor loadings, so the covariance $\boldsymbol{\Sigma}_\theta$ is fixed as $I_D$, and proper rotations (Browne, 2001) are subsequently imposed to allow the factors to be correlated, enabling analysis of the latent structure. For confirmatory factor analysis, however, we update the covariance as
and scale its diagonal entries to be 1.
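Rescaling a covariance estimate so its diagonal entries equal 1 is the standard covariance-to-correlation transformation $\boldsymbol{R}=D^{-1/2}\boldsymbol{\Sigma}D^{-1/2}$; a minimal sketch (plain nested lists, names our own):

```python
import math

def to_correlation(sigma):
    """Rescale a covariance matrix so its diagonal entries equal 1."""
    d = [math.sqrt(sigma[i][i]) for i in range(len(sigma))]
    return [[sigma[i][j] / (d[i] * d[j]) for j in range(len(sigma))]
            for i in range(len(sigma))]
```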
2.2. Standard Error Estimation
Computing standard errors (SEs) of the item parameter estimates is crucial for various applications, such as multidimensional computerized adaptive testing, item parameter calibration, and differential item functioning. Challenges arise in estimating SEs when dealing with a high-dimensional latent domain and polytomous responses, as in the MGPCM. The most common approach estimates SEs from the approximated Fisher information matrix. However, inverting a prohibitively large information matrix (due to high dimension and long test length) can be unstable when the sample size of examinees is not large enough. An alternative numerical approximation using Gaussian quadrature in EM estimation has been proposed (Cagnone & Monari, 2013), but it is computationally expensive and sensitive to dimensionality. The supplemented expectation-maximization (SEM) algorithm has also been developed in the IRT literature (e.g., Tian et al., 2013). However, in pilot simulations we found that none of these methods provides stable estimates, especially when the dimension D and the number of categories K are large. Therefore, to estimate the standard errors of item parameters under the pGVEM framework for the MGPCM, we adopt a bootstrap approach based on resampling. The bootstrap is an efficient alternative when standard SE estimation is mathematically intractable (Efron & Tibshirani, 1986), as the resampling procedure avoids direct computation of SEs.
The bootstrap procedure in the pGVEM framework is implemented as follows. First, we simulate B bootstrap datasets based on $\hat{M}_p=\{\hat{\boldsymbol{a}}_j,\hat{\boldsymbol{b}}_j\}_j$ estimated from the pGVEM scheme. Then we apply the pGVEM method to estimate the item parameters for each bootstrap dataset, denoted by $\hat{M}_p^{(1)},\ldots,\hat{M}_p^{(B)}$. The standard errors are estimated by
where v denotes an item parameter ($a_{jr}$ or $b_{jk}$) and $\hat{v}^{(i)}$ is its ith bootstrap estimate. Since our objective is to estimate SEs rather than the full distributions of the estimators, we take the number of bootstrap samples to be 50, which yields numerically stable results.
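The SE of each parameter is then the spread of its B bootstrap estimates. A minimal sketch, assuming the usual sample-standard-deviation form with denominator $B-1$ (the displayed equation is not reproduced here, so this form is an assumption):

```python
import math

def bootstrap_se(estimates):
    """SE of one item parameter from its B bootstrap estimates:
    sqrt( sum (v_b - mean)^2 / (B - 1) ). Assumes B >= 2."""
    B = len(estimates)
    mean = sum(estimates) / B
    return math.sqrt(sum((v - mean) ** 2 for v in estimates) / (B - 1))
```

In practice this is applied entry-wise to every $a_{jr}$ and $b_{jk}$ across the B refitted models.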
2.3. Determining Latent Dimension
In this section, we discuss how to select the appropriate number of latent dimensions. We propose to use information criteria such as AIC and BIC to compare the model fit across dimensions. In the MGPCM, direct computation of the likelihood term is costly, so we adopt a modified version of the information criteria in which the expectation is replaced by its lower bound given by (2.9):
where $\hat{\boldsymbol{A}}=(\hat{\boldsymbol{a}}_1,\ldots,\hat{\boldsymbol{a}}_J)$ and $\hat{\boldsymbol{B}}=(\hat{\boldsymbol{b}}_1,\ldots,\hat{\boldsymbol{b}}_J)$ are the assembled matrices of the discrimination and threshold parameters, respectively, $\text{nondiag}(\hat{\boldsymbol{\Sigma}}_\theta)$ denotes the off-diagonal entries of $\hat{\boldsymbol{\Sigma}}_\theta$, and the zero norm $\Vert\cdot\Vert_0$ counts the number of nonzero entries of the assembled matrix.
Note that the term $\Vert\hat{\boldsymbol{B}}\Vert_0$ does not increase with the dimension D; it counts all effective b parameters. In addition, since the covariance matrix $\hat{\boldsymbol{\Sigma}}_\theta$ is symmetric with unit diagonal entries, we count the effective number of parameters in $\hat{\boldsymbol{\Sigma}}_\theta$ as $\Vert\text{nondiag}(\hat{\boldsymbol{\Sigma}}_\theta)\Vert_0/2$. The major advantage of the proposed criteria is that the lower bound of the expectation is readily available from the updated item parameters and variational parameters, at no extra computational cost.
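The effective-parameter count entering the modified AIC/BIC penalty can be sketched as follows, counting nonzero loadings, nonzero thresholds, and half the nonzero off-diagonal entries of the symmetric latent correlation matrix. The function name and plain-list matrix representation are our own.

```python
def effective_param_count(A, B, sigma):
    """||A||_0 + ||B||_0 + ||nondiag(sigma)||_0 / 2, with sigma symmetric
    and having unit diagonal (so only half its off-diagonals are free)."""
    nnz = lambda M: sum(1 for row in M for x in row if x != 0)
    off_diag_nnz = sum(1 for i in range(len(sigma))
                       for j in range(len(sigma))
                       if i != j and sigma[i][j] != 0)
    return nnz(A) + nnz(B) + off_diag_nnz // 2
```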
3. Simulation Studies
3.1. Study I
We conducted simulation studies to compare the empirical performance of the proposed pGVEM algorithm with the EM algorithm with fixed quadrature, the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm (Cai, 2010), and the stochastic EM (StEM) algorithm for the MGPCM, all implemented in the R package 'mirt' (Chalmers, 2012), in terms of mean squared error and bias of the estimates together with computation time. The simulations were conducted in the exploratory factor analysis (EFA) scenario, where no constraints on the item factor loading structure were imposed during the analysis. EFA is generally more computationally challenging than confirmatory factor analysis. In our study, we assumed that the latent covariance matrix was $I_D$ to remove scale and rotational indeterminacy, and no further assumptions on the structure of the loading matrix A were made during the analysis.
After estimating the parameters by our pGVEM algorithm, we performed a proper oblique rotation to allow the factors to be correlated. Many methods, including varimax, direct oblimin, quartimax, equamax, and promax (Browne, 2001; Hendrickson & White, 1964), are available for factor rotation in the literature. In our simulation study, we applied promax rotation, as it is one of the most computationally efficient oblique rotation methods for large-scale factor analysis. For the estimation implemented in the 'mirt' package, we used the built-in promax rotation, and for the pGVEM estimation, we used the function promax in R, with the default $m=4$, to perform the rotation after the iterations end.
The manipulated conditions include: (1) sample size $N=200,500$; (2) test length $J=10,20$; (3) number of categories $K=3,6$; (4) low and high correlation among the latent traits; and (5) small- and large-scale loadings. For each condition, a total of 100 replications were simulated. In the context of partial credit models, the scale of the loadings plays a pivotal role in shaping the likelihood function. Specifically, when loadings are high and there are multiple categories for partial credit scoring, the probability of attaining the highest or lowest scores becomes disproportionately large. This dominance of extreme scores may leave too few records of intermediate scores, making the estimation of threshold parameters problematic.
Therefore, in the simulation studies, we considered two cases: (1) low-scale loadings: $a_{jr}$ was simulated from Unif(0.5, 1) for all $j=1,\ldots,J$, $r=1,\ldots,D$; (2) high-scale loadings: $a_{jr}$ was simulated from Unif(1, 2) for all $j=1,\ldots,J$, $r=1,\ldots,D$.
The threshold parameters $b_{jk}$ were simulated from N(0, 1) for all $j=1,\ldots,J$ and $k=1,\ldots,K-1$. The latent variables were simulated from a multivariate normal distribution with mean 0 and covariance matrix $\boldsymbol{\Sigma}_\theta$. The diagonal entries were fixed as 1, and the off-diagonal entries were generated from Unif(0.1, 0.3) in the low correlation case and from Unif(0.5, 0.7) in the high correlation case.
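Given simulated parameters, responses follow the parameterization used throughout, $P(Y_{ij}=k)\propto\exp(k\boldsymbol{a}_j^\prime\boldsymbol{\theta}_i-b_{jk})$. A minimal generator sketch; the convention $b_{j0}=0$ and all names are assumptions of this sketch, not the paper's code.

```python
import math
import random

def gpcm_probs(a, theta, b):
    """Category probabilities: softmax over k of k*a'theta - b[k],
    k = 0, ..., K-1, with b[0] = 0 assumed."""
    linear = sum(ai * ti for ai, ti in zip(a, theta))
    logits = [k * linear - b[k] for k in range(len(b))]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return [x / s for x in w]

def sample_response(a, theta, b, rng):
    """Draw one partial credit score by inverse-CDF sampling."""
    u, cum = rng.random(), 0.0
    probs = gpcm_probs(a, theta, b)
    for k, p in enumerate(probs):
        cum += p
        if u <= cum:
            return k
    return len(probs) - 1
```

With a large value of $\boldsymbol{a}_j^\prime\boldsymbol{\theta}_i$, the top category dominates, which is exactly the extreme-score phenomenon described above for high-scale loadings.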
For the responses generated from the simulated model parameters, we performed simulation only on cases where, for all $j=1,\ldots,J$ and $k=0,\ldots,K-1$,
Here # denotes the set counting operator. Skipping any item for which some category receives no responses is necessary, since the threshold parameter linked to that category cannot be identified in a finite sample. For the convergence criterion, the algorithm was terminated when the change in all item parameters between two iterations dropped below a pre-specified threshold, i.e.,
The estimation errors are presented separately in the form of mean squared error and bias for the discrimination and threshold parameters and the covariance matrix, averaged across the test items:
Note that both the true and estimated discrimination parameters have been rotated by the promax rotation. The number of Markov chain samples drawn in the MH-RM algorithm was the default 5,000 in the R package 'mirt'. The convergence criterion and optimizer were also set to the defaults in the 'mirt' package.
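The MSE and bias summaries can be sketched as simple entry-wise averages over the (rotated) parameter matrices; the exact averaging in the displayed equation is not reproduced here, so the per-entry form below is an assumption, and the names are ours.

```python
def mse_and_bias(est, true):
    """Entry-wise MSE and bias between an estimated and a true parameter
    matrix (e.g., rotated loadings), averaged over all entries."""
    n = sum(len(row) for row in true)
    mse = sum((e - t) ** 2
              for er, tr in zip(est, true)
              for e, t in zip(er, tr)) / n
    bias = sum((e - t)
               for er, tr in zip(est, true)
               for e, t in zip(er, tr)) / n
    return mse, bias
```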
Under the setting of small-scale loadings (i.e., $a_{jr}\sim$ Unif(0.5, 1)), Figs. 1 and 2 show the MSE of the four estimation methods for the 3-category and 6-category cases, respectively. Each box represents the distribution of errors over 100 replications. We truncated the y-axis of the plots to make it easier to compare estimation precision across scenarios; the full boxplots are presented in Appendix B. Overall, our method provides more stable and accurate estimates of the MGPCM, as seen from both figures. The results indicate reduced variability in the estimation errors of the model parameters for pGVEM compared to the other methods. Both MH-RM and StEM exhibit better stability and accuracy than the standard EM algorithm. Notably, pGVEM demonstrates good stability, particularly when the sample size N is 200. As the number of categories increases, estimation of the threshold parameters becomes more challenging, an anticipated result given that the model becomes more complicated in the multicategory case. Interestingly, despite the decrease in accuracy, the proposed pGVEM method exhibits a comparatively modest increase in variability in many cases relative to the alternatives, indicating its capability to handle more complex scenarios. Furthermore, we present the bias of the estimates from the four methods in Figs. 3 and 4. In general, the bias of the pGVEM estimates tends to be more moderate across cases, particularly for the threshold parameters.
Under the setting of large-scale loadings (i.e., $a_{jr}\sim$ Unif(1, 2)), the MSE and bias results are presented in Figs. 5, 6, 7, and 8. The estimation error is larger in most cases, because a high level of loading increases the frequency of extreme scores and thus makes estimation more challenging. Yet pGVEM still outperforms the other methods in the regimes of small sample size or low correlation. Furthermore, in terms of recovery of the threshold parameters, pGVEM provides estimates with less bias and error. Notably, in this scenario it happens more frequently that some category of an item is not recorded for any individual, leading us to discard such cases from the analysis. This suggests that when dealing with multiple response categories, the discrimination parameters may need to be kept small for the model to remain interpretable.
In Fig. 9 we present the average computation time of the four methods under different settings of sample size N and test length J. The results are similar across cases; for brevity, we display the case where the discrimination parameters are simulated from Unif(0.5, 1) and the correlation coefficients from Unif(0.1, 0.3), with a total of 6 categories. Compared with pGVEM, the traditional EM algorithm is clearly inefficient, especially when the sample size is large. pGVEM is also slightly faster than the stochastic EM algorithm and achieves computational efficiency similar to the MH-RM algorithm in the displayed case. The computational efficiency of the pGVEM algorithm makes fast estimation on large datasets possible.
The results of our study demonstrate the superiority of the pGVEM algorithm over the traditional EM algorithm along with MH-RM and StEM in terms of parameter recovery and computational efficiency. Specifically, the pGVEM algorithm achieves comparable, and in some cases, superior parameter recovery compared to the other algorithms. Moreover, pGVEM generates fewer estimation outliers, particularly in situations where the sample size and test length are large.
As a side check, we compared the pGVEM algorithm with the GVEM algorithm proposed by Cho et al. (2021) for the special case of the M2PL. The comparison was made under the same settings as in this section, except that we take $K=2$. The results, presented in Appendix B, indicate that our algorithm performs similarly to that of Cho et al. (2021) on binary data. Overall, our findings demonstrate that the pGVEM algorithm is a robust and efficient method for parameter recovery in the MGPCM. It can also be applied to other models, including the M2PL, with similar success. These results suggest that the pGVEM algorithm may be a valuable tool for researchers seeking to analyze complex data structures efficiently and accurately.
3.2. Study II
We conducted Simulation Study II to assess the performance of the proposed bootstrap standard error (SE) estimation procedure in the multidimensional case with multiple categories. As noted in Sect. 2.2, the traditional methods (EM and MH-RM) exhibited instability and produced infeasible results in many settings. Consequently, in this section we focus on the bootstrap-based SE estimates under the EM algorithm, MH-RM, and the proposed pGVEM algorithm.
The comparisons were conducted under simulation settings similar to those of Simulation Study I, with manipulated factors including sample size, test length, factor correlations, and the number of categories. The empirical standard deviations of the estimated item parameters, computed as
across replications for each condition, are taken as approximations of the true SEs for each method. Here v stands for an item parameter ($a_{jd}$ or $b_{jk}$), and $\hat{v}^{(r)}$ is the estimate in the rth replication. To assess the performance of the proposed method, we computed the SE estimates along with their bias and relative bias as follows:
$$\text{Bias}(\hat{v}) = \overline{\widehat{SE}}(\hat{v}) - SD(\hat{v}), \qquad \text{Relative Bias}(\hat{v}) = \frac{\overline{\widehat{SE}}(\hat{v}) - SD(\hat{v})}{SD(\hat{v})},$$
where $\overline{\widehat{SE}}(\hat{v})$ denotes the average of the estimated SEs across replications, providing a comprehensive assessment of the reliability and accuracy of the proposed bootstrap methods. Here we present the SEs of $\varvec{a}$ and $\varvec{b}$ pooled together, with the aim of showing the effectiveness of our method compared with its alternatives. The separate results for $\varvec{a}$ and $\varvec{b}$ are similar to the pooled ones, and the variability of the discrimination and threshold parameters was well illustrated in the previous study.
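As a minimal illustration of these two quantities, the empirical SD and the bias measures can be computed as in the following sketch (helper names are ours, for exposition only):

```python
import math

def empirical_sd(estimates):
    """SD of a parameter estimate across R replications,
    used as the approximation of its true SE."""
    r = len(estimates)
    mean = sum(estimates) / r
    return math.sqrt(sum((v - mean) ** 2 for v in estimates) / (r - 1))

def se_bias(bootstrap_ses, true_se):
    """Bias and relative bias of the averaged bootstrap SE against
    the empirical-SD approximation of the true SE."""
    mean_se = sum(bootstrap_ses) / len(bootstrap_ses)
    bias = mean_se - true_se
    return bias, bias / true_se
```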
The results are shown in Fig. 10 for the low correlation setting and in Fig. 11 for the high correlation setting. In the multidimensional case, where most methods fail to estimate the SEs numerically or from Fisher's information matrix, the bootstrap method still generates stable results. Compared with bootstrap SEs based on the alternative estimators, the pGVEM-based bootstrap exhibits lower bias. When the sample size is 200, the pGVEM-based bootstrap may slightly underestimate the standard errors; nevertheless, its overall performance remains strong across diverse settings.
3.3. Study III
In this section we conduct a simulation study to examine the performance of the proposed AIC and BIC in determining the latent dimension. Considering the complexity of the MGPCM model setting, we explore the accuracy of factor identification for sample sizes $N=500, 800$ and test lengths $J=20, 40$. We consider both low correlation settings (simulated from Unif(0.1, 0.3)) and high correlation settings (simulated from Unif(0.5, 0.7)) for dimensions $D=2, 3, 4$, with discrimination parameters generated from Unif(1, 2). In each configuration, 100 independent samples were generated, and we recorded the number of cases in which the number of factors was correctly identified.
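For reference, responses in such simulations are generated from the standard category probabilities of the generalized partial credit model. The following sketch (an illustrative helper of our own, not taken from any package) computes them for a single item:

```python
import math

def gpcm_probs(theta, a, b):
    """Category probabilities for one item under the (multidimensional)
    generalized partial credit model. theta: latent trait vector;
    a: discrimination vector; b: list of K-1 step parameters,
    giving K response categories."""
    lin = sum(t_d * a_d for t_d, a_d in zip(theta, a))
    # cumulative logits sum_{l<=k} (a'theta - b_l); category 0 is fixed at 0
    logits = [0.0]
    for b_l in b:
        logits.append(logits[-1] + lin - b_l)
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return [x / s for x in w]
```

Sampling a response then amounts to drawing a category index from the returned probability vector.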
Tables 1 and 2 present the correct estimation rates for the number of dimensions with 3 and 6 categories, respectively. The results indicate that an increase in sample size generally contributes to higher correct estimation rates. We also observe that a lower correlation generally leads to higher correct estimation rates, especially when there are fewer categories and shorter test lengths, as the latent structure may be more challenging to identify in such settings. Overall, AIC performs slightly better in the case of $K=3$, whereas BIC has an advantage in the regime of $K=6$. In conclusion, the proposed criteria are effective in general, and with a larger sample size, the model is more likely to be correctly identified. Our findings are also consistent with existing studies under the MIRT model (Cho et al., Reference Cho, Wang, Zhang and Xu2021).
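The selection rule itself is simple: fit the model for each candidate dimension and keep the one minimizing the criterion. The sketch below uses the generic AIC/BIC forms; the modified criteria in Equations (2.14) and (2.15) would substitute the variational lower bound for the maximized log-likelihood:

```python
import math

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n):
    return -2.0 * loglik + math.log(n) * n_params

def select_dimension(fits, n, criterion="BIC"):
    """fits: dict mapping a candidate dimension D to the pair
    (maximized log-likelihood or variational lower bound, number of
    free parameters). Returns the D minimizing the chosen criterion."""
    scores = {
        D: bic(ll, p, n) if criterion == "BIC" else aic(ll, p)
        for D, (ll, p) in fits.items()
    }
    return min(scores, key=scores.get)
```

Because BIC's $\log(n)$ penalty exceeds AIC's factor of 2 for any realistic $n$, BIC favors smaller dimensions, which is consistent with the pattern observed across the two criteria in Tables 1 and 2.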
4. Real Data Analysis
4.1. Trends in International Mathematics and Science Study Dataset
In this section, we demonstrate the application of the pGVEM algorithm by analyzing a dataset from the Trends in International Mathematics and Science Study (TIMSS) (Mullis & Martin, 2017; Martin & Mullis, Reference Martin and Mullis2019; Fishbein et al., Reference Fishbein, Martin, Mullis and Foy2018; Martin et al., 2020). TIMSS provides reliable and timely trend data on the mathematics and science achievement of US students compared with that of students in other countries. The assessment consists of a large pool of mathematics and science questions, which are divided into different blocks using an item matrix sampling design to relieve response burden. Specifically, 14 booklets were assembled, and each student was required to complete one of them; different booklets may include the same items for linking. TIMSS 2019 divided the test items into 28 blocks for each grade, and in the grade-eight data each block contained 12 to 18 items. For our analysis, we selected one mathematics block (13 items) and one science block (15 items) that appeared in booklet 5 of grade 8. The mathematics block contained 2 polytomous items, and the science block contained 3. Of the 1,252 students who responded to this booklet, 918 had completely recorded responses, and we estimated the parameters using the data from these 918 students only. Appendix C provides details of the item codes and corresponding test content. The IRT parameters provided in the TIMSS assessment documentation (Martin & Mullis, Reference Martin and Mullis2019) were used as the true parameters against which our estimates were compared.
It should be noted that in the operational analysis of TIMSS, unidimensional IRT models were used for the math and science domains separately. When analyzing the items from the math and science domains separately, the modified information criteria in Equations (2.14) and (2.15), computed under the EFA framework, both attain their smallest value when the latent dimension is 1 (i.e., $D=1$), which implies that both domains are essentially unidimensional under the generalized partial credit model setting. However, according to the information criterion provided by the ‘mirt’ package (i.e., using its default EM algorithm), the latent dimension cannot be clearly decided. In the following, we display the parameters estimated by EM with fixed quadrature and by pGVEM.
To show the results visually, we plot the parameters estimated by EM and pGVEM in Figs. 12 and 13. The b-parameters obtained from the two methods are very close, as are the a-parameters, although the estimates derived from the EM algorithm tend to be slightly larger for the majority of the items.
To reveal the relation of latent abilities across domains, we estimate all 28 items jointly. First, we compute the modified information criteria in Equations (2.14) and (2.15) for joint estimation of the data from the mathematics and science blocks, with results presented in Tables 3 and 4. The optimal number of latent dimensions is 2, which is reasonable, as math proficiency and science proficiency should emerge as two separate factors. The information criterion output from the ‘mirt’ package again did not provide sensible results. Taking the latent dimension to be 2, we present the estimates from the two methods after promax rotation in Fig. 14. The estimated parameters reveal a clear two-factor loading structure: the second dimension primarily reflects students' proficiency in solving mathematical problems, whereas the first dimension tends to measure students' science proficiency. This finding suggests that the two dimensions capture distinct but interconnected aspects of overall cognitive ability. Moreover, certain types of questions are particularly effective in assessing students' latent abilities in either mathematics or science, demonstrating a higher degree of typicality in evaluating competence within their respective domains. This insight underscores the importance of carefully selecting assessment items that align with the targeted cognitive skills, as they offer a more accurate reflection of students' underlying abilities.
Math, Sci, Joint denote, respectively, information criterion for math items, science items and the collection of all items jointly.
4.2. Big-Five Factor Personality Dataset
We further demonstrate the application of the pGVEM algorithm by analyzing a dataset from the Big-Five personality assessment. The five-factor model (FFM) stands as the most widely recognized and extensively used model in psychology for understanding and measuring personality (Goldberg, Reference Goldberg1992). This model provides a framework to capture the richness and complexity of individual differences, assuming five domains that encompass a comprehensive spectrum of personality traits. The five dimensions include Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (often referred to by the acronym OCEAN) (Costa & McCrae, Reference Costa and McCrae2008).
To measure these latent traits, various assessment tools have been developed to evaluate an individual's standing on the Big-Five dimensions (Wiggins & Trapnell, 1997; McCrae et al., Reference McCrae, Zonderman, Costa, Bond and Paunonen1996; Goldberg et al., Reference Goldberg1999). In our study, we employ the Big-Five Factor Markers derived from the International Personality Item Pool (IPIP), a widely recognized instrument developed by Goldberg (Reference Goldberg1992). The dataset, publicly available at http://openpsychometrics.org/tests/IPIP-BFFM/, consists of responses from 19,718 individuals to the fifty-item Big-Five Factor Markers. Each item requires respondents to rate the extent to which they perceive the statement as true of themselves on a five-point scale ranging from 1 (Disagree) through 3 (Neutral) to 5 (Agree), providing a nuanced, graded assessment across the five categories.
Under the exploratory factor analysis setting, we fit the MGPCM using the proposed pGVEM algorithm. The estimated factor loadings are presented in Fig. 15. The heatmap reveals that each factor exhibits a distinct and salient association with the grouped items. We can also see that not all latent factors have a positive influence on the overall rating, a phenomenon commonly observed in personality tests (McCrae et al., Reference McCrae, Zonderman, Costa, Bond and Paunonen1996; Goldberg et al., Reference Goldberg1999). A notable correlation between Factor 1 and Factor 3 can be observed, as both substantially affect responses to the A-items. To further illustrate these relationships, we present the estimated correlation matrix of the factors in Table 5. The largest correlation, 0.529, is between Factor 1 and Factor 3, consistent with our observation from Fig. 15. Overall, the estimated factor structure aligns with the existing literature, demonstrating the usefulness of the proposed method as a computationally efficient tool for analyzing large-scale assessment data.
5. Conclusion
In this paper, we proposed a new Gaussian variational estimation algorithm for polytomous responses (namely, pGVEM) for the multidimensional generalized partial credit model (MGPCM). The MGPCM is one of the most widely used models in cognitive assessment and educational measurement for polytomously scored items, such as those assigning partial credit for intermediate correct steps. Few methods exist for efficient estimation of the MGPCM, and our pGVEM algorithm is shown to outperform many existing approaches: it is relatively stable and much faster, without producing many aberrant estimates or convergence issues. When the sample size and test length are both large, the proposed variational lower bound appears to approximate the target marginal likelihood closely. The computational efficiency is achieved by replacing the intractable high-dimensional integral with a variational lower bound, which leads to faster EM-type updates involving only small-scale linear equations. The studies in Sect. 3 provide simulation evidence that pGVEM produces accurate parameter estimates quickly compared with the traditional EM implementation of marginal maximum likelihood estimation. The real data analyses demonstrate that our estimation scheme is capable of extracting proper information about the latent variables.
Variational inference has emerged as a prominent and efficient methodology in the field of psychometrics, particularly due to its ability to handle large-scale datasets with both accuracy and computational efficiency. In addition to IRT models with continuous latent variables, Yamaguchi and Okada (Reference Yamaguchi and Okada2020) recently proposed a variational Bayesian (VB) inference algorithm for saturated cognitive diagnosis models, a notable advancement in scalable and computationally efficient Bayesian estimation for discrete latent variable models. Oka and Okada (Reference Oka and Okada2023) developed a scalable estimation algorithm for the DINA Q-matrix, which employs an iteration scheme utilizing stochastic optimization and variational inference. Our method, on the other hand, extends the literature on continuous latent variable estimation (Cho et al., Reference Cho, Wang, Zhang and Xu2021, 2022) by considering multiple response categories. It is encouraging to further explore the relationships among various variational approximation methods, which may lead to a more robust and flexible estimation framework for many psychometric models.
There are also other potential directions for extending the current work. First, as with many nonconvex optimization problems, the efficacy of our algorithm may be influenced by the chosen initial values: initial values that deviate substantially from the true parameters may lead the algorithm to converge to a local optimum rather than the global one. Given the propensity of variational approximations to yield multiple local optima, the initialization strategy for our estimation procedure warrants further exploration. Second, the current method for determining the number of latent dimensions may not work in certain cases, especially when the dimension is very high or the latent factors are highly correlated. A more accurate and robust model selection method for these challenging scenarios is therefore needed.
Acknowledgements
This work is partially supported by IES grant R305D200015 and NSF grants SES-1846747 and SES-2150601.
Declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.