
A Note on Likelihood Ratio Tests for Models with Latent Variables

Published online by Cambridge University Press:  01 January 2025

Yunxiao Chen*
Affiliation:
London School of Economics and Political Science
Irini Moustaki
Affiliation:
London School of Economics and Political Science
Haoran Zhang
Affiliation:
Fudan University
*
Correspondence should be made to Yunxiao Chen, Department of Statistics, London School of Economics and Political Science, London, UK. Email: [email protected]

Abstract

The likelihood ratio test (LRT) is widely used for comparing the relative fit of nested latent variable models. Following Wilks’ theorem, the LRT is conducted by comparing the LRT statistic with its asymptotic distribution under the restricted model, a χ2 distribution with degrees of freedom equal to the difference in the number of free parameters between the two nested models under comparison. For models with latent variables such as factor analysis, structural equation models and random effects models, however, it is often found that the χ2 approximation does not hold. In this note, we show how the regularity conditions of Wilks’ theorem may be violated using three examples of models with latent variables. In addition, a more general theory for LRT is given that provides the correct asymptotic theory for these LRTs. This general theory was first established in Chernoff (J R Stat Soc Ser B (Methodol) 45:404–413, 1954) and discussed in both van der Vaart (Asymptotic statistics, Cambridge, Cambridge University Press, 2000) and Drton (Ann Stat 37:979–1012, 2009), but it does not seem to have received enough attention. We illustrate this general theory with the three examples.

Type
Theory and Methods
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2020 The Author(s)

1. Introduction

1.1. Literature on Likelihood Ratio Test

The likelihood ratio test (LRT) is one of the most popular methods for comparing nested models. When comparing two nested models that satisfy certain regularity conditions, the p-value of an LRT is obtained by comparing the LRT statistic with a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of free parameters between the two nested models. This reference distribution is suggested by the asymptotic theory of the LRT known as Wilks’ theorem (Wilks, 1938).
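The regular case is easy to reproduce numerically. The sketch below is not from the paper; it uses the standard textbook setting of testing $H_0: \mu = 0$ for $N(\mu, 1)$ data, where the LRT statistic reduces to $n\bar{X}^2$ and Wilks’ theorem gives a $\chi^2_1$ reference distribution:

```python
import numpy as np
from scipy import stats

# Regular case: testing H0: mu = 0 with N(mu, 1) data. Here
# 2 * (loglik at MLE - loglik at 0) simplifies to n * xbar^2,
# and Wilks' theorem gives a chi^2 distribution with 1 df.
rng = np.random.default_rng(0)
n, reps = 200, 5000
xbar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
lrt = n * xbar**2

# Empirical rejection rate at the nominal 5% level should be close to 0.05.
crit = stats.chi2.ppf(0.95, df=1)
print((lrt > crit).mean())
```

In this regular setting the empirical type I error stays close to the nominal level, in contrast to the latent variable examples discussed below.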

However, for the statistical inference of models with latent variables (e.g., factor analysis, item factor analysis for categorical data, structural equation models, random effects models, finite mixture models), it is often found that the $\chi^2$ approximation suggested by Wilks’ theorem does not hold. Various published studies show that the LRT is not valid under certain violations of its conditions (e.g., small sample size, a wrong model under the alternative hypothesis, a large number of items, non-normally distributed variables, unique variances equal to zero, lack of identifiability), leading to over-factoring and over-rejection; see, e.g., Hakstian et al. (1982), Liu and Shao (2003), Hayashi et al. (2007), Asparouhov and Muthén (2009), Wu and Estabrook (2016), Deng et al. (2018), Shi et al. (2018), Yang et al. (2018) and Auerswald and Moshagen (2019). There is also a significant literature on the effect of testing at the boundary of the parameter space, which arises when testing the significance of variance components in random effects models as well as in structural equation models (SEM) with linear or nonlinear constraints (see Stram and Lee, 1994, 1995; Dominicus et al., 2006; Savalei and Kolenikov, 2008; Davis-Stober, 2009; Wu and Neale, 2013; Du and Wang, 2020).

Theoretical investigations have shown that certain regularity conditions of Wilks’ theorem are not always satisfied when comparing nested models with latent variables. Takane et al. (2003) and Hayashi et al. (2007) were among those who pointed out that models for which one needs to select dimensionality (e.g., principal component analysis, latent class models, factor models) have points of irregularity in their parameter space that in some cases invalidate the use of the LRT. Specifically, such issues arise in factor analysis when comparing models with different numbers of factors, rather than when comparing a factor model against the saturated model. The LRT for comparing a q-factor model against the saturated model does follow a $\chi^2$ distribution under mild conditions. However, for nested models with different numbers of factors (a q-factor model, the correct one, against a (q + k)-factor model), the LRT is likely not $\chi^2$-distributed, due to the violation of one or more of the regularity conditions.
This is in line with the two basic assumptions required by the asymptotic theory for factor analysis and SEM: identifiability of the parameter vector and non-singularity of the information matrix (see Shapiro, 1986, and references therein). More specifically, Hayashi et al. (2007) focus on exploratory factor analysis and on the problem that arises when the hypothesized number of factors exceeds the true number, which may lead to rank deficiency and non-identifiability of the model parameters; this corresponds to violations of the two regularity conditions. These findings go back to Geweke and Singleton (1980) and Amemiya and Anderson (1990). In particular, Geweke and Singleton (1980) studied the behavior of the LRT in small samples and concluded that, when the regularity conditions of Wilks’ theorem are not satisfied, the asymptotic theory is misleading in all sample sizes considered.

1.2. Our Contributions

The contribution of this note is twofold. First, we discuss situations under which Wilks’ theorem for the LRT may fail. Via three examples, we provide a relatively complete picture of this issue in models with latent variables. Second, we introduce a unified asymptotic theory for the LRT that covers Wilks’ theorem as a special case and provides the correct asymptotic reference distribution when Wilks’ theorem fails. This unified theory does not seem to have received enough attention in psychometrics, even though it has long been established in statistics (Chernoff, 1954; van der Vaart, 2000; Drton, 2009). In this note, we provide a tutorial on this theory, presenting the theorems in a more accessible way and providing illustrative examples.

1.3. Examples

To further illustrate the issue with the classical theory for the LRT, we provide three examples. These examples suggest that the $\chi^2$ approximation can perform poorly, giving p-values that may be either overly conservative or overly liberal.

Example 1

(Exploratory factor analysis) Consider a dimensionality test in exploratory factor analysis (EFA). For ease of exposition, we consider two hypothesis testing problems: (a) testing a one-factor model against a two-factor model and (b) testing a one-factor model against a saturated multivariate normal model with an unrestricted covariance matrix. Similar examples, exhibiting similar phenomena, have been studied in Hayashi et al. (2007).

1(a). Suppose that we have $J$ mean-centered continuous indicators, $\mathbf{X} = (X_1, \ldots, X_J)^\top$, which follow a $J$-variate normal distribution $N(\mathbf{0}, \boldsymbol{\Sigma})$. The one-factor model parameterizes $\boldsymbol{\Sigma}$ as

$$\boldsymbol{\Sigma} = \mathbf{a}_1\mathbf{a}_1^\top + \boldsymbol{\Delta},$$

where $\mathbf{a}_1 = (a_{11}, \ldots, a_{J1})^\top$ contains the loading parameters and $\boldsymbol{\Delta} = \mathrm{diag}(\delta_1, \ldots, \delta_J)$ is a diagonal matrix with diagonal entries $\delta_1, \ldots, \delta_J$. Here, $\boldsymbol{\Delta}$ is the covariance matrix of the unique factors. Similarly, the two-factor model parameterizes $\boldsymbol{\Sigma}$ as

$$\boldsymbol{\Sigma} = \mathbf{a}_1\mathbf{a}_1^\top + \mathbf{a}_2\mathbf{a}_2^\top + \boldsymbol{\Delta},$$

where $\mathbf{a}_2 = (a_{12}, \ldots, a_{J2})^\top$ contains the loading parameters for the second factor, and we set $a_{12} = 0$ to ensure model identifiability. The one-factor model is clearly nested within the two-factor model, and comparing the two models is equivalent to testing

$$H_0: \mathbf{a}_2 = \mathbf{0} \quad \text{versus} \quad H_a: \mathbf{a}_2 \ne \mathbf{0}.$$

If Wilks’ theorem holds, then under $H_0$ the LRT statistic should asymptotically follow a $\chi^2$ distribution with $J - 1$ degrees of freedom.

Table 1. Values of the true parameters for the simulations in Example 1.

Figure 1. (a) Results of Example 1(a). The black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations. The red dashed line shows the CDF of the $\chi^2$ distribution with 5 degrees of freedom suggested by Wilks’ theorem. The blue dotted line shows the CDF of the reference distribution suggested by Theorem 2. (b) Results of Example 1(b). The black solid line shows the empirical CDF of the LRT statistic, and the red dashed line shows the CDF of the $\chi^2$ distribution with 9 degrees of freedom suggested by Wilks’ theorem.

We now provide a simulated example. Data are generated from a one-factor model, with $J = 6$ indicators and $N = 5000$ observations. The true parameter values are given in Table 1. We generate 5000 independent datasets. For each dataset, we compute the LRT statistic for comparing the one- and two-factor models. Results are presented in panel (a) of Fig. 1. The black solid line shows the empirical cumulative distribution function (CDF) of the LRT statistic, and the red dashed line shows the CDF of the $\chi^2$ distribution suggested by Wilks’ theorem. A substantial discrepancy can be observed between the two CDFs.
Specifically, the $\chi^2$ CDF tends to stochastically dominate the empirical CDF, implying that p-values based on this $\chi^2$ distribution tend to be too liberal. In fact, if we reject $H_0$ at the 5% significance level based on these p-values, the actual type I error is 10.8%. These results suggest the failure of Wilks’ theorem in this example.
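A single run of this experiment can be sketched as follows: generate one-factor data and compute the LRT statistic by direct maximization of the two normal log-likelihoods. The loadings and uniquenesses below are hypothetical stand-ins, not the true values from Table 1:

```python
import numpy as np
from scipy.optimize import minimize

# One replication of the Example 1(a) comparison, with hypothetical
# parameter values (the paper's true values are in Table 1).
rng = np.random.default_rng(1)
J, N = 6, 5000
a_true = np.array([1.0, 0.8, 0.8, 0.8, 0.8, 0.8])  # hypothetical loadings
Sigma_true = np.outer(a_true, a_true) + np.eye(J)  # unit uniquenesses
X = rng.multivariate_normal(np.zeros(J), Sigma_true, size=N)
S = X.T @ X / N  # sample covariance (mean known to be zero)

def neg2loglik(Sigma):
    # -2 x log-likelihood of N(0, Sigma), up to an additive constant
    _, logdet = np.linalg.slogdet(Sigma)
    return N * (logdet + np.trace(S @ np.linalg.inv(Sigma)))

def nll1(theta):  # one-factor model: J loadings, J log-uniquenesses
    a1, logd = theta[:J], theta[J:]
    return neg2loglik(np.outer(a1, a1) + np.diag(np.exp(logd)))

def nll2(theta):  # two-factor model with a_{12} = 0 for identifiability
    a1 = theta[:J]
    a2 = np.concatenate(([0.0], theta[J:2 * J - 1]))
    logd = theta[2 * J - 1:]
    return neg2loglik(np.outer(a1, a1) + np.outer(a2, a2)
                      + np.diag(np.exp(logd)))

fit1 = minimize(nll1, np.concatenate([np.full(J, 0.5), np.zeros(J)]),
                method="L-BFGS-B")
# Start the two-factor fit near the one-factor solution with small second
# loadings, so its optimum is essentially guaranteed to be at least as good.
start2 = np.concatenate([fit1.x[:J], np.full(J - 1, 0.05), fit1.x[J:]])
fit2 = minimize(nll2, start2, method="L-BFGS-B")

lrt = fit1.fun - fit2.fun  # LRT statistic: 2 * (loglik_2 - loglik_1)
print(lrt)
```

Repeating this over many simulated datasets and plotting the empirical CDF of `lrt` against the $\chi^2_5$ CDF reproduces the kind of discrepancy shown in Fig. 1(a).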

1(b). When testing the one-factor model against the saturated model, the LRT statistic is asymptotically $\chi^2$ if Wilks’ theorem holds. The degrees of freedom of the $\chi^2$ distribution are $J(J+1)/2 - 2J$, where $J(J+1)/2$ is the number of free parameters in an unrestricted covariance matrix $\boldsymbol{\Sigma}$ and $2J$ is the number of parameters in the one-factor model. In panel (b) of Fig. 1, the black solid line shows the empirical CDF of the LRT statistic based on 5000 independent simulations, and the red dashed line shows the CDF of the $\chi^2$ distribution with 9 degrees of freedom. As we can see, the two curves almost overlap, suggesting that Wilks’ theorem holds here.
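The degrees-of-freedom bookkeeping for this comparison is simple arithmetic; for the $J = 6$ setting used here:

```python
# Example 1(b): J(J+1)/2 free parameters in the saturated covariance
# matrix, minus 2J parameters (J loadings + J uniquenesses) in the
# one-factor model.
J = 6
df = J * (J + 1) // 2 - 2 * J
print(df)  # 9, matching the 9 degrees of freedom in Fig. 1(b)
```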

Example 2

(Exploratory item factor analysis) We further give an example of exploratory item factor analysis (IFA) for binary data, in which phenomena similar to those in Example 1 are observed. Again, we consider two hypothesis testing problems: (a) testing a one-factor model against a two-factor model and (b) testing a one-factor model against a saturated multinomial model for a binary random vector.

2(a). Suppose that we have a $J$-dimensional response vector, $\mathbf{X} = (X_1, \ldots, X_J)^\top$, where all the entries are binary-valued, i.e., $X_j \in \{0, 1\}$. It follows a categorical distribution, satisfying

$$P(\mathbf{X} = \mathbf{x}) = \pi_{\mathbf{x}}, \quad \mathbf{x} \in \{0,1\}^J,$$

where $\pi_{\mathbf{x}} \ge 0$ and $\sum_{\mathbf{x} \in \{0,1\}^J} \pi_{\mathbf{x}} = 1$.

The exploratory two-factor IFA model parameterizes $\pi_{\mathbf{x}}$ by

$$\pi_{\mathbf{x}} = \int\int \prod_{j=1}^J \frac{\exp(x_j(d_j + a_{j1}\xi_1 + a_{j2}\xi_2))}{1+\exp(d_j + a_{j1}\xi_1 + a_{j2}\xi_2)} \phi(\xi_1)\phi(\xi_2)\,d\xi_1 d\xi_2,$$

where $\phi(\cdot)$ is the probability density function of the standard normal distribution. This model is also known as the multidimensional two-parameter logistic (M2PL) model (Reckase, 2009). Here, the $a_{jk}$s are known as the discrimination parameters and the $d_j$s as the easiness parameters. We denote $\mathbf{a}_1 = (a_{11}, \ldots, a_{J1})^\top$ and $\mathbf{a}_2 = (a_{12}, \ldots, a_{J2})^\top$. For model identifiability, we set $a_{12} = 0$. When $a_{j2} = 0$ for $j = 2, \ldots, J$, the two-factor model degenerates to the one-factor model. Similar to Example 1(a), if Wilks’ theorem holds, the LRT statistic should asymptotically follow a $\chi^2$ distribution with $J - 1$ degrees of freedom.
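The double integral defining $\pi_{\mathbf{x}}$ has no closed form, but it can be approximated by Gauss–Hermite quadrature over the two latent factors. The following sketch uses hypothetical parameter values and a small $J$, so that all $2^J$ cell probabilities can be enumerated and verified to sum to one:

```python
import itertools
import numpy as np

# Evaluate pi_x for the M2PL model by two-dimensional Gauss-Hermite
# quadrature. All parameter values here are hypothetical.
J = 4
d = np.zeros(J)                       # hypothetical easiness parameters
a1 = np.full(J, 1.0)                  # hypothetical loadings, factor 1
a2 = np.array([0.0, 0.5, 0.5, 0.5])   # a_{12} = 0 for identifiability

# Probabilists' Hermite nodes/weights, normalized for N(0, 1) expectations
nodes, weights = np.polynomial.hermite_e.hermegauss(21)
weights = weights / weights.sum()

def pi_x(x):
    # pi_x = E over (xi1, xi2) of prod_j p_j^{x_j} (1 - p_j)^{1 - x_j}
    total = 0.0
    for t1, w1 in zip(nodes, weights):
        for t2, w2 in zip(nodes, weights):
            p = 1.0 / (1.0 + np.exp(-(d + a1 * t1 + a2 * t2)))
            total += w1 * w2 * np.prod(np.where(np.array(x) == 1, p, 1 - p))
    return total

probs = [pi_x(x) for x in itertools.product([0, 1], repeat=J)]
print(sum(probs))  # should equal 1 up to numerical error
```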

Simulation results suggest the failure of this $\chi^2$ approximation. In Fig. 2, we provide plots similar to those in Fig. 1, based on 5000 datasets simulated from a one-factor IFA model with sample size $N = 5000$ and $J = 6$ items. The true parameters of this IFA model are given in Table 2. The result is shown in panel (a) of Fig. 2, where a pattern similar to that in panel (a) of Fig. 1 for Example 1(a) is observed.

Table 2. Values of the true parameters for the simulations in Example 2.

Figure 2. (a) Results of Example 2(a). The black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations. The red dashed line shows the CDF of the $\chi^2$ distribution with 5 degrees of freedom suggested by Wilks’ theorem. The blue dotted line shows the CDF of the reference distribution suggested by Theorem 2. (b) Results of Example 2(b). The black solid line shows the empirical CDF of the LRT statistic, and the red dashed line shows the CDF of the $\chi^2$ distribution with 51 degrees of freedom suggested by Wilks’ theorem.

2(b). When testing the one-factor IFA model against the saturated model, the LRT statistic is asymptotically $\chi^2$ if Wilks’ theorem holds, with $2^J - 1 - 2J$ degrees of freedom. Here, $2^J - 1$ is the number of free parameters in the saturated model, and $2J$ is the number of parameters in the one-factor IFA model. The result is given in panel (b) of Fig. 2. Similar to Example 1(b), the empirical CDF and the CDF implied by Wilks’ theorem are very close to each other, suggesting that Wilks’ theorem holds here.
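Again, the degrees of freedom follow from simple parameter counting; for the $J = 6$ setting used here:

```python
# Example 2(b): 2^J - 1 free cell probabilities in the saturated
# multinomial model, minus 2J parameters (J discrimination + J easiness)
# in the one-factor IFA model.
J = 6
df = 2**J - 1 - 2 * J
print(df)  # 51, matching the 51 degrees of freedom in Fig. 2(b)
```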

Example 3

(Random effects model) Our third example considers a random intercept model. Consider two-level data with individuals at level 1 nested within groups at level 2. Let $X_{ij}$ be the observation from the jth individual in the ith group, where $i = 1, \ldots, N$ and $j = 1, \ldots, J$. For simplicity, we assume all groups have the same number of individuals. Assume the following random effects model,

$$X_{ij} = \beta_0 + \mu_i + \epsilon_{ij},$$

where $\beta_0$ is the overall mean across all groups, $\mu_i \sim N(0, \sigma_1^2)$ characterizes the difference between the mean of group i and the overall mean, and $\epsilon_{ij} \sim N(0, \sigma_2^2)$ is the individual-level residual.
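To make the setup concrete, data from this model can be simulated in a few lines. This is a sketch; the sample sizes match the simulation below, while the value $\sigma_1^2 = 0.5$ is an illustrative choice of ours:

```python
import numpy as np

rng = np.random.default_rng(42)
N, J = 200, 20                   # number of groups, individuals per group
beta0, sigma1_sq, sigma2_sq = 0.0, 0.5, 1.0

mu = rng.normal(0.0, np.sqrt(sigma1_sq), size=N)         # group effects mu_i
eps = rng.normal(0.0, np.sqrt(sigma2_sq), size=(N, J))   # residuals eps_ij
x = beta0 + mu[:, None] + eps                            # X_ij, shape (N, J)

# Observations within a group share mu_i, which induces an intraclass
# correlation of sigma1_sq / (sigma1_sq + sigma2_sq) = 1/3 here.
```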

Testing for between-group variability under this model is equivalent to testing

$$H_0: \sigma_1^2 = 0 \quad \text{against} \quad H_a: \sigma_1^2 > 0.$$

If Wilks' theorem holds, the LRT statistic should asymptotically follow a $\chi^2$ distribution with one degree of freedom. We conduct a simulation study and show the results in Fig. 3. In this figure, the black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations from the null model with $N = 200$, $J = 20$, $\beta_0 = 0$, and $\sigma_2^2 = 1$. The red dashed line shows the CDF of the $\chi^2$ distribution with one degree of freedom.
As we can see, the two CDFs are not close to each other, and the empirical CDF tends to stochastically dominate the CDF suggested by Wilks' theorem. This suggests the failure of Wilks' theorem in this example.
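One replication of this simulation can be sketched as follows, using the closed-form marginal log-likelihood of the balanced random intercept model. The function and variable names are ours; the alternative model is fitted by numerical optimization with $\sigma_1^2$ constrained to be non-negative:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
N, J = 200, 20
x = rng.normal(0.0, 1.0, size=(N, J))     # one dataset from the null model

def neg_loglik(params):
    """Marginal -log-likelihood of the balanced random intercept model."""
    b0, s1, s2 = params                    # s1 = sigma_1^2, s2 = sigma_2^2
    gm = x.mean(axis=1)                    # group means
    ssw = np.sum((x - gm[:, None]) ** 2)   # within-group sum of squares
    tau = s2 + J * s1                      # equals J * Var(group mean)
    return 0.5 * (N * J * np.log(2 * np.pi) + N * (J - 1) * np.log(s2)
                  + N * np.log(tau) + ssw / s2
                  + J * np.sum((gm - b0) ** 2) / tau)

# Under H0 (sigma_1^2 = 0) the model is i.i.d. normal, so the MLEs are
# the grand mean and the total variance, available in closed form.
null_value = neg_loglik((x.mean(), 0.0, x.var()))
# Under Ha, optimize numerically with sigma_1^2 constrained to [0, inf).
res = optimize.minimize(neg_loglik, x0=(x.mean(), 0.1, x.var()),
                        method="L-BFGS-B",
                        bounds=[(None, None), (0.0, None), (1e-8, None)])
lam = 2.0 * (null_value - res.fun)         # LRT statistic lambda_N
p_wilks = stats.chi2.sf(lam, df=1)         # p-value if Wilks' theorem held
```

Repeating this over many datasets produces the empirical CDF in Fig. 3; about half the replications hit the boundary $\hat{\sigma}_1^2 = 0$, giving $\lambda_N \approx 0$.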

This kind of phenomenon has been observed when the null model lies on the boundary of the parameter space, in which case the regularity conditions of Wilks' theorem do not hold. The LRT statistic has been shown to often asymptotically follow a mixture of $\chi^2$ distributions (e.g., Shapiro, 1985; Self and Liang, 1987), instead of a single $\chi^2$ distribution. As will be shown in Sect. 2, such a mixture of $\chi^2$ distributions can be derived from a general theory for the LRT.
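For Example 3, where a single variance component is tested against the boundary value zero, the result of Self and Liang (1987) gives the equal-weight mixture $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$, where $\chi^2_0$ denotes a point mass at zero. A minimal sketch of the corresponding p-value computation (the function name is ours):

```python
from scipy import stats

def mixture_pvalue(lam):
    """p-value under the mixture 0.5 * chi2_0 + 0.5 * chi2_1.

    chi2_0 is a point mass at zero, so for lam > 0 the tail probability
    is half that of a chi-square with one degree of freedom.
    """
    if lam <= 0:
        return 1.0
    return 0.5 * stats.chi2.sf(lam, df=1)

# For lam > 0 the naive chi2_1 p-value is exactly twice the mixture
# p-value, so referring lam to chi2_1 is conservative in this example.
```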

Figure 3. The black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations. The red dashed line shows the CDF of the $\chi^2$ distribution with one degree of freedom as suggested by Wilks' theorem. The blue dotted line shows the CDF of the mixture of $\chi^2$ distributions suggested by Theorem 2 (Color figure online)

We now explain why Wilks' theorem does not hold in Examples 1(a), 2(a), and 3. We first define some generic notation. Suppose that we have i.i.d. observations $\mathbf{X}_1, \ldots, \mathbf{X}_N$ from a parametric model $\mathcal{P}_{\Theta} = \{P_{\boldsymbol{\theta}}: \boldsymbol{\theta} \in \Theta \subset \mathbb{R}^k\}$, where $\mathbf{X}_i = (X_{i1}, \ldots, X_{iJ})^\top$. We assume that the distributions in $\mathcal{P}_{\Theta}$ are dominated by a common $\sigma$-finite measure $\nu$, with respect to which they have probability density functions $p_{\boldsymbol{\theta}}: \mathbb{R}^J \rightarrow [0, \infty)$. Let $\Theta_0 \subset \Theta$ be a submodel; we are interested in testing

$$H_0: \boldsymbol{\theta} \in \Theta_0 \quad \text{versus} \quad H_a: \boldsymbol{\theta} \in \Theta \setminus \Theta_0.$$

Let $p_{\boldsymbol{\theta}^*}$ be the true model for the observations, where $\boldsymbol{\theta}^* \in \Theta_0$.

The log-likelihood function is defined as

$$l_N(\boldsymbol{\theta}) = \sum_{i=1}^N \log p_{\boldsymbol{\theta}}(\mathbf{X}_i),$$

and the LRT statistic is defined as

$$\lambda_N = 2\left( \sup_{\boldsymbol{\theta} \in \Theta} l_N(\boldsymbol{\theta}) - \sup_{\boldsymbol{\theta} \in \Theta_0} l_N(\boldsymbol{\theta})\right).$$
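In code, $\lambda_N$ is simply twice the gap between the two maximized log-likelihoods. A toy sketch (not one of the paper's examples): testing $H_0: \theta = 0$ for i.i.d. $N(\theta, 1)$ data, where the MLE under the alternative is the sample mean:

```python
import numpy as np
from scipy import stats

def lrt_statistic(loglik, theta_hat_full, theta_hat_null):
    """lambda_N = 2 * (sup of l_N over Theta - sup of l_N over Theta_0)."""
    return 2.0 * (loglik(theta_hat_full) - loglik(theta_hat_null))

rng = np.random.default_rng(0)
N = 500
x = rng.normal(0.0, 1.0, size=N)   # data generated under H0

def loglik(theta):
    return np.sum(stats.norm.logpdf(x, loc=theta, scale=1.0))

lam = lrt_statistic(loglik, x.mean(), 0.0)  # closed form: N * mean(x)**2
# This is a regular problem, so Wilks' theorem applies with one
# degree of freedom.
p_value = stats.chi2.sf(lam, df=1)
```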

Under suitable regularity conditions, Wilks' theorem suggests that the LRT statistic $\lambda_N$ is asymptotically $\chi^2$.

Wilks' theorem for the LRT requires several regularity conditions; see, e.g., Theorem 12.4.2 of Lehmann and Romano (2006). Among these conditions, two are not satisfied by the previous examples. First, it is required that $\boldsymbol{\theta}^*$ is an interior point of $\Theta$. This condition is not satisfied in Example 3, when $\Theta$ is taken to be $\{(\beta_0, \sigma_1^2, \sigma_2^2): \beta_0 \in \mathbb{R}, \sigma_1^2 \in [0, \infty), \sigma_2^2 \in [0, \infty)\}$, as the null model lies on the boundary of the parameter space.
Second, it is required that the expected Fisher information matrix at $\boldsymbol{\theta}^*$, $I(\boldsymbol{\theta}^*) = E_{\boldsymbol{\theta}^*}[\nabla l_N(\boldsymbol{\theta}^*)\nabla l_N(\boldsymbol{\theta}^*)^\top]/N$, is strictly positive definite. As we summarize in Lemma 1, this condition is not satisfied in Examples 1(a) and 2(a), when $\Theta$ is taken to be the parameter space of the corresponding two-factor model. Interestingly, however, when comparing the one-factor model with the saturated model, the Fisher information matrix is strictly positive definite in both simulated examples, Examples 1(b) and 2(b).

Lemma 1

  1. For the two-factor model given in Example 1(a), choose the parameter space to be
     $$\Theta = \left\{ (\delta_1, \ldots, \delta_J, a_{11}, \ldots, a_{J1}, a_{22}, \ldots, a_{J2})^\top \in \mathbb{R}^{3J-1}: \delta_j > 0, \ j = 1, \ldots, J \right\}.$$
     If the true parameters satisfy $a^*_{j2} = 0$, $j = 2, \ldots, J$, then $I(\boldsymbol{\theta}^*)$ is non-invertible.
  2. For the two-factor IFA model given in Example 2(a), choose the parameter space to be $\Theta = \mathbb{R}^{3J-1}$. If the true parameters satisfy $a^*_{j2} = 0$, $j = 2, \ldots, J$, then $I(\boldsymbol{\theta}^*)$ is non-invertible.

We remark on the consequences of a non-invertible information matrix. The first consequence is computational. If the information matrix is non-invertible, the log-likelihood function tends not to be strongly concave near the MLE, resulting in slow convergence. In the context of Examples 1(a) and 2(a), this means that computing the MLE for the corresponding two-factor models may suffer from convergence issues. When such issues occur, the obtained LRT statistic falls below its actual value, because the log-likelihood for the two-factor model does not achieve its maximum. Consequently, the p-value tends to be larger than its actual value, and decisions based on it tend to be more conservative than they would be without the convergence issue. We observed this convergence issue when conducting simulations for these examples. To improve convergence, we use multiple random starting points when computing the MLEs. The second consequence is a poor asymptotic convergence rate for the MLE. That is, although the MLE remains consistent, its convergence rate is typically much slower than the standard parametric rate $N^{-1/2}$; see Rotnitzky et al. (2000) for more theoretical results on this topic.
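The multiple-random-starts strategy mentioned above can be sketched generically. The objective below is a placeholder with two local minima, not one of the factor models from the examples:

```python
import numpy as np
from scipy import optimize

def multistart_minimize(objective, dim, n_starts=20, scale=1.0, seed=0):
    """Minimize a negative log-likelihood from several random initial
    points and keep the best run, reducing the risk of reporting an
    under-maximized likelihood (and hence an understated LRT statistic).
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        res = optimize.minimize(objective, rng.normal(scale=scale, size=dim),
                                method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best

# The global minimum of (t^2 - 1)^2 + 0.1 * t is near t = -1; a single
# start in the right-hand basin would instead stop near t = +1.
best = multistart_minimize(lambda t: (t[0] ** 2 - 1) ** 2 + 0.1 * t[0], dim=1)
```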

We further remark on the LRTs in Examples 1(b) and 2(b), which compare the fitted model with the saturated model. Although Wilks' theorem holds asymptotically in Example 2(b), the $\chi^2$ approximation may not always work as well as in our simulated example. This is because, when the number of items becomes large and the sample size is not large enough, the contingency table of all $2^J$ response patterns may be sparse, so the saturated model cannot be accurately estimated. In that case, it is better to use a limited-information inference method (e.g., Maydeu-Olivares and Joe, 2005, 2006) as a goodness-of-fit test. Similar issues may also occur in Example 1(b).

2. General Theory for Likelihood Ratio Test

The previous discussions suggest that Wilks' theorem does not hold for Examples 1(a), 2(a), and 3, due to violations of its regularity conditions. It is then natural to ask: What asymptotic distribution does $\lambda_N$ follow in these situations? Is there an asymptotic theory characterizing such irregular situations? The answer to both questions is "yes." In fact, a general theory characterizing these less regular situations was established in Chernoff (1954). In what follows, we provide a version of this general theory that is proved in van der Vaart (2000), Theorem 16.7; it is also given in Drton (2009), Theorem 2.6. Two problems will be considered: (1) comparing a submodel with the saturated model, as in Examples 1(b) and 2(b), and (2) comparing two submodels, as in Examples 1(a), 2(a), and 3.

2.1. Testing Submodel Against Saturated Model

We first introduce some notation. We use $\mathbb{R}^{J\times J}_{pd}$ and $\mathbb{R}^{J\times J}_{d}$ to denote the spaces of $J\times J$ strictly positive definite matrices and diagonal matrices, respectively. In addition, we define a one-to-one mapping $\rho: \mathbb{R}^{J\times J}_{pd} \rightarrow \mathbb{R}^{J(J+1)/2}$ that maps a positive definite matrix to the vector containing all its upper-triangular entries (including the diagonal entries). That is, $\rho(\Sigma) = (\sigma_{11}, \sigma_{12}, \ldots, \sigma_{1J}, \sigma_{22}, \ldots, \sigma_{2J}, \ldots, \sigma_{JJ})^\top$ for $\Sigma = (\sigma_{ij})_{J\times J} \in \mathbb{R}^{J\times J}_{pd}$. We also define a one-to-one mapping $\mu: \mathbb{R}^{J\times J}_{d} \rightarrow \mathbb{R}^{J}$ that maps a diagonal matrix to the vector containing all its diagonal entries.
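The mapping $\rho$ is a half-vectorization of the upper triangle; a minimal NumPy sketch (function name ours), where `np.triu_indices` happens to enumerate entries in exactly the row-major order used above:

```python
import numpy as np

def rho(sigma):
    """Map a symmetric J x J matrix to the length J(J+1)/2 vector of its
    upper-triangular entries (sigma_11, sigma_12, ..., sigma_1J,
    sigma_22, ..., sigma_JJ), in row-major order."""
    return sigma[np.triu_indices(sigma.shape[0])]

# Example with J = 3: the image lies in R^6.
S = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
v = rho(S)   # array([2. , 0.5, 0.1, 1. , 0.3, 1.5])
```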

We consider comparing a submodel with the saturated model. Let $\Theta_0$ and $\Theta$ be the parameter spaces of the submodel and the saturated model, respectively, satisfying $\Theta_0 \subset \Theta \subset \mathbb{R}^k$. Also let $\boldsymbol{\theta}^* \in \Theta_0$ be the true parameter vector. The asymptotic theory of the LRT for comparing $\Theta_0$ versus $\Theta$ requires regularity conditions C1–C4 below.

C1. The true parameter vector $\boldsymbol{\theta}^*$ is in the interior of $\Theta$.

C2. There exists a measurable map $\dot{l}_{\boldsymbol{\theta}}: \mathbb{R}^J \rightarrow \mathbb{R}^k$ such that

$$\lim_{\mathbf{h}\rightarrow \mathbf{0}} \frac{1}{\Vert \mathbf{h}\Vert^2} \int_{\mathbb{R}^J} \left( \sqrt{p_{\boldsymbol{\theta}+\mathbf{h}}(\mathbf{x})} - \sqrt{p_{\boldsymbol{\theta}}(\mathbf{x})} - \frac{1}{2}\mathbf{h}^\top \dot{l}_{\boldsymbol{\theta}}(\mathbf{x})\sqrt{p_{\boldsymbol{\theta}}(\mathbf{x})} \right)^2 d\nu(\mathbf{x}) = 0, \qquad (1)$$
and the Fisher information matrix $I(\boldsymbol{\theta}^*)$ for $\mathcal{P}_{\Theta}$ is invertible.
C3. There exists a neighborhood $U_{\boldsymbol{\theta}^*} \subset \Theta$ of $\boldsymbol{\theta}^*$ and a measurable function $\dot{l}: \mathbb{R}^J \rightarrow \mathbb{R}$, square integrable in the sense that $\int_{\mathbb{R}^J}\dot{l}(\mathbf{x})^2 \, dP_{\boldsymbol{\theta}^*}(\mathbf{x}) < \infty$, such that

$$\vert \log p_{\boldsymbol{\theta}_1}(\mathbf{x}) - \log p_{\boldsymbol{\theta}_2}(\mathbf{x}) \vert \le \dot{l}(\mathbf{x}) \Vert \boldsymbol{\theta}_1 - \boldsymbol{\theta}_2 \Vert, \quad \forall\, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2 \in U_{\boldsymbol{\theta}^*}.$$
C4. The maximum likelihood estimators (MLEs)
$$\hat{\boldsymbol{\theta}}_{N, \Theta} = \mathop{\mathrm{arg\,max}}\limits_{\boldsymbol{\theta} \in \Theta} l_N(\boldsymbol{\theta}) \quad \text{and} \quad \hat{\boldsymbol{\theta}}_{N, \Theta_0} = \mathop{\mathrm{arg\,max}}\limits_{\boldsymbol{\theta} \in \Theta_0} l_N(\boldsymbol{\theta})$$
are consistent under $P_{\boldsymbol{\theta}^*}$.

The asymptotic distribution of $\lambda_N$ depends on the local geometry of the parameter space $\Theta_0$ at $\boldsymbol{\theta}^*$, which is characterized by the tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$ defined below.

Definition 1

The tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$ of the set $\Theta_0 \subset \mathbb{R}^k$ at the point $\boldsymbol{\theta}^* \in \mathbb{R}^k$ is the set of vectors in $\mathbb{R}^k$ that are limits of sequences $\alpha_n(\boldsymbol{\theta}_n - \boldsymbol{\theta}^*)$, where the $\alpha_n$ are positive reals and the $\boldsymbol{\theta}_n \in \Theta_0$ converge to $\boldsymbol{\theta}^*$.
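As a simple illustration of this definition (our example, not part of the original text), take $\Theta_0 = [0, \infty) \subset \mathbb{R}$. At an interior point every direction can be approached from within $\Theta_0$, whereas at the boundary point $0$ only nonnegative directions can:

```latex
% Illustration (assumed example): tangent cones of \Theta_0 = [0, \infty)
% at the boundary point 0 and at an interior point \theta^* > 0.
T_{\Theta_0}(0) = [0, \infty), \qquad
T_{\Theta_0}(\theta^*) = \mathbb{R} \quad \text{for } \theta^* > 0.
```

In the first case the tangent cone is a convex cone but not a linear subspace, which is precisely the situation in which the usual $\chi^2$ approximation can fail.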

The following regularity condition on the tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$, known as the Chernoff regularity, is also required.

  5. C5. For every vector $\boldsymbol{\tau}$ in the tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$, there exist $\epsilon > 0$ and a map $\boldsymbol{\alpha}: [0, \epsilon) \rightarrow \Theta_0$ with $\boldsymbol{\alpha}(0) = \boldsymbol{\theta}^*$ such that $\boldsymbol{\tau} = \lim_{t \rightarrow 0^+} [\boldsymbol{\alpha}(t) - \boldsymbol{\alpha}(0)]/t$.

Under the above regularity conditions, Theorem 1 holds and explains the phenomena in Examples 1(b) and 2(b).

Theorem 1

Suppose that conditions C1-C5 are satisfied for comparing nested models $\Theta_0 \subset \Theta \subset \mathbb{R}^k$, with $\boldsymbol{\theta}^* \in \Theta_0$ being the true parameter vector. Then, as $N$ grows to infinity, the likelihood ratio statistic $\lambda_N$ converges in distribution to

(2) $$\min_{\boldsymbol{\tau} \in T_{\Theta_0}(\boldsymbol{\theta}^*)} \Vert \mathbf{Z} - I(\boldsymbol{\theta}^*)^{\frac{1}{2}} \boldsymbol{\tau} \Vert^2,$$

where $\mathbf{Z} = (Z_1, \ldots, Z_k)^\top$ is a random vector consisting of i.i.d. standard normal random variables.

Remark 1

We give some remarks on the regularity conditions. Conditions C1-C4 together ensure the asymptotic normality of $\sqrt{N}(\hat{\boldsymbol{\theta}}_{N,\Theta} - \boldsymbol{\theta}^*)$. Condition C1 depends on both the true model and the saturated model; as will be shown below, it holds for the saturated models in Examples 1(b) and 2(b). Equation (1) in C2 is also known as the condition of "differentiability in quadratic mean" for $\mathcal{P}_{\Theta}$ at $\boldsymbol{\theta}^*$. If the map $\boldsymbol{\theta} \mapsto \sqrt{p_{\boldsymbol{\theta}}(\mathbf{x})}$ is continuously differentiable for every $\mathbf{x}$, then C2 holds with $\dot{l}_{\boldsymbol{\theta}}(\mathbf{x}) = \frac{\partial}{\partial \boldsymbol{\theta}} \log p_{\boldsymbol{\theta}}(\mathbf{x})$ (Lemma 7.6, van der Vaart, 2000). Furthermore, C3 holds if $\dot{l}(\mathbf{x}) = \sup_{\boldsymbol{\theta} \in U_{\boldsymbol{\theta}^*}} \Vert \dot{l}_{\boldsymbol{\theta}}(\mathbf{x}) \Vert$ is square integrable with respect to the measure $P_{\boldsymbol{\theta}^*}$; in particular, C3 holds if $\dot{l}(\mathbf{x})$ is a bounded function. C4 holds for our examples by Theorem 10.1.6 of Casella and Berger (2002). C5 requires certain regularity of the local geometry of $T_{\Theta_0}(\boldsymbol{\theta}^*)$, which also holds for our examples below.

Remark 2

By Theorem 1, the asymptotic distribution of $\lambda_N$ depends on the tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$. If $T_{\Theta_0}(\boldsymbol{\theta}^*)$ is a linear subspace of $\mathbb{R}^k$ with dimension $k_0$, then one can easily show that the asymptotic reference distribution of $\lambda_N$ is $\chi^2$ with $k - k_0$ degrees of freedom. As we explain below, Theorem 1 directly applies to Examples 1(b) and 2(b). If $T_{\Theta_0}(\boldsymbol{\theta}^*)$ is a convex cone, then $\lambda_N$ converges to a mixture of $\chi^2$ distributions (Shapiro, 1985; Self and Liang, 1987). That is, for any $x > 0$, $P(\lambda_N \le x)$ converges to $\sum_{i=0}^k w_i P(\xi_i \le x)$ as $N$ goes to infinity, where $\xi_0 \equiv 0$ and $\xi_i$ follows a $\chi^2$ distribution with $i$ degrees of freedom for $i > 0$. Moreover, the weights for the components with even degrees of freedom sum to 1/2, as do the weights for the components with odd degrees of freedom (Shapiro, 1985).
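The chi-bar-squared limit under a convex cone can be checked by simulation. The following Python sketch is our illustration, not part of the original derivation: it assumes $k = 2$, takes the tangent cone to be the nonnegative orthant, and sets $I(\boldsymbol{\theta}^*)$ to the identity for simplicity, so that the minimization in (2) has a closed form.

```python
import numpy as np

# Monte Carlo sketch (assumed setup): tangent cone = nonnegative orthant
# in R^2, I(theta*) = identity. The minimum in (2) over {tau >= 0} is
# attained coordinatewise, leaving the squared negative parts of Z.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200_000, 2))
stat = np.sum(np.minimum(Z, 0.0) ** 2, axis=1)

# Implied mixture: weights (1/4, 1/2, 1/4) on chi^2_0, chi^2_1, chi^2_2.
print(np.mean(stat == 0.0))   # ~ 0.25, the point mass at zero
print(stat.mean())            # ~ 0.5 * 1 + 0.25 * 2 = 1.0
```

Here the even-degree weights $(1/4, 1/4)$ and the odd-degree weight $1/2$ each sum to $1/2$, consistent with the weight property stated above.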

Example 4

(Exploratory factor analysis, revisited) Now we consider Example 1(b). As the saturated model is a $J$-variate normal distribution with an unrestricted covariance matrix, its parameter space can be chosen as

$$\Theta = \{ \rho(\boldsymbol{\Sigma}) : \boldsymbol{\Sigma} \in \mathbb{R}_{pd}^{J \times J} \} \subset \mathbb{R}^{J(J+1)/2},$$

and the parameter space for the restricted model is

$$\Theta_0 = \left\{ \rho(\boldsymbol{\Sigma}) : \boldsymbol{\Sigma} = \mathbf{a}_1 \mathbf{a}_1^\top + \boldsymbol{\Delta}, ~ \mathbf{a}_1 \in \mathbb{R}^J, ~ \boldsymbol{\Delta} \in \mathbb{R}_{pd}^{J \times J} \cap \mathbb{R}_{d}^{J \times J} \right\}.$$

Suppose $\boldsymbol{\theta}^* = \rho(\boldsymbol{\Sigma}^*) \in \Theta_0$, where $\boldsymbol{\Sigma}^* = \mathbf{a}_1^* {\mathbf{a}_1^*}^\top + \boldsymbol{\Delta}^*$. It is easy to see that C1 holds with the current choice of $\Theta$. The tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$ takes the form:

$$T_{\Theta_0}(\boldsymbol{\theta}^*) = \left\{ \rho(\boldsymbol{\Sigma}) : \boldsymbol{\Sigma} = \mathbf{a}_1^* \mathbf{b}_1^\top + \mathbf{b}_1 {\mathbf{a}_1^*}^\top + \mathbf{B}, ~ \mathbf{b}_1 \in \mathbb{R}^J, ~ \mathbf{B} \in \mathbb{R}_{d}^{J \times J} \right\},$$

which is a linear subspace of $\mathbb{R}^{J(J+1)/2}$ with dimension $2J$, as long as $a^*_{j1} \ne 0$, $j = 1, \ldots, J$. By Theorem 1, $\lambda_N$ converges to the $\chi^2$ distribution with $J(J+1)/2 - 2J$ degrees of freedom.
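When the tangent cone is a linear subspace, the minimum in (2) is the squared norm of $\mathbf{Z}$ projected onto the orthogonal complement of the transformed subspace, which is $\chi^2$ with $k - k_0$ degrees of freedom. The following Python sketch is our illustration only: a random $k_0$-dimensional subspace stands in for $I(\boldsymbol{\theta}^*)^{1/2} T_{\Theta_0}(\boldsymbol{\theta}^*)$, with $J = 4$ so that $k = J(J+1)/2 = 10$ and $k_0 = 2J = 8$.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 4
k, k0 = J * (J + 1) // 2, 2 * J           # k = 10, k0 = 8, df = k - k0 = 2

# Orthonormal basis of a random k0-dimensional subspace (a stand-in for
# the image of the tangent cone under I(theta*)^{1/2}).
Q, _ = np.linalg.qr(rng.standard_normal((k, k0)))
P = Q @ Q.T                                # projection onto the subspace

# The minimum in (2) over the subspace equals ||Z - P Z||^2.
Z = rng.standard_normal((200_000, k))
stat = np.sum((Z - Z @ P) ** 2, axis=1)
print(stat.mean())                         # close to k - k0 = 2
```

The Monte Carlo mean of the residual statistic matches the $\chi^2_{k - k_0}$ mean, in agreement with the degrees-of-freedom count $J(J+1)/2 - 2J$ above.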

Example 5

(Exploratory item factor analysis, revisited) Now we consider Example 2(b). As the saturated model is a $2^J$-dimensional categorical distribution, its parameter space can be chosen as

$$\Theta = \left\{ \boldsymbol{\theta} = \{\theta_{\mathbf{x}}\}_{\mathbf{x} \in \Gamma_J} : \theta_{\mathbf{x}} \ge 0, ~ \sum_{\mathbf{x} \in \Gamma_J} \theta_{\mathbf{x}} \le 1 \right\} \subset \mathbb{R}^{2^J - 1},$$

where Γ J : = { 0 , 1 } J \ { ( 0 , . . . , 0 ) } . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma _J := \{0,1\}^J \backslash \{(0,...,0)^\top \}.$$\end{document} Then, the parameter space for the restricted model is

(3) $$\Theta_0 = \left\{ \boldsymbol{\theta} \in \Theta : \theta_{\mathbf{x}} = \int \prod_{j=1}^J \frac{\exp(x_j(d_j + a_{j1}\xi_1))}{1 + \exp(d_j + a_{j1}\xi_1)} \, \phi(\xi_1) \, d\xi_1, ~ \mathbf{a}_1, \mathbf{d} \in \mathbb{R}^J \right\}.$$

Let $\boldsymbol{\theta}^* \in \Theta_0$ be the parameter vector corresponding to true item parameters $\mathbf{a}_1^* = (a^*_{11}, \ldots, a^*_{J1})^\top$ and $\mathbf{d}^* = (d^*_1, \ldots, d^*_J)^\top$. By the form of $\Theta_0$, $\boldsymbol{\theta}^*$ is an interior point of $\Theta$.

For any $\mathbf{x} \in \Gamma_J$, we define $\mathbf{f}_{\mathbf{x}} = (f_1(\mathbf{x}), \ldots, f_J(\mathbf{x}))^\top$ and $\mathbf{g}_{\mathbf{x}} = (g_1(\mathbf{x}), \ldots, g_J(\mathbf{x}))^\top$, where

$$f_l(\mathbf{x}) = \int \prod_{j=1}^J \frac{\exp(x_j(d^*_j + a^*_{j1}\xi_1))}{1 + \exp(d^*_j + a^*_{j1}\xi_1)} \left[ x_l - \frac{\exp(d^*_l + a^*_{l1}\xi_1)}{1 + \exp(d^*_l + a^*_{l1}\xi_1)} \right] \phi(\xi_1) \, d\xi_1,$$

and

$$g_l(\mathbf{x}) = \int \prod_{j=1}^J \frac{\exp(x_j(d^*_j + a^*_{j1}\xi_1))}{1 + \exp(d^*_j + a^*_{j1}\xi_1)} \left[ x_l - \frac{\exp(d^*_l + a^*_{l1}\xi_1)}{1 + \exp(d^*_l + a^*_{l1}\xi_1)} \right] \xi_1 \phi(\xi_1) \, d\xi_1,$$

for $l = 1, \ldots, J$. Then the tangent cone $T_{\Theta_0}(\boldsymbol{\theta}^*)$ has the form

$$T_{\Theta_0}(\boldsymbol{\theta}^*) = \left\{ \boldsymbol{\theta} = \{\theta_{\mathbf{x}}\}_{\mathbf{x} \in \Gamma_J} : \theta_{\mathbf{x}} = \mathbf{b}_0^\top \mathbf{f}_{\mathbf{x}} + \mathbf{b}_1^\top \mathbf{g}_{\mathbf{x}}, ~ \mathbf{b}_0, \mathbf{b}_1 \in \mathbb{R}^J \right\},$$

which is a linear subspace of $\mathbb{R}^{2^J - 1}$ with dimension $2J$. By Theorem 1, $\lambda_N$ converges to the $\chi^2$ distribution with $2^J - 1 - 2J$ degrees of freedom.
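The integrals defining $\Theta_0$ in (3), and likewise $f_l$ and $g_l$, have no closed form but are readily evaluated by Gauss-Hermite quadrature. The following Python sketch is our illustration: the item parameter values are hypothetical, and the same quadrature applies to $f_l$ and $g_l$ after inserting the bracketed factor. It computes $\theta_{\mathbf{x}}$ for a small $J$ and verifies that the probabilities over all $2^J$ response patterns sum to one.

```python
import itertools
import numpy as np

# Evaluate theta_x in (3) by Gauss-Hermite quadrature. With the change of
# variables xi = sqrt(2) t, an integral of h(xi) against the standard
# normal density phi becomes sum_m (w_m / sqrt(pi)) h(sqrt(2) t_m).
def theta_x(x, a, d, n_nodes=40):
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    xi = np.sqrt(2.0) * t
    # item response probabilities P(X_j = 1 | xi) under the 2PL model
    p = 1.0 / (1.0 + np.exp(-(d[:, None] + a[:, None] * xi[None, :])))
    lik = np.prod(np.where(np.asarray(x)[:, None] == 1, p, 1.0 - p), axis=0)
    return np.sum(w * lik) / np.sqrt(np.pi)

# Hypothetical item parameters for J = 4 items (illustrative values only).
J = 4
a_star = np.array([1.0, 0.8, 1.2, 0.9])
d_star = np.array([-0.5, 0.2, 0.0, 0.7])

total = sum(theta_x(x, a_star, d_star)
            for x in itertools.product([0, 1], repeat=J))
print(total)   # probabilities over all 2^J patterns sum to 1
```

Summing over every response pattern gives one because the pattern probabilities sum to one pointwise in $\xi_1$, so the check exercises only the quadrature weights.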

2.2. Comparing Two Nested Submodels

Theorem 1 is not applicable to Example 3, because θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\theta }}^*$$\end{document} is on the boundary of Θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta $$\end{document} if Θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta $$\end{document} is chosen to be { ( β 0 , σ 1 2 , σ 2 2 ) : β 0 R , σ 1 2 [ 0 , ) , σ 2 2 [ 0 , ) } , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{(\beta _0, \sigma _1^2, \sigma _2^2): \beta _0 \in {\mathbb {R}}, \sigma _1^2 \in [0, \infty ), \sigma _2^2 \in [0, \infty )\},$$\end{document} and thus, C1 is violated. Theorem 1 is also not applicable to Examples 1(a) and 2(a), because the Fisher information matrix is not invertible when Θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta $$\end{document} is chosen to be the parameter space of the two-factor EFA and IFA models, respectively, in which case condition C2 is violated.

To derive the asymptotic theory for such problems, we view them as a problem of testing nested submodels under a saturated model for which θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\theta }}^*$$\end{document} is an interior point of Θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta $$\end{document} and the information matrix is invertible. Consider testing

H 0 : θ Θ 0 versus H a : θ Θ 1 \ Θ 0 , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} H_0: {\varvec{\theta }}\in \Theta _0 \text{ versus } H_a: {\varvec{\theta }}\in \Theta _1{\setminus } \Theta _0, \end{aligned}$$\end{document}

where Θ 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _0$$\end{document} and Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} are two nested submodels of a saturated model Θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta $$\end{document} , satisfying Θ 0 Θ 1 Θ R k \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _0 \subset \Theta _1 \subset \Theta \subset {\mathbb {R}}^{k}$$\end{document} . Under this formulation, Theorem 2 provides the asymptotic theory for the LRT statistic λ N = 2 sup θ Θ 1 l N ( θ ) - sup θ Θ 0 l N ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda _N = 2\left( \sup _{{\varvec{\theta }}\in \Theta _1} l_N({\varvec{\theta }}) - \sup _{{\varvec{\theta }}\in \Theta _0} l_N({\varvec{\theta }})\right) $$\end{document} .
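As a concrete toy instance of this formulation (our illustration, not one of the paper's examples), take X_1, ..., X_N i.i.d. N(mu, 1) with Theta_0 = {0} and Theta_1 = [0, infinity), so that the true value mu = 0 lies on the boundary of Theta_1. The LRT statistic then has a closed form:

```python
import numpy as np

def lrt_statistic(x):
    """lambda_N = 2(sup_{Theta_1} l_N - sup_{Theta_0} l_N) for the toy
    test H_0: mu = 0 versus H_1: mu >= 0 with X_i i.i.d. N(mu, 1),
    where Theta_0 = {0} and Theta_1 = [0, inf). The MLE under Theta_1
    is max(xbar, 0), and the statistic simplifies to n * max(xbar, 0)^2."""
    n = len(x)
    mu_hat = max(float(np.mean(x)), 0.0)  # boundary-constrained MLE
    return n * mu_hat ** 2
```

Under H_0, lambda_N equals w^2 1{w >= 0} with w = sqrt(N) times the sample mean, which is standard normal; about half of its simulated values are therefore exactly zero, and the chi-square approximation with one degree of freedom fails in a way analogous to Example 3.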

To obtain the asymptotic distribution of λ N \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda _N$$\end{document} , regularity conditions C1-C5 are still required for Θ 0 Θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _0 \subset \Theta $$\end{document} . Two additional conditions are needed for Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} , which are satisfied for Examples 6, 7 and 8.

  1. C6. The MLE under Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} , θ ^ N , Θ 1 = arg max θ Θ 1 l N ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\hat{{\varvec{\theta }}}}}_{N, \Theta _1} = \mathop {\mathrm{arg}\, \mathrm{max}}\limits _{{\varvec{\theta }}\in \Theta _1} l_N({\varvec{\theta }})$$\end{document} , is consistent under P θ . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P_{{\varvec{\theta }}^*}.$$\end{document}

  2. C7. Let T Θ 1 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _1}({\varvec{\theta }}^*)$$\end{document} be the tangent cone for Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} , defined the same as in Definition 1, but with Θ 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _0$$\end{document} replaced by Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} . T Θ 1 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _1}({\varvec{\theta }}^*)$$\end{document} satisfies Chernoff regularity. 
That is, for every vector τ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{\tau }$$\end{document} in the tangent cone T Θ 1 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _1}({\varvec{\theta }}^*)$$\end{document} there exist ϵ > 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\epsilon >0$$\end{document} and a map α : [ 0 , ϵ ) Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\alpha }}:[0,\epsilon )\rightarrow \Theta _1$$\end{document} with α ( 0 ) = θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\alpha }}(0) = {\varvec{\theta }}^*$$\end{document} such that τ = lim t 0 + [ α ( t ) - α ( 0 ) ] / t . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{\tau }= \lim _{t\rightarrow 0+}[{\varvec{\alpha }}(t) - {\varvec{\alpha }}(0)]/t.$$\end{document}

Theorem 2

Let θ Θ 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\theta }}^* \in \Theta _0$$\end{document} be the true parameter vector. Suppose that conditions C1-C7 are satisfied. As N grows to infinity, the likelihood ratio statistic λ N \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda _N$$\end{document} converges to the distribution of

(4) min τ T Θ 0 ( θ ) Z - I ( θ ) 1 2 τ 2 - min τ T Θ 1 ( θ ) Z - I ( θ ) 1 2 τ 2 , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \begin{aligned} \min _{{\varvec{\tau }} \in T_{\Theta _0}({\varvec{\theta }}^*)} \Vert {\mathbf {Z}} - I({\varvec{\theta }}^*)^{\frac{1}{2}}{\varvec{\tau }} \Vert ^2 - \min _{{\varvec{\tau }} \in T_{\Theta _1}({\varvec{\theta }}^*)} \Vert {\mathbf {Z}} - I({\varvec{\theta }}^*)^{\frac{1}{2}}{\varvec{\tau }} \Vert ^2, \end{aligned} \end{aligned}$$\end{document}

where Z = ( Z 1 , . . . , Z k ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {Z}} = (Z_1, ..., Z_k)^\top $$\end{document} is a random vector consisting of i.i.d. standard normal random variables, and I ( θ ) 1 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$I({\varvec{\theta }}^*)^{\frac{1}{2}}$$\end{document} satisfies I ( θ ) 1 2 ( I ( θ ) 1 2 ) = I ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$I({\varvec{\theta }}^*)^{\frac{1}{2}} (I({\varvec{\theta }}^*)^{\frac{1}{2}})^\top = I({\varvec{\theta }}^*)$$\end{document} and can be obtained, for example, by eigenvalue decomposition.
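The limit distribution (4) is straightforward to simulate once the two tangent cones and I(theta*)^{1/2} are known. The sketch below is our illustration for the simplest geometry, in which I(theta*) is taken to be the identity, T_{Theta_0} is a coordinate subspace, and T_{Theta_1} appends one half-line, so both projections have closed forms.

```python
import numpy as np

def limit_law_samples(k, n_draws=100_000, seed=0):
    """Monte Carlo draws from the limit distribution in (4) for a toy
    geometry: T_{Theta_0} is the span of the first coordinate axis,
    T_{Theta_1} additionally contains the half-line {b * e_2 : b >= 0},
    and I(theta*) is the identity, so both squared projection residuals
    are available in closed form."""
    z = np.random.default_rng(seed).standard_normal((n_draws, k))
    # Squared residual of Z after projection onto T_{Theta_0} = span{e_1}.
    resid0 = np.sum(z[:, 1:] ** 2, axis=1)
    # Projection onto T_{Theta_1} additionally absorbs z_2 when z_2 >= 0.
    resid1 = resid0 - np.clip(z[:, 1], 0.0, None) ** 2
    return resid0 - resid1  # equals z_2^2 * 1{z_2 >= 0}
```

Here the difference of squared residuals reduces to z_2^2 1{z_2 >= 0}, so the draws follow a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom; for more complex cones the two projections would be computed numerically.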

Example 6

(Random effects model, revisited) Now we consider Example 3. Let 1 n \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\mathbf {1}}}_n$$\end{document} denote a length-n vector whose entries are all 1 and I n \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\mathbf {I}}}_n$$\end{document} denote the n × n \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n\times n$$\end{document} identity matrix. As X i = ( X i 1 , . . . , X iJ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {X}}_i = (X_{i1},...,X_{iJ})^\top $$\end{document} from the random effects model is multivariate normal with mean β 0 1 J \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _0{{\mathbf {1}}}_J$$\end{document} and covariance matrix σ 1 2 1 J 1 J + σ 2 2 I J , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _1^2{{\mathbf {1}}}_J{{\mathbf {1}}}_J^\top + \sigma _2^2{{\mathbf {I}}}_J,$$\end{document} the 
saturated parameter space can be taken as

Θ = { ( ρ ( Σ ) , β 0 ) : Σ R pd J × J , β 0 R } . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Theta = \{ (\rho ({\varvec{\Sigma }})^\top ,\beta _0)^\top : {\varvec{\Sigma }} \in {\mathbb {R}}^{J\times J}_{pd}, \beta _0\in {\mathbb {R}}\}. \end{aligned}$$\end{document}

The parameter spaces for the restricted models are

Θ 0 = { ( ρ ( Σ ) , β 0 ) : Σ = σ 2 2 I J , σ 2 2 > 0 , β 0 R } , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Theta _0 = \{ (\rho ({\varvec{\Sigma }})^\top ,\beta _0)^\top :{\varvec{\Sigma }} = \sigma _2^2{{\mathbf {I}}}_J,~ \sigma _2^2 >0, \beta _0\in {\mathbb {R}}\}, \end{aligned}$$\end{document}

and

Θ 1 = { ( ρ ( Σ ) , β 0 ) : Σ = σ 1 2 1 J 1 J + σ 2 2 I J , σ 1 2 0 , σ 2 2 > 0 , β 0 R } . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Theta _1 = \{ (\rho ({\varvec{\Sigma }})^\top ,\beta _0)^\top :{\varvec{\Sigma }} = \sigma _1^2{{\mathbf {1}}}_J{{\mathbf {1}}}_J^\top + \sigma _2^2{{\mathbf {I}}}_J, \sigma _1^2\ge 0, \sigma _2^2>0, \beta _0\in {\mathbb {R}}\}. \end{aligned}$$\end{document}

Let θ = ( ρ ( Σ ) , β 0 ) Θ 0 , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\theta }}^* = (\rho ({\varvec{\Sigma }}^*)^\top ,\beta ^*_0)^\top \in \Theta _0,$$\end{document} where Σ = σ 2 2 I J . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\Sigma }}^* = {\sigma _2^*}^2{{\mathbf {I}}}_J.$$\end{document} Then, C1 holds. The tangent cones for Θ 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _0$$\end{document} and Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} are

T Θ 0 ( θ ) = { ( ρ ( Σ ) , b 0 ) : Σ = b 2 I J , b 0 , b 2 R } \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} T_{\Theta _0}({\varvec{\theta }}^*) = \{ (\rho ({\varvec{\Sigma }})^\top ,b_0)^\top :{\varvec{\Sigma }} = b_2{{\mathbf {I}}}_J,~ b_0,b_2\in {\mathbb {R}}\} \end{aligned}$$\end{document}

and

T Θ 1 ( θ ) = { ( ρ ( Σ ) , b 0 ) : Σ = b 1 1 J 1 J + b 2 I J , b 1 0 , b 0 , b 2 R } . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} T_{\Theta _1}({\varvec{\theta }}^*) = \{ (\rho ({\varvec{\Sigma }})^\top ,b_0)^\top :{\varvec{\Sigma }} = b_1{{\mathbf {1}}}_J{{\mathbf {1}}}_J^\top + b_2{{\mathbf {I}}}_J,~ b_1\ge 0, b_0,b_2\in {\mathbb {R}}\}. \end{aligned}$$\end{document}

By Theorem 2, λ N \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda _N$$\end{document} converges to the distribution of (4).

In this example, the form of (4) can be simplified, thanks to the forms of T Θ 0 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _0}({\varvec{\theta }}^*)$$\end{document} and T Θ 1 ( θ ) . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _1}({\varvec{\theta }}^*).$$\end{document} We denote

c 0 = ( 0 , . . . , 0 , 1 ) , c 1 = ( ρ ( 1 J 1 J ) , 0 ) , c 2 = ( ρ ( I J ) , 0 ) R J ( J + 1 ) / 2 + 1 . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\mathbf {c}}_0 = (0,...,0,1)^\top , ~{\mathbf {c}}_1 = (\rho ({{\mathbf {1}}}_J{{\mathbf {1}}}_J^\top )^\top ,0)^\top ,~{\mathbf {c}}_2 = (\rho ({{\mathbf {I}}}_J)^\top ,0)^\top \in {\mathbb {R}}^{J(J+1)/2+1}. \end{aligned}$$\end{document}

It can be seen that T Θ 0 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _0}({\varvec{\theta }}^*)$$\end{document} is a two-dimensional linear subspace spanned by { c 0 , c 2 } , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{{\mathbf {c}}_0,{\mathbf {c}}_2\},$$\end{document} and T Θ 1 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _1}({\varvec{\theta }}^*)$$\end{document} is half of a three-dimensional linear subspace, defined as { α 0 c 0 + α 1 c 1 + α 2 c 2 : α 1 0 , α 0 , α 2 R } . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\alpha _0{\mathbf {c}}_0+\alpha _1{\mathbf {c}}_1+\alpha _2{\mathbf {c}}_2: \alpha _1\ge 0,\alpha _0,\alpha _2\in {\mathbb {R}}\}.$$\end{document} Let P 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {P}}_0$$\end{document} denote the projection onto T Θ 0 ( θ ) . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _0}({\varvec{\theta }}^*).$$\end{document} Define

v = c 1 - P 0 c 1 c 1 - P 0 c 1 , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\mathbf {v}}= \frac{{\mathbf {c}}_1 - {\mathbf {P}}_0{\mathbf {c}}_1}{\Vert {\mathbf {c}}_1 - {\mathbf {P}}_0{\mathbf {c}}_1\Vert }, \end{aligned}$$\end{document}

and then, (4) has the form

(5) v Z 2 1 { v Z 0 } . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Vert {\mathbf {v}}^\top {\mathbf {Z}}\Vert ^21_{\{{\mathbf {v}}^\top {\mathbf {Z}}\ge 0\}}. \end{aligned}$$\end{document}

It is easy to see that v Z \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {v}}^\top {\mathbf {Z}}$$\end{document} follows a standard normal distribution. Therefore, λ N \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda _N$$\end{document} converges to the distribution of w 2 1 { w 0 } , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w^21_{\{w\ge 0\}},$$\end{document} where w is a standard normal random variable. This distribution, a 50:50 mixture of a point mass at zero and a χ 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\chi ^2$$\end{document} distribution with one degree of freedom, is known as a mixture χ 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\chi ^2$$\end{document} distribution. The blue dotted line in Fig. 3 shows the CDF of this mixture χ 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\chi ^2$$\end{document} distribution, which is very close to the empirical CDF of the LRT, confirming our asymptotic theory.
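The CDF of this mixture distribution has a simple closed form: it equals 1/2 + (1/2) P(chi-square_1 <= x) for x >= 0, and P(chi-square_1 <= x) = erf(sqrt(x/2)). A minimal sketch (ours):

```python
import math

def mixture_chi2_cdf(x):
    """CDF of w^2 * 1{w >= 0} with w ~ N(0, 1): a 50:50 mixture of a
    point mass at zero and a chi-square distribution with one degree of
    freedom, using P(chi^2_1 <= x) = erf(sqrt(x / 2))."""
    if x < 0.0:
        return 0.0
    return 0.5 + 0.5 * math.erf(math.sqrt(x / 2.0))
```

In particular, the 5% critical value of the mixture is about 2.71 (the 90% quantile of chi-square with one degree of freedom) rather than the chi-square value 3.84, so referring λ_N to the chi-square distribution with one degree of freedom yields a conservative test.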

Example 7

(Exploratory factor analysis, revisited) Now we consider Example 1(a). Let Θ , Θ 0 , θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta ,\Theta _0,{\varvec{\theta }}^*$$\end{document} and T Θ 0 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _0}({\varvec{\theta }}^*)$$\end{document} be the same as those in Example 4. In addition, we define

Θ 1 = ρ ( Σ ) : Σ = a 1 a 1 + a 2 a 2 + Δ , a 1 , a 2 R J , a 12 = 0 , Δ R pd J × J R d J × J . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Theta _1 = \left\{ \rho ({\varvec{\Sigma }}): {\varvec{\Sigma }} = {\mathbf {a}}_1{\mathbf {a}}_1^\top + {\mathbf {a}}_2{\mathbf {a}}_2^\top + {\varvec{\Delta }},~ {\mathbf {a}}_1,{\mathbf {a}}_2 \in {\mathbb {R}}^J, a_{12}=0, {\varvec{\Delta }} \in {\mathbb {R}}_{pd}^{J\times J} \cap {\mathbb {R}}_{d}^{J\times J} \right\} . \end{aligned}$$\end{document}

The tangent cone of Θ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta _1$$\end{document} at θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\theta }}^*$$\end{document} becomes

T Θ 1 ( θ ) = ρ ( Σ ) : Σ = a 1 b 1 + b 1 a 1 + b 2 b 2 + B , b 1 , b 2 R J , b 12 = 0 , B R d J × J . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} T_{\Theta _1}({\varvec{\theta }}^*) = \left\{ \rho ({\varvec{\Sigma }}): {\varvec{\Sigma }} = {{\mathbf {a}}_1^*}{{\mathbf {b}}}_1^\top + {{\mathbf {b}}}_1{{\mathbf {a}}_1^*}^\top + {{\mathbf {b}}}_2{{\mathbf {b}}}_2^\top + {\mathbf {B}}, ~ {{\mathbf {b}}}_1,{{\mathbf {b}}}_2 \in {\mathbb {R}}^{J}, b_{12}=0, {\mathbf {B}} \in {\mathbb {R}}_{d}^{J\times J} \right\} . \end{aligned}$$\end{document}

Note that T Θ 1 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _1}({\varvec{\theta }}^*)$$\end{document} is not a linear subspace, due to the b 2 b 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\mathbf {b}}}_2{{\mathbf {b}}}_2^\top $$\end{document} term. Therefore, by Theorem 2, the asymptotic distribution of λ N \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda _N$$\end{document} is not χ 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\chi ^2$$\end{document} . See the blue dotted line in Panel (a) of Fig. 1 for the CDF of this asymptotic distribution. This CDF almost overlaps with the empirical CDF of the LRT, suggesting that Theorem 2 holds here.

Example 8

(Exploratory item factor analysis, revisited) Now we consider Example 2(a). Let Θ , Θ 0 , θ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Theta ,\Theta _0,{\varvec{\theta }}^*$$\end{document} and T Θ 0 ( θ ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{\Theta _0}({\varvec{\theta }}^*)$$\end{document} be the same as those in Example 5. Let

Θ 1 = θ Θ : θ x = j = 1 J exp ( x j ( d j + a j 1 ξ 1 + a j 2 ξ 2 ) ) 1 + exp ( d j + a j 1 ξ 1 + a j 2 ξ 2 ) ϕ ( ξ 1 ) ϕ ( ξ 2 ) d ξ 1 d ξ 2 , a 12 = 0 , x Γ J \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \begin{aligned} \Theta _1 = \left\{ {\varvec{\theta }}\in \Theta : \theta _{{\mathbf {x}}} = \int \int \prod _{j=1}^J \frac{\exp (x_{j}(d_j + a_{j1}\xi _1+a_{j2}\xi _2))}{1+\exp (d_j + a_{j1}\xi _1+a_{j2}\xi _2)} \phi (\xi _1)\phi (\xi _2)d\xi _1d\xi _2, a_{12} = 0,{\mathbf {x}}\in \Gamma _J \right\} \end{aligned} \end{aligned}$$\end{document}

be the parameter space for the two-factor model. Recall f x \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {f}}_{{\mathbf {x}}}$$\end{document} and g x \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {g}}_{{\mathbf {x}}}$$\end{document} as defined in Example 5. For any x Γ J , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {x}}\in \Gamma _J,$$\end{document} we further define H x = ( h rs ( x ) ) J × J , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf {H}}_{{\mathbf {x}}} = (h_{rs}({\mathbf {x}}))_{J\times J},$$\end{document} where

h rs ( x ) = j = 1 J exp ( x j ( d j + a j 1 ξ 1 ) ) 1 + exp ( d j + a j 1 ξ 1 ) x r - exp ( d r + a r 1 ξ 1 ) 1 + exp ( d r + a r 1 ξ 1 ) × x s - exp ( d s + a s 1 ξ 1 ) 1 + exp ( d s + a s 1 ξ 1 ) ϕ ( ξ 1 ) d ξ 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \begin{aligned} h_{rs}({\mathbf {x}}) =&\int \prod _{j=1}^J \frac{\exp (x_{j}(d^*_{j} + a^*_{j1}\xi _1))}{1+\exp (d^*_{j} + a^*_{j1}\xi _1)} \left[ x_r - \frac{\exp (d^*_{r} + a^*_{r1}\xi _1)}{1+\exp (d^*_{r} + a^*_{r1}\xi _1)} \right] \\&\times \left[ x_s - \frac{\exp (d^*_{s} + a^*_{s1}\xi _1)}{1+\exp (d^*_{s} + a^*_{s1}\xi _1)} \right] \phi (\xi _1)d\xi _1 \end{aligned} \end{aligned}$$\end{document}

for $r \ne s$, and

$$h_{rr}(\mathbf{x}) = \int \prod_{j=1}^J \frac{\exp(x_{j}(d^*_{j} + a^*_{j1}\xi_1))}{1+\exp(d^*_{j} + a^*_{j1}\xi_1)} \left\{ \left[ x_r - \frac{\exp(d^*_{r} + a^*_{r1}\xi_1)}{1+\exp(d^*_{r} + a^*_{r1}\xi_1)} \right]^2 - \frac{\exp(d^*_{r} + a^*_{r1}\xi_1)}{(1+\exp(d^*_{r} + a^*_{r1}\xi_1))^2} \right\} \phi(\xi_1)\,d\xi_1.$$

Then, the tangent cone of $\Theta_1$ at $\boldsymbol{\theta}^*$ is

$$T_{\Theta_1}(\boldsymbol{\theta}^*) = \left\{ \boldsymbol{\theta} = \{\theta_{\mathbf{x}}\}_{\mathbf{x}\in \Gamma_J}: \theta_{\mathbf{x}} = \mathbf{b}_0^\top \mathbf{f}_{\mathbf{x}} + \mathbf{b}_1^\top \mathbf{g}_{\mathbf{x}} + \mathbf{b}_2^\top \mathbf{H}_{\mathbf{x}}\mathbf{b}_2,\ \mathbf{b}_0, \mathbf{b}_1, \mathbf{b}_2 \in \mathbb{R}^J,\ b_{12} = 0 \right\}. \tag{6}$$

Similar to Example 7, $T_{\Theta_1}(\boldsymbol{\theta}^*)$ is not a linear subspace, and thus $\lambda_N$ is not asymptotically $\chi^2$. In Panel (a) of Fig. 2, the asymptotic CDF suggested by Theorem 2 is shown as the blue dotted line. As in the previous examples, this CDF is very close to the empirical CDF of the LRT.
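To build intuition for reference distributions of this kind, the limiting law in Theorem 2 can be simulated by projecting a Gaussian vector onto the two tangent cones and comparing squared distances. The sketch below is a hypothetical, minimal illustration (not the IFA example above): it takes the null cone to be the origin and the alternative cone to be the nonnegative orthant in $\mathbb{R}^2$, assuming an identity Fisher information; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def chernoff_lrt_draws(n_draws=100_000):
    """Simulate the Chernoff-type limit
        lambda = min_{t in T0} |Z - t|^2 - min_{t in T1} |Z - t|^2,
    with T0 = {0}, T1 = the nonnegative orthant in R^2, Z ~ N(0, I)."""
    z = rng.standard_normal((n_draws, 2))
    # Squared distance to T0 = {0} is simply |Z|^2.
    d0 = (z ** 2).sum(axis=1)
    # Projection onto the nonnegative orthant clips negative coordinates at 0.
    proj = np.clip(z, 0.0, None)
    d1 = ((z - proj) ** 2).sum(axis=1)
    return d0 - d1

draws = chernoff_lrt_draws()
# Known result for this cone (Self & Liang, 1987): the limit is the mixture
# (1/4) chi2_0 + (1/2) chi2_1 + (1/4) chi2_2, so about a quarter of the
# draws are exactly zero.
print(np.mean(draws == 0.0))
```

In this toy case the simulated law matches the known mixture $\tfrac{1}{4}\chi^2_0 + \tfrac{1}{2}\chi^2_1 + \tfrac{1}{4}\chi^2_2$ (Self & Liang, 1987); for the examples in this note the cones are more complicated, but the simulation recipe is the same.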

3. Discussion

In this note, we point out how the regularity conditions of Wilks' theorem may be violated, using three examples of models with latent variables. In these cases, the asymptotic distribution of the LRT statistic is no longer $\chi^2$, and therefore the test may no longer be valid. The regularity conditions of Wilks' theorem, especially the requirement of a non-singular Fisher information matrix, seem not to have received enough attention, and as a result the LRT is often misused. Although we focus on the LRT, it is worth pointing out that other testing procedures, including the Wald and score tests, as well as limited-information tests (e.g., tests based on bivariate information), require similar regularity conditions and thus may also be affected.

We present a general theory for the LRT, first established in Chernoff (1954), that is not widely known in psychometrics and related fields. As we illustrate with the three examples, this theory applies to irregular cases not covered by Wilks' theorem. There are other examples for which this general theory is useful. For instance, Examples 1(a) and 2(a) can easily be generalized to the comparison of factor models with different numbers of factors, under both confirmatory and exploratory settings. The theory can also be applied to model comparison in latent class analysis, which likewise suffers from a non-invertible information matrix. To apply the theorem, the key is to choose a suitable parameter space and then characterize the tangent cone at the true model.

There are alternative methods for statistical inference in such irregular situations. One is to obtain a reference distribution for the LRT via the parametric bootstrap. Under the same regularity conditions as in Theorem 2, we believe that the parametric bootstrap is still consistent, and it may even achieve better approximation accuracy for finite samples than the asymptotic distributions given by Theorems 1 and 2. However, for complex latent variable models (e.g., IFA models with many factors), the parametric bootstrap may be computationally intensive, due to the high cost of repeatedly computing the marginal maximum likelihood estimators. On the other hand, Monte Carlo simulation of the asymptotic distribution in Theorem 2 is computationally much easier, even though optimization problems still need to be solved. Another method is the split likelihood ratio test recently proposed by Wasserman et al. (2020), which is computationally fast and does not suffer from singularity or boundary issues. By making use of a sample-splitting trick, the split LRT is able to control the type I error at any pre-specified level. However, it may sometimes be quite conservative.
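The parametric bootstrap route can be sketched on a deliberately simple boundary problem, testing $H_0: \mu = 0$ against $H_1: \mu \ge 0$ for $N(\mu, 1)$ data, where Wilks' theorem fails because the true parameter sits on the boundary. The function names and sample sizes below are illustrative only, not taken from the examples above; in real latent variable applications the expensive step this sketch elides is refitting the model to each bootstrap sample.

```python
import numpy as np

rng = np.random.default_rng(1)

def lrt_stat(x):
    """LRT statistic for the toy boundary problem H0: mu = 0 vs H1: mu >= 0,
    with x ~ N(mu, 1). The restricted MLE is 0, and the one-sided MLE is
    max(mean(x), 0), which gives 2 * (loglik(mu_hat) - loglik(0)) = n * mu_hat^2."""
    mu_hat = max(float(np.mean(x)), 0.0)
    return len(x) * mu_hat ** 2

def bootstrap_pvalue(x, n_boot=2000):
    """Parametric bootstrap: simulate data from the model fitted under H0
    (here fully specified as N(0, 1)), recompute the LRT on each replicate,
    and use the bootstrap draws as the reference distribution."""
    observed = lrt_stat(x)
    boot = np.array([lrt_stat(rng.standard_normal(len(x)))
                     for _ in range(n_boot)])
    return float(np.mean(boot >= observed))

x = rng.standard_normal(50)  # data generated under H0
p_value = bootstrap_pvalue(x)
```

About half of the bootstrap draws are exactly zero here, reflecting the $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$ mixture rather than the $\chi^2_1$ law that a naive application of Wilks' theorem would suggest.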

This paper focuses on situations where the true model is exactly a singular or boundary point of the parameter space. However, the LRT can also be problematic when the true model is near a singular or boundary point. A recent article by Mitchell et al. (2019) provides a treatment of this problem, deriving a finite-sample approximating distribution for the LRT.

Besides the singularity and boundary issues, the asymptotic distribution may be inaccurate when the dimension of the parameter space is relatively high compared with the sample size. This problem has been studied intensively in statistics; a famous result is the Bartlett correction, which provides a way to improve the $\chi^2$ approximation (Bartlett, 1937; Bickel & Ghosh, 1990; Cordeiro, 1983; Box, 1949; Lawley, 1956; Wald, 1943). When the regularity conditions do not hold, the classical form of the Bartlett correction may no longer be suitable. A general form of Bartlett correction remains to be developed, which is left for future investigation.
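The idea behind the Bartlett correction can be illustrated on a regular, low-dimensional toy problem: rescale the LRT statistic so that its mean matches the mean $d$ of the reference $\chi^2_d$ distribution. The sketch below tests $H_0: \sigma^2 = 1$ (mean unknown) in a normal model with a small sample, and estimates the rescaling factor by simulation rather than by the analytical expansions cited above; it is an illustration of the principle, not the classical closed-form correction.

```python
import numpy as np

rng = np.random.default_rng(2)

def lrt_var_one(x):
    """LRT statistic for H0: sigma^2 = 1 (mean unknown) in N(mu, sigma^2):
    lambda = n * (s2 - log(s2) - 1), where s2 is the MLE of sigma^2."""
    s2 = float(np.mean((x - np.mean(x)) ** 2))
    return len(x) * (s2 - np.log(s2) - 1.0)

n, d = 10, 1  # small sample; d = difference in the number of free parameters
lam = np.array([lrt_var_one(rng.standard_normal(n)) for _ in range(20_000)])

# Empirical Bartlett-type factor: for n = 10 the mean of lambda under H0
# exceeds the chi2_1 mean of 1 (analytically it is about 1.21), so we
# rescale the statistic to restore the correct mean.
scale = float(np.mean(lam)) / d
lam_corrected = lam / scale
```

By construction the corrected statistic has mean exactly $d$; in regular problems this moment-matching rescaling typically also brings the upper tail much closer to $\chi^2_d$, which is what matters for test calibration.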

Acknowledgements

The authors thank the editor, associate editor, and three reviewers for their supportive and insightful comments. Yunxiao Chen acknowledges the support from the National Academy of Education/Spencer Postdoctoral Fellowship.

Appendix

Proof of Lemma 1

Denote the (i, j)-entry of the Fisher information matrix $I(\boldsymbol{\theta}^*)$ by $q_{ij}$. In both cases, we show that $q_{ij} = 0$ for $i \ge 2J+1$ or $j \ge 2J+1$, and therefore $I(\boldsymbol{\theta}^*)$ is non-invertible. Since

$$q_{ij} = \int \frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial \theta_i} \Bigr|_{\boldsymbol{\theta}^*} \frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial \theta_j} \Bigr|_{\boldsymbol{\theta}^*}\, p_{\boldsymbol{\theta}^*}(\mathbf{x})\, d\mathbf{x},$$

it suffices to show that

$$\frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial \theta_i} \Bigr|_{\boldsymbol{\theta}^*} = 0, \quad i \ge 2J+1.$$

In the case of the two-factor model, it suffices to show that

$$\frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial a_{l2}} \Bigr|_{\boldsymbol{\theta}^*} = 0,$$

for $l = 2, \ldots, J$. Let $\sigma_{ij}$ be the (i, j)-entry of the covariance matrix $\Sigma$. It is easy to see that $\sigma_{ij} = a_{i1}a_{j1} + a_{i2}a_{j2} + 1_{\{i=j\}}\delta_i$, where $a_{12} = 0$. By the chain rule,

$$\frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial a_{l2}} = \sum_{i\le j} \frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial \sigma_{ij}} \frac{\partial \sigma_{ij}}{\partial a_{l2}}.$$

Since

$$\frac{\partial \sigma_{ij}}{\partial a_{l2}} \Bigr|_{\boldsymbol{\theta}^*} = 1_{\{l=i\}}a^*_{j2} + 1_{\{l=j\}}a^*_{i2} = 0,$$

it follows that $I(\boldsymbol{\theta}^*)$ is non-invertible in the case of the two-factor model.

In the case of the two-factor IFA model, since

$$\frac{\partial \log p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial a_{l2}} = \frac{1}{p_{\boldsymbol{\theta}}(\mathbf{x})}\frac{\partial p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial a_{l2}},$$

it suffices to show that

$$\frac{\partial p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial a_{l2}} \Bigr|_{\boldsymbol{\theta}^*} = 0,$$

for l = 2 , . . . , J . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l=2,...,J.$$\end{document} Since

$$\begin{aligned}
\frac{\partial p_{\boldsymbol{\theta}}(\mathbf{x})}{\partial a_{l2}} \Bigr|_{\boldsymbol{\theta}^*} &= \int\!\!\int \prod_{j=1}^J \frac{\exp(x_{j}(d^*_{j} + a^*_{j1}\xi_1))}{1+\exp(d^*_{j} + a^*_{j1}\xi_1)} \left[ x_l - \frac{\exp(d^*_{l} + a^*_{l1}\xi_1)}{1+\exp(d^*_{l} + a^*_{l1}\xi_1)} \right] \xi_2\,\phi(\xi_1)\phi(\xi_2)\, d\xi_1 d\xi_2\\
&= \int \xi_2\,\phi(\xi_2)\,d\xi_2 \times \int \prod_{j=1}^J \frac{\exp(x_{j}(d^*_{j} + a^*_{j1}\xi_1))}{1+\exp(d^*_{j} + a^*_{j1}\xi_1)} \left[ x_l - \frac{\exp(d^*_{l} + a^*_{l1}\xi_1)}{1+\exp(d^*_{l} + a^*_{l1}\xi_1)} \right] \phi(\xi_1)\, d\xi_1\\
&= 0,
\end{aligned}$$

it follows that $I(\boldsymbol{\theta}^*)$ is non-invertible in the case of the two-factor IFA model. $\square$
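The argument for the linear two-factor model can be checked numerically: at a point $\boldsymbol{\theta}^*$ whose second-column loadings are all zero, the derivatives $\partial \sigma_{ij}/\partial a_{l2}$ vanish, so the corresponding rows and columns of the Fisher information are zero. The sketch below verifies this with central finite differences for a hypothetical parameterization with $J = 4$ items; the packing of $\boldsymbol{\theta}$ is an illustrative choice, not the paper's notation.

```python
import numpy as np

def sigma(theta, J):
    """Covariance of the linear two-factor model, Sigma = A A^T + diag(delta),
    with theta packing (a_{.1}, a_{j2} for j >= 2, delta) and a_{12} fixed to 0."""
    a1 = theta[:J]
    a2 = np.concatenate(([0.0], theta[J:2 * J - 1]))  # identification: a_{12} = 0
    delta = theta[2 * J - 1:]
    A = np.column_stack([a1, a2])
    return A @ A.T + np.diag(delta)

J = 4
# A point with all second-column loadings equal to zero: the model collapses
# to a one-factor model, i.e. a singular point of the two-factor model.
theta_star = np.concatenate([np.full(J, 0.8),  # a_{j1}
                             np.zeros(J - 1),  # a_{j2}, j >= 2
                             np.ones(J)])      # delta_j

# Central finite differences of Sigma with respect to each free a_{l2}.
eps = 1e-6
for l in range(J - 1):
    e = np.zeros_like(theta_star)
    e[J + l] = eps
    diff = (sigma(theta_star + e, J) - sigma(theta_star - e, J)) / (2 * eps)
    assert np.max(np.abs(diff)) < 1e-8  # dSigma/da_{l2} vanishes at theta*
```

Since every $\partial \Sigma/\partial a_{l2}$ is zero at $\boldsymbol{\theta}^*$, the score with respect to each $a_{l2}$ is identically zero and the Fisher information has zero rows, confirming the singularity shown in the proof.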

Footnotes

Electronic Supplementary material The online version of this article (https://doi.org/10.1007/s11336-020-09735-0) contains supplementary material, which is available to authorized users.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 18, 1453–1463.
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438.
Auerswald, M., & Moshagen, M. (2019). How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychological Methods, 24, 468–491.
Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 160, 268–282.
Bickel, P. J., & Ghosh, J. (1990). A decomposition for the likelihood ratio statistic and the Bartlett correction - a Bayesian argument. The Annals of Statistics, 18, 1070–1090.
Box, G. E. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.
Casella, G., & Berger, R. L. (2002). Statistical inference. Belmont, CA: Duxbury.
Chernoff, H. (1954). On the distribution of the likelihood ratio. The Annals of Mathematical Statistics, 25, 573–578.
Cordeiro, G. M. (1983). Improved likelihood ratio statistics for generalized linear models. Journal of the Royal Statistical Society: Series B (Methodological), 45, 404–413.
Davis-Stober, C. P. (2009). Analysis of multinomial models under inequality constraints: Applications to measurement theory. Journal of Mathematical Psychology, 53, 1–13.
Deng, L., Yang, M., & Marcoulides, K. M. (2018). Structural equation modeling with many variables: A systematic review of issues and developments. Frontiers in Psychology, 9, 580.
Dominicus, A., Skrondal, A., Gjessing, H. K., Pedersen, N. L., & Palmgren, J. (2006). Likelihood ratio tests in behavioral genetics: Problems and solutions. Behavior Genetics, 36, 331–340.
Drton, M. (2009). Likelihood ratio tests and singularities. The Annals of Statistics, 37, 979–1012.
Du, H., & Wang, L. (2020). Testing variance components in linear mixed modeling using permutation. Multivariate Behavioral Research, 55, 120–136.
Geweke, J. F., & Singleton, K. J. (1980). Interpreting the likelihood ratio statistic in factor models when sample size is small. Journal of the American Statistical Association, 75, 133–137.
Hakstian, A. R., Rogers, W. T., & Cattell, R. B. (1982). The behavior of number-of-factor rules with simulated data. Multivariate Behavioral Research, 17, 193–219.
Hayashi, K., Bentler, P. M., & Yuan, K.-H. (2007). On the likelihood ratio test for the number of factors in exploratory factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 14, 505–526.
Lawley, D. N. (1956). A general method for approximating to the distribution of likelihood ratio criteria. Biometrika, 43, 295–303.
Lehmann, E. L., & Romano, J. P. (2006). Testing statistical hypotheses. New York, NY: Springer.
Liu, X., & Shao, Y. (2003). Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics, 31, 807–832.
Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in $2^n$ contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009–1020.
Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732.
Mitchell, J. D., Allman, E. S., & Rhodes, J. A. (2019). Hypothesis testing near singularities and boundaries. Electronic Journal of Statistics, 13, 2150–2193.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Rotnitzky, A., Cox, D. R., Bottai, M., & Robins, J. (2000). Likelihood-based inference with singular information matrix. Bernoulli, 6, 243–284.
Savalei, V., & Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural equation modeling. Psychological Methods, 13, 150–170.
Self, S. G., & Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82, 605–610.
Shapiro, A. (1985). Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika, 72, 133–144.
Shapiro, A. (1986). Asymptotic theory of overparameterized structural models. Journal of the American Statistical Association, 81, 142–149.
Shi, D., Lee, T., & Terry, R. A. (2018). Revisiting the model size effect in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 25, 21–40.
Stram, D. O., & Lee, J. W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50, 1171–1177.
Stram, D. O., & Lee, J. W. (1995). Correction to "Variance components testing in the longitudinal mixed effects model". Biometrics, 51, 1196.
Takane, Y., van der Heijden, P. G. M., & Browne, M. W. (2003). On likelihood ratio tests for dimensionality selection. In T. Higuchi, Y. Iba, & M. Ishiguro (Eds.), Proceedings of science of modeling: The 30th anniversary meeting of the information criterion (AIC) (pp. 348–349). Tokyo: Japan Institute of Statistical Mathematics.
van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54, 426–482.
Wasserman, L., Ramdas, A., & Balakrishnan, S. (2020). Universal inference. Proceedings of the National Academy of Sciences, 117, 16880–16890.
Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9, 60–62.
Wu, H., & Estabrook, R. (2016). Identification of confirmatory factor analysis models of different levels of invariance for ordered categorical outcomes. Psychometrika, 81, 1014–1045.
Wu, H., & Neale, M. C. (2013). On the likelihood ratio tests in bivariate ACDE models. Psychometrika, 78, 441–463.
Yang, M., Jiang, G., & Yuan, K.-H. (2018). The performance of ten modified rescaled statistics as the number of variables increases. Structural Equation Modeling: A Multidisciplinary Journal, 25, 414–438.

Table 1. Values of the true parameters for the simulations in Example 1.


Figure 1. (a) Results of Example 1(a). The black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations. The red dashed line shows the CDF of the $\chi^2$ distribution with 5 degrees of freedom as suggested by Wilks' theorem. The blue dotted line shows the CDF of the reference distribution suggested by Theorem 2. (b) Results of Example 1(b). The black solid line shows the empirical CDF of the LRT statistic, and the red dashed line shows the CDF of the $\chi^2$ distribution with 9 degrees of freedom as suggested by Wilks' theorem.


Table 2. Values of the true parameters for the simulations in Example 2.


Figure 2. (a) Results of Example 2(a). The black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations. The red dashed line shows the CDF of the $\chi^2$ distribution with 5 degrees of freedom as suggested by Wilks' theorem. The blue dotted line shows the CDF of the reference distribution suggested by Theorem 2. (b) Results of Example 2(b). The black solid line shows the empirical CDF of the LRT statistic, and the red dashed line shows the CDF of the $\chi^2$ distribution with 51 degrees of freedom as suggested by Wilks' theorem.


Figure 3. The black solid line shows the empirical CDF of the LRT statistic, based on 5000 independent simulations. The red dashed line shows the CDF of the $\chi^2$ distribution with one degree of freedom as suggested by Wilks' theorem. The blue dotted line shows the CDF of the mixture of $\chi^2$ distributions suggested by Theorem 2 (Color figure online).
