
A Note on the Connection Between Trek Rules and Separable Nonlinear Least Squares in Linear Structural Equation Models

Published online by Cambridge University Press:  01 January 2025

Maximilian S. Ernst*
Affiliation: Max Planck Institute for Human Development; Humboldt-Universität zu Berlin
Aaron Peikert
Affiliation: Max Planck Institute for Human Development; Humboldt-Universität zu Berlin; Max Planck UCL Centre for Computational Psychiatry and Ageing Research
Andreas M. Brandmaier
Affiliation: Max Planck Institute for Human Development; Max Planck UCL Centre for Computational Psychiatry and Ageing Research; MSB Medical School Berlin
Yves Rosseel
Affiliation: Ghent University

*Correspondence should be made to Maximilian S. Ernst, Center for Lifespan Psychology, Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. Email: [email protected]

Abstract

We show that separable nonlinear least squares (SNLLS) estimation is applicable to all linear structural equation models (SEMs) that can be specified in RAM notation. SNLLS is an estimation technique that has successfully been applied to a wide range of models, for example, neural networks and dynamic systems, often leading to improvements in convergence and computation time. It is applicable to models of a special form, where a subset of parameters enters the objective linearly. Recently, Kreiberg et al. (Struct Equ Model Multidiscip J 28(5):725–739, 2021. https://doi.org/10.1080/10705511.2020.1835484) have shown that this is also the case for factor analysis models. We generalize this result to all linear SEMs. To that end, we show that undirected effects (variances and covariances) and mean parameters enter the objective linearly, and therefore, in the least squares estimation of structural equation models, only the directed effects have to be obtained iteratively. For model classes without unknown directed effects, SNLLS can be used to analytically compute least squares estimates. To provide deeper insight into the nature of this result, we employ trek rules that link graphical representations of structural equation models to their covariance parametrization. We further give an efficient expression for the gradient, which is crucial to make a fast implementation possible. Results from our simulation indicate that SNLLS leads to improved convergence rates and a reduced number of iterations.

Type
Theory and Methods
Creative Commons License (CC BY)
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a multivariate statistical tool for modeling the relations between latent and observed variables. Apart from maximum likelihood estimation, least squares (LS) estimation is a common approach for parameter estimation. In LS, parameters are estimated by minimizing a nonlinear function of the parameters and data. In practice, this problem is typically solved by applying generic nonlinear optimization techniques, such as Newton-type gradient descent approaches that iteratively minimize the objective function until convergence is reached. However, for some model classes, generic optimization algorithms can be adapted to make better use of the model structure and thus solve the problem more efficiently. For a particular type of model, the parameters separate: one set of parameters enters the objective in a nonlinear way, while another set enters it linearly. For a vector of observations y and predictors x of size m, the objective is of the form

(1) $$\sum_{i = 1}^m \left[ y_i - \sum_{j=1}^n \alpha_j \varphi_j(\beta, x_i)\right]^2$$

where $\alpha \in \mathbb{R}^n, \beta \in \mathbb{R}^k$ are parameter vectors and the (nonlinear) functions $\varphi_j$ are continuously differentiable w.r.t. $\beta$. Golub and Pereyra (1973) showed that this kind of objective allows for a reformulation of the optimization problem, such that only the parameters $\beta$ have to be obtained iteratively, while the parameters $\alpha$ can be computed after the optimization in a single step. The procedure has subsequently been called separable nonlinear least squares (SNLLS). It has been successfully applied in many disciplines, and it has been observed that the reduced dimension of the parameter space can lead to reduced computation time, a reduced number of iterations, and better convergence properties (Golub & Pereyra, 2003). Inspired by earlier work (Kreiberg et al., 2016, 2021) showing that this procedure can also be applied to factor analysis models, we generalize their result to the entire class of linear structural equation models and give analytical gradients for the reduced optimization problem, which is central for an efficient implementation.

1. Review of Concepts

In the following, we briefly review the notation for structural equation models, the generalized least squares estimator and the trek rules used to derive the model-implied covariance matrix.

1.1. Linear Structural Equation Models

Linear structural equation models can be defined in RAM notation (reticular action model; McArdle & McDonald, 1984) as follows (we follow the notation from Drton, 2018): Let $x, \varepsilon$ be random vectors with values in $\mathbb{R}^m$ and

(2) $$x = \Lambda x + \varepsilon$$

where $\Lambda \in \mathbb{R}^{m \times m}$ is a matrix of constants or unknown (directed) parameters. Let $\Omega \in \mathbb{R}^{m \times m}$ be the covariance matrix of $\varepsilon$ and $\mathbf{I}$ the identity matrix. If $\mathbf{I} - \Lambda$ is invertible, Eq. 2 can be solved by $x = (\mathbf{I} - \Lambda)^{-1}\varepsilon$ with covariance matrix

(3) $$\mathbb{V}[x] = \left(\mathbf{I} - \Lambda\right)^{-1} \Omega \left(\mathbf{I} - \Lambda\right)^{-T}$$

If x is partitioned into a part $x_{\text{obs}}$ of $m_{\text{obs}}$ observed variables and $x_{\text{lat}}$ of $m_{\text{lat}}$ latent variables, we can reorder x such that $x = \left(x_{\text{obs}}^T \; x_{\text{lat}}^T\right)^T$, and the covariance matrix of the observed variables is given by

(4) $$\Sigma := \mathbb{V}[x_{\text{obs}}] = \mathbf{F}(\mathbf{I} - \Lambda)^{-1}\Omega(\mathbf{I} - \Lambda)^{-T}\mathbf{F}^T$$

where $\mathbf{F} = \left[\mathbf{I}\,\big|\,\mathbf{0}\right] \in \mathbb{R}^{m_{\text{obs}} \times (m_{\text{obs}} + m_{\text{lat}})}$ is a rectangular filter matrix. We denote the parameters by $\theta = \left(\theta_{\Lambda}^T \; \theta_{\Omega}^T\right)^T \in \mathbb{R}^q$, partitioned into directed parameters from $\Lambda$ and undirected parameters from $\Omega$. (We call them directed or undirected parameters because they correspond to directed or undirected paths in the graph of the model.) If we want to stress that $\Sigma$ is a function of the parameters, we write $\Sigma(\theta)$. If we are also interested in the mean structure, we introduce a vector of (possibly zero) mean parameters $\gamma \in \mathbb{R}^m$ such that $x = \gamma + \Lambda x + \varepsilon$ and obtain

(5) $$\mu := \mathbb{E}[x_{\text{obs}}] = \mathbf{F}(\mathbf{I} - \Lambda)^{-1}\gamma$$
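
To make Eqs. 4 and 5 concrete, the following is a minimal numerical sketch (our illustration, not code from the paper), assuming numpy and a made-up one-factor model with two indicators:

```python
import numpy as np

m_obs, m_lat = 2, 1
m = m_obs + m_lat                        # variable ordering: (X1, X2, eta)
Lam = np.zeros((m, m))                   # directed effects; Lam[i, j] is the edge j -> i
Lam[0, 2], Lam[1, 2] = 0.8, 0.6          # X1 <- eta, X2 <- eta
Om = np.diag([0.36, 0.64, 1.0])          # residual variances and factor variance
gamma = np.array([0.5, 0.2, 0.0])        # mean parameters

F = np.hstack([np.eye(m_obs), np.zeros((m_obs, m_lat))])   # filter matrix
B = np.linalg.inv(np.eye(m) - Lam)
Sigma = F @ B @ Om @ B.T @ F.T           # Eq. 4: model-implied covariance matrix
mu = F @ B @ gamma                       # Eq. 5: model-implied mean vector
```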

1.2. Least Squares Estimation

The least squares objective function for $\theta$ is:

(6) $$F_{\text{LS}} = (s - \sigma)^T \mathbf{V} (s - \sigma)$$

where $\sigma = \operatorname{vech}(\Sigma)$ is the half-vectorization of $\Sigma$, that is, the vector of non-duplicated elements of the model-implied covariance matrix, $s = \operatorname{vech}(\mathbf{S})$ is the half-vectorization of the observed covariance matrix, and $\mathbf{V}$ is a fixed symmetric positive definite weight matrix. Specific forms of the weight matrix $\mathbf{V}$ lead to commonly used special cases of this estimation technique: generalized least squares estimation uses $\mathbf{V} = \frac{1}{2}\,\mathbf{D}^T\left(\mathbf{S}^{-1} \otimes \mathbf{S}^{-1}\right)\mathbf{D}$ (where $\mathbf{D}$ denotes the duplication matrix from Magnus and Neudecker, 2019b), asymptotically distribution-free estimation uses a consistent estimator of the asymptotic covariance matrix of s, and unweighted least squares estimation uses the identity matrix (Bollen, 1989; Browne, 1982, 1984).
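
As a sketch of Eq. 6 with the GLS weight matrix (our illustration, assuming numpy; $\mathbf{S}$, $\Sigma$, and all values are made up), using a column-wise half-vectorization and the convention $\operatorname{vec}(\mathbf{M}) = \mathbf{D}\operatorname{vech}(\mathbf{M})$:

```python
import numpy as np

def vech(M):
    """Column-wise half-vectorization: stack the lower triangle column by column."""
    p = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(p)])

def duplication_matrix(p):
    """D such that vec(M) = D @ vech(M) for symmetric M (Magnus & Neudecker)."""
    D = np.zeros((p*p, p*(p + 1)//2))
    for j in range(p):                    # vec is column-major: position i + j*p
        for i in range(p):
            a, b = max(i, j), min(i, j)   # (a, b) indexes the lower triangle
            D[i + j*p, b*p - b*(b - 1)//2 + (a - b)] = 1.0
    return D

S = np.array([[1.0, 0.3], [0.3, 1.5]])        # observed covariance (made up)
Sigma = np.array([[1.1, 0.25], [0.25, 1.4]])  # model-implied covariance (made up)
D = duplication_matrix(2)
Sinv = np.linalg.inv(S)
V = 0.5 * D.T @ np.kron(Sinv, Sinv) @ D       # GLS weight matrix
r = vech(S) - vech(Sigma)
F_LS = r @ V @ r                              # Eq. 6
```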

1.3. Trek Rules

To show that in SEMs undirected effects enter the least squares objective linearly, we employ trek rules (Drton, 2018), which are path tracing rules used to derive the model-implied covariance between any pair of variables in a SEM (Boker et al., 2002). Various authors have proposed rules to link the graph to the covariance parametrization of the model. Here, we give the rules as put forward by Drton (2018), which are based on treks as basic building blocks (for an overview of alternative formulations, see Mulaik, 2009). A trek $\tau$ from a node i to a node j is a path connecting them in which directed edges can be traveled forwards and backwards, but it is not allowed to walk from one arrowhead into another (no colliding arrowheads). A top node of a trek is a node which has only outgoing edges.

To derive an expression for the model-implied covariance between any two variables i and j based on the postulated SEM, we follow 4 steps:

  1. Find all treks from i to j.

  2. For each trek, multiply all parameters along it.

  3. If a trek does not contain a covariance parameter, factor in the variance of the top node.

  4. Add all obtained trek monomials from the different treks together.

Note that a trek is ordered in the sense that two treks containing exactly the same nodes and edges are considered different if they are traveled in a different order. In particular, each trek has a source (i) and a target (j), and a trek from j to i is considered to be a different trek, even if it contains exactly the same nodes and edges. Also note that variances are not considered to be edges in the mixed graph corresponding to the model (i.e., it is not allowed to travel variance edges). Therefore, all graphical representations of SEMs in this article omit variance edges, and it is required to factor them in according to rule 3 after the treks are collected.
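
The four steps above can be made operational for small acyclic models. Below is a minimal sketch (our own illustration, not the authors' code) that enumerates treks and their monomials by pairing directed paths into the two endpoints; the graph fragment encodes the part of the bi-factor model of Fig. 1 used in the example of Sect. 1.3.1, with all names (zeta1, G, omega_l, ...) chosen by us:

```python
from itertools import product

# directed edges of an acyclic graph, stored as parents[child] = [(parent, label), ...]
parents = {
    "X2": [("zeta1", "lambda2"), ("G", "beta2")],
    "X6": [("zeta2", "lambda6"), ("G", "beta6")],
}
# undirected (covariance) edges between exogenous nodes
covs = {("zeta1", "zeta2"): "omega_l"}

def paths_into(node):
    """All directed paths t -> ... -> node as (top node, list of edge labels)."""
    results = [(node, [])]                 # the empty path: node is its own top
    for parent, label in parents.get(node, []):
        for top, labels in paths_into(parent):
            results.append((top, labels + [label]))
    return results

def trek_monomials(i, j):
    """Steps 1-3: pair paths into i and j over a shared top node or a covariance edge."""
    monomials = []
    for (ti, left), (tj, right) in product(paths_into(i), paths_into(j)):
        if ti == tj:                       # no covariance on the trek: apply rule 3
            monomials.append(left + [f"var({ti})"] + right)
        elif (ti, tj) in covs or (tj, ti) in covs:
            label = covs.get((ti, tj)) or covs.get((tj, ti))
            monomials.append(left + [label] + right)
    return [" * ".join(m) for m in monomials]

print(trek_monomials("X2", "X6"))
# ['lambda2 * omega_l * lambda6', 'beta2 * var(G) * beta6']   (cf. Eq. 9)
```

Step 4 then corresponds to summing the returned monomials.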

1.3.1. Example

To illustrate how the model-implied covariances can be derived using trek rules, we give an example based on the path diagram in Fig. 1. To find the model-implied covariance between the nodes $X_2$ and $X_6$, we first find all treks from $X_2$ to $X_6$:

(7) $$\tau_1: X_2 \leftarrow \zeta_1 \leftrightarrow \zeta_2 \rightarrow X_6$$
(8) $$\tau_2: X_2 \leftarrow G \rightarrow X_6$$

We now compute the trek monomials for each trek. The first trek contains the covariance parameter $\omega_l$ of the undirected edge between $\zeta_1$ and $\zeta_2$. The second trek does not contain a covariance parameter, so we need to factor in the variance of the top node: we find the trek's top node $G$ and denote the variance parameter of $G$ by $\omega_G$. Finally, we add the resulting trek monomials and find that the model-implied covariance between $X_2$ and $X_6$ can be expressed as follows:

(9) $$\operatorname{cov}(X_2, X_6) = \lambda_2 \omega_l \lambda_6 + \beta_2 \omega_G \beta_6$$

As a second example, we derive the model-implied variance of $X_3$. Again, we first find all treks from $X_3$ to $X_3$:

(10) $$\tau_1: X_3 \leftarrow \zeta_1 \rightarrow X_3$$
(11) $$\tau_2: X_3 \leftarrow G \rightarrow X_3$$
(12) $$\tau_3: X_3 \quad \text{(the trivial trek consisting of the single node } X_3\text{)}$$

None of these treks contains a covariance parameter, so we need to factor in the variances of the respective top nodes $\zeta_1$, $G$, and $X_3$. We denote the variance parameters of $\zeta_1$ and $X_3$ by $\omega_{\zeta_1}$ and $\omega_3$, respectively, and add the resulting trek monomials to obtain

(13) $$\operatorname{var}(X_3) = \lambda_3^2 \omega_{\zeta_1} + \beta_3^2 \omega_G + \omega_3$$

Figure 1. Graph of a bi-factor model with one general factor and two specific factors. Circles represent latent variables, and rectangles represent observed variables. Variances are omitted in this representation.
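
The trek-rule expressions in Eqs. 9 and 13 can also be checked numerically against the matrix formula in Eq. 4. The following sketch is our own illustration; it assumes the bi-factor structure of Fig. 1 with six indicators, loadings $\lambda_i$ on the specific factors and $\beta_i$ on the general factor, and made-up parameter values:

```python
import numpy as np

m_obs, m_lat = 6, 3                        # X1..X6 observed; G, zeta1, zeta2 latent
m = m_obs + m_lat
lam = np.array([0.5, 0.6, 0.7, 0.4, 0.8, 0.9])   # loadings on the specific factors
beta = np.array([0.3, 0.2, 0.5, 0.6, 0.4, 0.7])  # loadings on the general factor
om = np.array([0.4, 0.3, 0.5, 0.6, 0.2, 0.35])   # residual variances
om_G, om_z1, om_z2, om_l = 1.0, 1.0, 1.0, 0.25   # factor variances, cov(zeta1, zeta2)

Lam = np.zeros((m, m))                     # Lam[i, j] is the directed edge j -> i
Lam[0:3, 7] = lam[0:3]                     # X1..X3 <- zeta1
Lam[3:6, 8] = lam[3:6]                     # X4..X6 <- zeta2
Lam[0:6, 6] = beta                         # X1..X6 <- G
Om = np.zeros((m, m))
Om[np.arange(6), np.arange(6)] = om
Om[6, 6], Om[7, 7], Om[8, 8] = om_G, om_z1, om_z2
Om[7, 8] = Om[8, 7] = om_l

F = np.hstack([np.eye(m_obs), np.zeros((m_obs, m_lat))])
B = np.linalg.inv(np.eye(m) - Lam)
Sigma = F @ B @ Om @ B.T @ F.T             # Eq. 4

# indices 1, 2, 5 correspond to X2, X3, X6
assert np.isclose(Sigma[1, 5], lam[1]*om_l*lam[5] + beta[1]*om_G*beta[5])  # Eq. 9
assert np.isclose(Sigma[2, 2], lam[2]**2*om_z1 + beta[2]**2*om_G + om[2])  # Eq. 13
```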

1.3.2. Formal Definitions

We denote the elements of $\Omega$, the undirected effects between nodes i and j, by $\omega_{ij}$, and the elements of $\Lambda$, the directed effects, by $\lambda_{ij}$. Drton (2018) defines the trek monomial of a trek $\tau$ without a covariance parameter as

(14) $$\tau(\Lambda, \Omega) = \omega_{i_0 i_0} \prod_{k \rightarrow l \in \tau} \lambda_{lk}$$

where $i_0$ is the top node of the trek, and the trek monomial of a trek $\tau$ containing an undirected edge between $i_0$ and $j_0$ as

(15) $$\tau(\Lambda, \Omega) = \omega_{i_0 j_0} \prod_{k \rightarrow l \in \tau} \lambda_{lk}$$

(notice the swapped indices of $\lambda_{lk}$ compared to the formula in Drton, because our $\Lambda$ corresponds to his $\Lambda^T$). With this, the elements of $\Sigma(\theta)$ are represented as a summation over treks. He proves that

(16) $$\Sigma(\theta)_{ij} = \sum_{\tau \in \mathcal{T}(i,j)} \tau(\Lambda, \Omega)$$

where $\mathcal{T}(i,j)$ is the set of all treks from i to j. It follows that the model-implied covariance is a sum of monomials of parameters. Because covariances between the error terms are not transitive, exactly one undirected parameter (variance or covariance) is present in each monomial. Therefore, if all the directed parameters were fixed, the model-implied covariance would be a linear function of the undirected parameters. This is what makes the SNLLS procedure applicable to structural equation models.

For later use, we also note that Drton gives the following expression:

(17) $$(\mathbf{I} - \Lambda)^{-1}_{ij} = \sum_{\tau \in \mathcal{P}(j,i)} \prod_{k \rightarrow l \in \tau} \lambda_{lk}$$

where $\mathcal{P}(j,i)$ is the set of directed paths from j to i. This is because we can write $(\mathbf{I} - \Lambda)^{-1} = \sum_{k=0}^\infty \Lambda^k$, where the geometric series converges iff all eigenvalues of $\Lambda$ lie in $(-1, 1)$. (Further explanations about this and an excellent account of the connections between matrix algebra and graphs can be found in Kepner & Gilbert, 2011.)
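
Eq. 17 and the geometric-series identity are easy to verify numerically for an acyclic model, where $\Lambda$ can be ordered strictly lower triangular and is therefore nilpotent, so the series is finite. A small sketch (our illustration, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
Lam = np.tril(rng.normal(size=(m, m)), k=-1)   # strictly lower triangular: acyclic

# (I - Lam)^{-1} equals the finite sum of Lam^k; entry (i, j) of Lam^k adds up
# the products of edge weights over all directed paths of length k from j to i
B = np.linalg.inv(np.eye(m) - Lam)
B_series = sum(np.linalg.matrix_power(Lam, k) for k in range(m))
assert np.allclose(B, B_series)
```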

2. Separable Nonlinear Least Squares for SEM

We first outline the SNLLS approach of Golub and Pereyra (1973) and its application to CFA models by Kreiberg et al. (2021). Subsequently, we prove that SNLLS is applicable to all linear structural equation models. We further extend the existing proofs to subsume models that contain a mean structure. Last, we derive analytic gradients that are central for efficient software implementations.

2.1. Outline of Previous Work

To minimize Eq. 1, Golub and Pereyra (1973) define the matrix function

(18) $$\Phi_{ij} := \varphi_j(\beta, x_i)$$

such that Eq. 1 can be written as

(19) $$\Vert y - \Phi(\beta)\,\alpha \Vert^2$$

where $\Vert \cdot \Vert$ denotes the Euclidean norm. For a fixed value of $\beta$, a solution for $\alpha$ can be obtained as $\alpha = \Phi^+(\beta)\, y$. They further proved that, under the assumption that $\Phi(\beta)$ has constant rank near the solution, only the nonlinear parameters $\beta$ have to be obtained iteratively, by substituting this solution for $\alpha$ and minimizing the modified objective

(20) $$\left\Vert y - \Phi(\beta)\Phi^+(\beta)\, y \right\Vert^2$$

where $\Phi^+$ denotes the Moore–Penrose generalized inverse. Afterward, the least squares solution for the linear parameters $\alpha$ can be obtained as the standard least squares estimator $\operatorname{arg\,min}_{\alpha \in \mathbb{R}^n} \Vert \Phi(\hat{\beta})\alpha - y \Vert = \Phi^+(\hat{\beta})\, y$.
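
As an illustration of the reduced objective in Eq. 20, the following is a minimal variable projection sketch (our own, assuming numpy/scipy; the two-exponential model and all values are made up):

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 4.0, 50)
rng = np.random.default_rng(1)
y = 2.0*np.exp(-1.0*x) + 0.5*np.exp(-3.0*x) + 0.01*rng.normal(size=x.size)

def Phi(beta):
    """Phi_ij = phi_j(beta, x_i) as in Eq. 18; here phi_j(beta, x) = exp(-beta_j x)."""
    return np.exp(-np.outer(x, beta))

def reduced_objective(beta):
    """Eq. 20: || y - Phi(beta) Phi^+(beta) y ||^2, a function of beta only."""
    P = Phi(beta)
    return np.sum((y - P @ (np.linalg.pinv(P) @ y))**2)

res = minimize(reduced_objective, np.array([0.5, 2.0]), method="Nelder-Mead")
alpha_hat = np.linalg.pinv(Phi(res.x)) @ y   # linear step: alpha = Phi^+(beta) y
```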

Kreiberg et al. (2021) showed that this procedure is applicable to CFA models (we reproduce their main results in our notation), as it is possible to rewrite the model-implied covariances $\sigma$ as a product of a matrix-valued function $\mathbf{G}(\theta_{\Lambda})$ that depends only on the directed parameters and the vector of undirected parameters $\theta_{\Omega}$, so the LS objective can be written as

(21) $$F_{\text{LS}} = (s - \sigma)^T \mathbf{V} (s - \sigma)$$
(22) $$\phantom{F_{\text{LS}}} = \left(s - \mathbf{G}(\theta_{\Lambda})\theta_{\Omega}\right)^T \mathbf{V} \left(s - \mathbf{G}(\theta_{\Lambda})\theta_{\Omega}\right)$$
(23) $$\phantom{F_{\text{LS}}} = \left\Vert s - \mathbf{G}(\theta_{\Lambda})\theta_{\Omega}\right\Vert^2_{\mathbf{V}}$$

They further stated that if $\theta_{\Lambda}$ is fixed, we know from standard linear least squares estimation that the minimizer for the undirected effects can be obtained as

(24) $$\hat{\theta}_{\Omega} = \left(\mathbf{G}(\theta_{\Lambda})^T \mathbf{V} \mathbf{G}(\theta_{\Lambda})\right)^{-1} \mathbf{G}(\theta_{\Lambda})^T \mathbf{V} s$$

Inserting Eq. 24 into Eq. 22 and simplifying, they obtained a new objective to be minimized:

(25) $$\hat{\theta}_{\Lambda} = \mathop{\operatorname{arg\,min}}\limits_{\theta_{\Lambda}} \left[ s^T \mathbf{V} s - s^T \mathbf{V} \mathbf{G}(\theta_{\Lambda}) \left(\mathbf{G}(\theta_{\Lambda})^T \mathbf{V} \mathbf{G}(\theta_{\Lambda})\right)^{-1} \mathbf{G}(\theta_{\Lambda})^T \mathbf{V} s \right]$$
(26) $$\phantom{\hat{\theta}_{\Lambda}} = \mathop{\operatorname{arg\,min}}\limits_{\theta_{\Lambda}} F_{\text{SNLLS}}(\theta_{\Lambda})$$

This objective depends only on the directed parameters $\theta_{\Lambda}$. After minimizing it to obtain a LS estimate $\hat{\theta}_{\Lambda}$, Eq. 24 can be used to obtain the LS estimate of $\theta_{\Omega}$. We would like to note that they assume $\mathbf{G}$ to have full rank, which is not a necessary assumption and can be relaxed using alternative formulations of Eqs. 24 and 25. To extend the method to general structural equation models, we have to derive $\mathbf{G}(\theta_{\Lambda})$. We do that in the following for all models formulated in RAM notation.
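
In code, the two-step procedure of Eqs. 24-26 might look as follows (a hedged sketch, not the authors' implementation; the function G_fun that maps $\theta_{\Lambda}$ to $\mathbf{G}(\theta_{\Lambda})$ is assumed to be supplied by the model, e.g., as derived in the following section):

```python
import numpy as np

def theta_omega_hat(G, V, s):
    """Linear step, Eq. 24: closed-form LS estimate of the undirected effects."""
    GtV = G.T @ V
    return np.linalg.solve(GtV @ G, GtV @ s)

def F_snlls(theta_lam, G_fun, V, s):
    """Reduced objective, Eq. 25: a function of the directed effects only."""
    G = G_fun(theta_lam)
    return s @ V @ s - s @ V @ G @ theta_omega_hat(G, V, s)

# Minimize F_snlls over theta_lam with any generic optimizer (e.g.,
# scipy.optimize.minimize); afterwards recover the undirected effects via
# theta_omega_hat(G_fun(theta_lam_hat), V, s), completing Eq. 26.
```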

2.2. Derivation of $\mathbf{G}(\theta_{\Lambda})$

Since $\mathbf{F} = \left[\mathbf{I}\,\big|\,\mathbf{0}\right]$ with $\mathbf{0} \in \mathbb{R}^{m_{\text{obs}} \times m_{\text{lat}}}$, the product $\mathbf{F}\mathbf{M}\mathbf{F}^T$ for any $\mathbf{M} \in \mathbb{R}^{m \times m}$ simply deletes the last $m_{\text{lat}}$ rows and columns of $\mathbf{M}$. We also note that for any matrices $\mathbf{M}, \mathbf{D} \in \mathbb{R}^{n \times n}$ we can write

(27) $$\left(\mathbf{M}\mathbf{D}\mathbf{M}^T\right)_{ij} = \sum_{k=1}^n \sum_{l=1}^n m_{il}\, d_{lk}\, m_{jk}$$

With this in mind, we can rewrite the model-implied covariance matrix $\Sigma(\theta)$ as

(28) $$\Sigma(\theta)_{ij} = \left( \mathbf{F} (\mathbf{I}-\Lambda)^{-1} \Omega (\mathbf{I}-\Lambda)^{-T} \mathbf{F}^T \right)_{ij}$$
(29) $$\phantom{\Sigma(\theta)_{ij}} = \left( (\mathbf{I}-\Lambda)^{-1} \Omega (\mathbf{I}-\Lambda)^{-T} \right)_{ij}$$
(30) $$\phantom{\Sigma(\theta)_{ij}} = \sum_{k=1}^m \sum_{l=1}^m (\mathbf{I}-\Lambda)^{-1}_{il}\, \omega_{lk}\, (\mathbf{I}-\Lambda)^{-1}_{jk}$$
(31) $$\phantom{\Sigma(\theta)_{ij}} = \sum_{k=1}^m \sum_{l=1}^m \Big(\sum_{\tau \in \mathcal{P}(l,i)} \prod_{r \rightarrow s \in \tau} \lambda_{sr}\Big)\, \omega_{lk}\, \Big(\sum_{\tau \in \mathcal{P}(k,j)} \prod_{r \rightarrow s \in \tau} \lambda_{sr}\Big)$$

with $i,j \in \{1, \ldots, m_{\text{obs}}\}$. We now immediately see that each entry of $\Sigma$ is a sum of products of entries of $(\mathbf{I}-\Lambda)^{-1}$ and $\Omega$. More importantly, exactly one entry of $\Omega$ enters each term of the sum; if we keep all entries of $\Lambda$ fixed, each element of $\Sigma$ is a linear function of the entries of $\Omega$ and is therefore a linear function of the undirected parameters in $\Omega$ (under the assumption that $\Omega$ is linearly parameterized).
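
This linearity suggests a simple way to assemble $\mathbf{G}(\theta_{\Lambda})$ numerically (a sketch of our own, not the construction derived below): since $\sigma$ is linear in $\theta_{\Omega}$ for fixed $\theta_{\Lambda}$, the columns of $\mathbf{G}$ are the images of symmetric unit perturbations of $\Omega$:

```python
import numpy as np

def vech(M):
    """Column-wise half-vectorization of a symmetric matrix."""
    p = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(p)])

def G_of(B, F, omega_positions):
    """B = (I - Lambda)^{-1}; omega_positions lists the (l, k) index in Omega
    of each undirected parameter, giving one column of G per parameter."""
    m = B.shape[0]
    cols = []
    for l, k in omega_positions:
        E = np.zeros((m, m))
        E[l, k] = E[k, l] = 1.0            # unit value for omega_lk, zero elsewhere
        cols.append(vech(F @ B @ E @ B.T @ F.T))
    return np.column_stack(cols)
```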
As a result, the parameter vector $\theta$ is separable in two parts, $\theta_{\Lambda}$ from $\Lambda$ and $\theta_{\Omega}$ from $\Omega$, and $\theta_{\Omega}$ enters the computation of the model-implied covariance linearly. As stated before, this is why we will be able to apply separable nonlinear least squares estimation to our problem. Before we proceed, we would like to introduce some notation. If $\mathcal{F}$ and $\mathcal{G}$ are tuples of length n and m, and f and g are functions, we define a column vector of length n as

(32) $$\Bigg (\Big [f(i)\Big ]_{i \in {\mathcal {F}}}\Bigg ) = \begin{pmatrix} f({\mathcal {F}}_1) \\ f({\mathcal {F}}_2) \\ \vdots \\ f({\mathcal {F}}_n) \end{pmatrix}$$

and a matrix of size $n \times m$ as

(33) $$\Bigg (\Big [g(i,j)\Big ]_{i \in {\mathcal {F}},\; j \in {\mathcal {G}}}\Bigg ) = \begin{pmatrix} g({\mathcal {F}}_1, {\mathcal {G}}_1) & g({\mathcal {F}}_1, {\mathcal {G}}_2) & \ldots & g({\mathcal {F}}_1, {\mathcal {G}}_m)\\ g({\mathcal {F}}_2, {\mathcal {G}}_1) & g({\mathcal {F}}_2, {\mathcal {G}}_2) & \ldots & g({\mathcal {F}}_2, {\mathcal {G}}_m)\\ \vdots & \vdots & \ddots & \vdots \\ g({\mathcal {F}}_n, {\mathcal {G}}_1) & \ldots & \ldots & g({\mathcal {F}}_n, {\mathcal {G}}_m) \end{pmatrix}$$
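As a small, self-contained illustration of this stacking notation (made-up tuples and a made-up function g), the matrix in Eq. 33 can be produced in R with outer():

# Sketch of Eq. 33: rows follow the tuple F, columns follow the tuple G.
F_tup <- c(2, 4)
G_tup <- c(1, 3)
g <- function(i, j) 10 * i + j     # any (vectorized) function of two indices
outer(F_tup, G_tup, g)
#      [,1] [,2]
# [1,]   21   23
# [2,]   41   43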

To make the subsequent steps easier to follow, we assume that there are no equality constraints between parameters in $\varvec{\Omega }$ and no constant terms in $\varvec{\Omega }$ different from 0. In Appendices A and B, we show how to lift those assumptions. We now further simplify Eq. 30: since only nonzero entries of $\varvec{\Omega }$ (the parameters $\theta _{\varvec{\Omega }}$) contribute to the sum, we define ${\mathcal {C}}$ as the tuple of lower triangular indices of $\theta _{\varvec{\Omega }}$ in $\varvec{\Omega }$, i.e., ${\mathcal {C}}_i = (l, k) \in \mathbb {N}\times \mathbb {N}$ with $({\theta _{\varvec{\Omega }}})_i = \omega _{lk}$ and $l \ge k$. We now rewrite Eq. 30 by omitting all zero terms:

(34) $$\varvec{\Sigma }(\theta )_{ij} = \sum _{(l,k) \in {\mathcal {C}}} \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il}\, \omega _{lk}\, (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik}\, \omega _{lk}\, (\textbf{I}- \varvec{\Lambda })^{-1}_{jl} \right]$$
(35) $$= \left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right] _{(l,k) \in {\mathcal {C}}} \right) ^T \theta _{\varvec{\Omega }}$$

where $\delta _{k \ne l}$ is an indicator function that takes the value 1 if $k \ne l$ and 0 otherwise. Since we are only interested in the non-duplicated elements $\sigma$ of $\varvec{\Sigma }$, we define another index tuple ${\mathcal {D}}$ that denotes the original position of $\sigma _k$ in $\varvec{\Sigma }$, i.e., ${\mathcal {D}}_k = (i, j)$ such that $\sigma _k = \varvec{\Sigma }_{ij}$. This allows us to stack the expression we just found for $\varvec{\Sigma }_{ij}$ rowwise to get

(36) $$\sigma = \left( \Big [\varvec{\Sigma }_{ij} \Big ]_{(i, j) \in {\mathcal {D}}} \right)$$
(37) $$= \left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right] _{(i, j) \in {\mathcal {D}}, \; (l,k) \in {\mathcal {C}}} \right) \theta _{\varvec{\Omega }}$$
(38) $$= \textbf{G}(\theta _{\varvec{\Lambda }}) \; \theta _{\varvec{\Omega }}$$

where $\textbf{G}(\theta _{\varvec{\Lambda }}) \in \mathbb {R}^{\dim (\sigma ) \times \dim (\theta _{\varvec{\Omega }})}$. (We let $\dim (\cdot )$ of a vector denote its number of elements, i.e., the dimension of the underlying finite-dimensional vector space.)

Even though this expression may appear involved, it is in fact easy to compute. Before the optimization procedure starts, we store ${\mathcal {C}}$ by looking up the positions of the parameters in $\varvec{\Omega }$, and we also store ${\mathcal {D}}$. At each step of the optimization procedure, to compute $\textbf{G}(\theta _{\varvec{\Lambda }})$, we first compute $(\textbf{I}- \varvec{\Lambda })^{-1}$ and then loop through the entries of ${\mathcal {C}}$ and ${\mathcal {D}}$ to compute each entry of $\textbf{G}(\theta _{\varvec{\Lambda }})$ according to Eq. 37. We note that $\textbf{G}$ will typically be sparse; it is therefore advisable to analyze its sparsity pattern before the optimization and to loop through nonzero values only.
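The following R sketch makes this loop explicit (a dense version without the sparsity handling just mentioned; C_idx and D_idx are hypothetical two-column matrices holding the tuples ${\mathcal {C}}$ and ${\mathcal {D}}$):

# Sketch: build G(theta_Lambda) entrywise according to Eq. 37.
build_G <- function(Lambda, C_idx, D_idx) {
  B <- solve(diag(nrow(Lambda)) - Lambda)       # (I - Lambda)^{-1}
  G <- matrix(0, nrow(D_idx), nrow(C_idx))
  for (r in seq_len(nrow(D_idx))) {
    i <- D_idx[r, 1]; j <- D_idx[r, 2]
    for (s in seq_len(nrow(C_idx))) {
      l <- C_idx[s, 1]; k <- C_idx[s, 2]
      G[r, s] <- B[i, l] * B[j, k] + (k != l) * B[i, k] * B[j, l]
    }
  }
  G
}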

In Appendix D, we present a different way of obtaining $\textbf{G}(\theta _{\varvec{\Lambda }})$ and the gradients, which mimics the approach of Kreiberg et al. (2021). However, the expressions obtained here are computationally more efficient, as the ones in the appendix contain very large Kronecker products.

2.3. Mean Structures

If the model contains mean parameters, we partition the parameter vector $\theta$ into three parts: $\theta _{\varvec{\Lambda }}$ and $\theta _{\varvec{\Omega }}$ as before, and $\theta _\gamma$ from the mean vector $\gamma$. From Eq. 5, we directly see that the model-implied mean vector $\mu (\theta )$ is a linear function of $\theta _\gamma$.
If we let ${\mathcal {A}}$ denote the indices of the parameters $\theta _{\gamma }$ in $\gamma$ (i.e., for $i = {\mathcal {A}}_{j}$ we have $({\theta _{\gamma }})_j = \gamma _i$), we obtain the formula

(39) $$\mu = \left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{ij}\right] _{i \in (1, \ldots , m_\textrm{obs}), \; j \in {\mathcal {A}}} \right) \theta _{\gamma }$$

We now make a slight change in notation: for the previously obtained matrix $\textbf{G}(\theta _{\varvec{\Lambda }})$, we write $\textbf{G}_\sigma$ instead, and we define $\textbf{G}_\mu :=\left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{ij}\right] _{i \in (1, \ldots , m_\textrm{obs}), \; j \in {\mathcal {A}}} \right)$. Using a formulation of the least squares objective that also includes a mean structure, we see that

(40) $$\begin{pmatrix} \sigma \\ \mu \end{pmatrix} = \textbf{G} \begin{pmatrix} \theta _{\varvec{\Omega }} \\ \theta _{\gamma } \end{pmatrix}$$

with

(41) $$\textbf{G}:=\left[ \begin{array}{cc} \textbf{G}_\sigma & \textbf{0} \\ \textbf{0} & \textbf{G}_\mu \end{array}\right]$$

It follows that in addition to the undirected parameters, the mean parameters also do not have to be optimized iteratively but can instead be computed analytically after the iterative optimization is completed.
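In code, this final analytic step is an ordinary weighted linear least squares solve via the normal equations. A minimal sketch, assuming $\textbf{G}$ has full column rank, a fixed weight matrix V, and a vector m stacking the non-duplicated sample covariances and sample means:

# Sketch: analytic estimate of the linear parameters (theta_Omega, theta_gamma),
# given G from Eq. 41; solves (G' V G) theta = G' V m.
solve_linear <- function(G, V, m) {
  solve(t(G) %*% V %*% G, t(G) %*% V %*% m)
}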

2.4. Gradient of the SNLLS Objective

There are computationally efficient expressions to compute the SNLLS objective and its gradient analytically (Kaufman, 1975; O'Leary & Rust, 2013). Because numerical approximations of the gradient are often slow and may become numerically unstable, we derive an analytical expression for the part of the gradient that is specific to SEMs. We use the notation and methods from Magnus and Neudecker (2019a) and denote the differential by $\textsf{d}$ and the Jacobian by $\textsf{D}$. The Jacobian of a matrix function $\textbf{M}$ with respect to a vector x is defined as $\textsf{D}\textbf{M}= \frac{\partial {{\,\textrm{vec}\,}}\textbf{M}}{\partial x^{T}}$. In the approaches by Kaufman (1975) and O'Leary and Rust (2013), the gradient of the SNLLS objective is expressed in terms of the partial derivatives of the entries of $\textbf{G}$ w.r.t. the nonlinear parameters, i.e., $\textsf{D}\textbf{G}$. To enable efficient implementations of such approaches in practice, we derive $\textsf{D}\textbf{G}$ here. For completeness, we also give the full gradient of Eq. 25 in Appendix C, although in practice, a more efficient expression from the cited literature can be used (which also does not assume $\textbf{G}$ to have full rank).
For reasons of clarity, we only consider the case without a mean structure here, i.e., $\textbf{G}= \textbf{G}_\sigma$. This is because the derivative of $\textbf{G}_\mu$ can be obtained in a similar way, and we do not want to make the derivation unnecessarily technical.

Let ${\mathcal {E}}$ denote the indices of $\theta _{\varvec{\Lambda }}$ in $\varvec{\Lambda }$, i.e., ${\mathcal {E}}_k = (i,j)$ such that $\varvec{\Lambda }_{ij} = ({\theta _{\varvec{\Lambda }}})_k$. We note that

(42) $$\frac{\partial (\textbf{I}- \varvec{\Lambda })^{-1}_{kl}}{\partial \varvec{\Lambda }_{ij}} = (\textbf{I}- \varvec{\Lambda })^{-1}_{ki}(\textbf{I}- \varvec{\Lambda })^{-1}_{jl}$$

With this, we derive the partial derivatives of each entry of $\textbf{G}$ in terms of the matrix $(\textbf{I}- \varvec{\Lambda })^{-1}$ as

(43) $$\frac{\partial \textbf{G}_{r, s}}{\partial ({\theta _{\varvec{\Lambda }}})_n} = \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n} \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} +\delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right]$$
(44) $$= \left[ \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{il}\, (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + (\textbf{I}- \varvec{\Lambda })^{-1}_{il}\, \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{jk}\right] + \delta _{k \ne l} \left[ \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{ik}\, (\textbf{I}- \varvec{\Lambda })^{-1}_{jl} + (\textbf{I}- \varvec{\Lambda })^{-1}_{ik}\, \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{jl} \right]$$
(45) $$= \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{iu} (\textbf{I}- \varvec{\Lambda })^{-1}_{vl} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{ju} (\textbf{I}- \varvec{\Lambda })^{-1}_{vk} \right] + \delta _{k \ne l} \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{iu} (\textbf{I}- \varvec{\Lambda })^{-1}_{vk} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl} + (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{ju} (\textbf{I}- \varvec{\Lambda })^{-1}_{vl} \right]$$

with $(i, j) = {\mathcal {D}}_r$, $(l, k) = {\mathcal {C}}_s$, and $(u, v) = {\mathcal {E}}_n$. Since $\textbf{G}$ is of dimension $\dim (\sigma ) \times \dim (\theta _{\varvec{\Omega }})$, with $k = \dim (\sigma )$ we have

(46) $${{\,\textrm{vec}\,}}(\textbf{G})_t = \textbf{G}_{t - k\lfloor (t-1)/k \rfloor , \; \lceil t / k \rceil }$$

and we obtain $\textsf{D}\textbf{G}\in \mathbb {R}^{\dim (\sigma )\dim (\theta _{\varvec{\Omega }}) \times \dim (\theta _{\varvec{\Lambda }})}$ as

(47) $$\textsf{D}\textbf{G}= \frac{\partial {{\,\textrm{vec}\,}}\textbf{G}}{\partial \theta _{\varvec{\Lambda }}^T} = \left( \left[ \frac{\partial \textbf{G}_{t - k\lfloor (t-1)/k \rfloor , \; \lceil t / k \rceil }}{\partial ({\theta _{\varvec{\Lambda }}})_n}\right] _{t \in (1, \ldots , \dim (\sigma )\dim (\theta _{\varvec{\Omega }})), \; n \in (1, \ldots , \dim (\theta _{\varvec{\Lambda }}))}\right)$$

To facilitate software implementation, we give pseudocode for computing $\textsf{D}\textbf{G}$ in Algorithm 1. In practice, $\textsf{D}\textbf{G}$ will typically contain many zero values. Therefore, it is advisable to analyze the sparsity pattern of $\textsf{D}\textbf{G}$ before the optimization procedure begins and to compute only the nonzero values of $\textsf{D}\textbf{G}$ at each iteration. Also note that the entries of $\textsf{D}\textbf{G}$ are continuous w.r.t. $\theta _{\varvec{\Lambda }}$, since they are sums of products of entries of the inverse $(\textbf{I}- \varvec{\Lambda })^{-1}$, which is continuous.
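A dense R sketch in the spirit of Algorithm 1 (no sparsity analysis; E_idx is a hypothetical two-column matrix holding the tuple ${\mathcal {E}}$, and C_idx and D_idx are as before):

# Sketch: compute DG entrywise via Eq. 45; row indices follow vec(G),
# i.e., entry (r, s) of G maps to row (s - 1) * dim(sigma) + r.
build_DG <- function(Lambda, C_idx, D_idx, E_idx) {
  B <- solve(diag(nrow(Lambda)) - Lambda)       # (I - Lambda)^{-1}
  DG <- matrix(0, nrow(D_idx) * nrow(C_idx), nrow(E_idx))
  for (n in seq_len(nrow(E_idx))) {
    u <- E_idx[n, 1]; v <- E_idx[n, 2]
    for (s in seq_len(nrow(C_idx))) {
      l <- C_idx[s, 1]; k <- C_idx[s, 2]
      for (r in seq_len(nrow(D_idx))) {
        i <- D_idx[r, 1]; j <- D_idx[r, 2]
        d <- B[i, u] * B[v, l] * B[j, k] + B[i, l] * B[j, u] * B[v, k]
        if (k != l)
          d <- d + B[i, u] * B[v, k] * B[j, l] + B[i, k] * B[j, u] * B[v, l]
        DG[(s - 1) * nrow(D_idx) + r, n] <- d
      }
    }
  }
  DG
}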

3. Discussion

We have shown that separable nonlinear least squares is applicable to generalized least squares estimation of structural equation models formulated in the RAM notation. We have also shown a connection to path tracing rules in the form of trek rules. Note that when the same weight matrix is used, the point estimates obtained by SNLLS and LS are identical. Therefore, standard errors and test statistics are obtained using the same methods available for regular least squares estimation. In the following, we would like to discuss the two major benefits of using SNLLS for SEM: better convergence properties and a reduction in the computation time for parameter estimation.

3.1. Convergence

Convergence problems are an important issue in SEM, especially in small samples (De Jonckere & Rosseel, 2022). If the optimizer fails to converge, no parameter estimates can be obtained. Using the SNLLS objective should lead to fewer convergence problems than LS, since only the directed parameters need to be estimated iteratively. Therefore, only the subset of directed parameters requires starting values. In many models, most of the directed parameters are factor loadings, and we can obtain very good starting values for them with the FABIN 3 estimator (Hägglund, 1982). Moreover, Ruhe and Wedin (1980) and Golub and Pereyra (2003) give additional proofs and reasons why the reduced optimization problem of SNLLS should in principle be better behaved than the full LS problem. Additionally, for the class of models without unknown directed parameters, convergence problems should be eliminated altogether, as the estimator of the mean and (co)variance parameters can be computed analytically. Most prominently, this class includes many types of latent growth curve models.

To investigate the convergence properties of SNLLS in SEM, we ran a small simulation. We used the model in Fig. 2 to draw 1000 random data sets for varying sample sizes (N = 10 to N = 100) under the assumption of multivariate normality, with zero expectation and the model-implied covariance induced by the population parameters. The sample sizes and the factor loadings were deliberately chosen to be small to achieve a setting where non-convergence often occurs. We fitted the true model to each sample with generalized least squares (GLS; Bollen, 1989) and SNLLS estimation. All analyses were done in the programming language R (R Core Team, 2021). For GLS estimation, we used lavaan (Rosseel, 2012). The plots were created with ggplot2 (Wickham, 2016), and the data were prepared with dplyr (Wickham et al., 2021). In Fig. 3, we report the number of converged models for each sample size. In Fig. 4, we report the median number of iterations needed until convergence for each sample size. For most sample sizes, using SNLLS effectively halved the median number of iterations until convergence and more than halved the number of non-converged models. This indicates that SNLLS might be a useful alternative for applied researchers to consider if they encounter convergence problems.

Figure 2. The structural equation model used to compare convergence properties of SNLLS and GLS estimation, with two latent variables, $\zeta _{1}$ and $\zeta _{2}$. Variances are omitted in this representation. The population values are the same as in De Jonckere and Rosseel (2022): $\lambda _{1} = \lambda _{4} = 1$, $\lambda _{2} = \lambda _{5} = 0.8$, $\lambda _{3} = \lambda _{6} = 0.6$, $\beta =0.25$, and all error variances are set to 1

Figure 3. Simulation results—number of converged replications out of 1000. GLS, generalized least squares; SNLLS, separable nonlinear least squares

Figure 4. Simulation results—median number of iterations by sample size. GLS, generalized least squares; SNLLS, separable nonlinear least squares

3.2. Computation Time

The benefits of SNLLS estimation, specifically the reduced dimensionality of the parameter space, better starting values, and fewer iterations until convergence, could lead to reduced computation times. However, the computation of the SNLLS objective function and gradient is also more costly, so the cost per iteration can be higher. In sum, whether SNLLS estimation is faster in terms of actual time spent in the optimization hinges on several aspects, such as the actual implementation of the gradient, meta-parameters of the optimizer, and model complexity.

Kreiberg et al. (2021) stated that estimation by SNLLS will typically be multiple times faster than LS when the reduced parameter space is much smaller than the original one. They conducted a simulation study in which they fitted a number of CFA models and concluded that, as the number of estimated parameters increases, estimation time becomes larger for LS than for SNLLS. Even though their simulation is useful to illustrate the potential benefits of SNLLS, we consider it unsuited to make a case for a general reduction in computation time when using SNLLS in modern software. The gradient computation in the simulation was based on a finite difference approximation in both the LS and the SNLLS condition. In existing software (Rosseel, 2012; von Oertzen et al., 2015), analytic gradients are implemented for LS estimation, so the authors compare against a straw man that would not be used in practice if computational efficiency is important. In addition, centered finite differencing takes 2q calls to the objective function per computation of the gradient, where q is the number of parameters. Since SNLLS results in a smaller parameter space, their method of differentiation favors the SNLLS procedure.

To investigate whether SNLLS outperforms the LS estimator in practice, a competitive implementation of SNLLS optimization for SEM that uses the analytic gradients derived in this paper remains to be developed, so that a realistic simulation can be carried out. However, there is a large body of research concerning the efficient implementation of SNLLS (see, for example, Kaufman, 1975; O'Leary & Rust, 2013); writing competitive software for SNLLS in SEMs would be a research topic of its own. Therefore, we only give simulation results concerning the improvement of convergence rates and the number of iterations in this paper. As noted previously, for the class of models without unknown directed parameters, the estimator of the mean and (co)variance parameters can be computed in a single step. As a result, those models should especially benefit from lower computation times.

3.3. An Outlook on Maximum Likelihood Estimation

If the assumption of multivariate normality is tenable, another method of obtaining parameter estimates is maximum likelihood estimation. Here, we briefly discuss to what extent our results may have an impact on maximum likelihood optimization of SEMs. In least squares estimation with a fixed weight matrix, we saw that the undirected parameters $\theta _{\varvec{\Omega }}$ and the mean parameters $\theta _\gamma$ enter the objective linearly. For maximum likelihood estimation, we believe it is not possible to factor out the undirected parameters (for most models used in practice). This is because the likelihood of the normal distribution

(48) $$\phi (x) = \left( (2\pi )^{m_\textrm{obs}} \det \varvec{\Sigma }\right) ^{-\frac{1}{2}} \exp \left( -\frac{1}{2} (x - \mu )^T \varvec{\Sigma }^{-1} (x - \mu ) \right)$$

depends on the inverse of the model-implied covariance matrix. For the simplistic example model depicted in Fig. 5, we derive the model-implied covariance matrix as

(49) $$\varvec{\Sigma }= \begin{pmatrix} \omega _l + \omega _1 & \omega _l\\ \omega _l & \omega _l + \omega _2 \end{pmatrix}$$

and the inverse can be computed as

(50) $$\varvec{\Sigma }^{-1} = (\det \varvec{\Sigma })^{-1} {{\,\textrm{adj}\,}}\varvec{\Sigma }$$

where $\textrm{adj}$ refers to the adjugate matrix, so in our example,

(51) $$\det \varvec{\Sigma }= (\omega _l + \omega _1)(\omega _l + \omega _2) - \omega _l^2 = \omega _1\omega _l + \omega _2\omega _l + \omega _1\omega _2$$

and

(52) $$\varvec{\Sigma }^{-1} = (\omega _1\omega _l + \omega _2\omega _l + \omega _1\omega _2)^{-1} \begin{pmatrix} \omega _l + \omega _2 & -\omega _l\\ -\omega _l & \omega _l + \omega _1 \end{pmatrix}$$
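Numerically, the nonlinearity is immediate (a throwaway R sketch with made-up values): $\varvec{\Sigma }$ is linear in the $\omega$'s, so its inverse is homogeneous of degree $-1$, and doubling all variance parameters halves $\varvec{\Sigma }^{-1}$ rather than doubling it:

# Sketch: Sigma from Eq. 49; the inverse does not respond linearly to the omegas.
Sigma_of <- function(w1, w2, wl) matrix(c(wl + w1, wl, wl, wl + w2), 2, 2)
all.equal(solve(Sigma_of(2, 2, 1)), solve(Sigma_of(1, 1, 0.5)) / 2)  # TRUE
# whereas 2 * solve(Sigma_of(1, 1, 0.5)) differs by a factor of 4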

Figure 5. Graph of a simplistic example model with one latent variable, measured by two indicators. The model contains no unknown directed effects and only two observed variables to allow for an easily traceable computation of the inverse of the model-implied covariance matrix. All variances are treated as unknown parameters

We see that $\theta _{\varvec{\Omega }}$ enters the determinant, and therefore the inverse of $\varvec{\Sigma }$, in a nonlinear way. In general, the Leibniz formula for the determinant gives

(53) $$\det \varvec{\Sigma }= \sum _{\pi \in {\mathcal {S}}_{m_\textrm{obs}}} {{\,\textrm{sgn}\,}}(\pi ) \prod _{i = 1}^{m_\textrm{obs}} \varvec{\Sigma }_{i,\pi (i)}$$

where ${\mathcal {S}}_{m_\textrm{obs}}$ denotes the symmetric group. Since this formula multiplies entries of $\varvec{\Sigma }$, and we saw in Eq. 30 that the entries of $\varvec{\Sigma }$ depend on the undirected parameters, it is very likely that those parameters form products and enter the objective in a nonlinear way. However, for the mean parameters, the picture may be different, and we leave this for future work. If the model is saturated (i.e., has zero degrees of freedom), the least squares estimates are the same as the maximum likelihood estimates, since $\textbf{S} = \varvec{\Sigma }(\hat{\theta }_\textrm{ML}) = \varvec{\Sigma }(\hat{\theta }_\textrm{LS})$. Also, Lee and Jennrich (1979) showed that maximum likelihood estimates can be obtained as a form of iteratively reweighted least squares if $\textbf{V}$ is a function of the parameters:

(54) $$\textbf{V}= \frac{1}{2}\textbf{D}^T\left( \varvec{\Sigma }^{-1} \otimes \varvec{\Sigma }^{-1}\right) \textbf{D}$$

where $\textbf{D}$ denotes the duplication matrix from Magnus and Neudecker (2019b). Another way of obtaining ML estimates with SNLLS would therefore be to minimize the SNLLS objective and use the obtained $\varvec{\Sigma }$ to update the weight matrix $\textbf{V}$ as given in Eq. 54. SNLLS could then be rerun with the updated weight matrix, and the weight matrix updated again, until $\varvec{\Sigma }$ converges to $\varvec{\Sigma }(\hat{\theta }_\textrm{ML})$. However, we would like to note that this procedure is probably computationally quite inefficient.
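A high-level sketch of this reweighting loop (with a hypothetical helper snlls_fit() that runs one full SNLLS optimization and returns the fitted model-implied covariance; D is the duplication matrix, and the convergence check is kept deliberately simple):

# Sketch: ML via iteratively reweighted SNLLS, updating V per Eq. 54.
ml_by_reweighted_snlls <- function(model, data, V, D, max_iter = 100, tol = 1e-8) {
  Sigma_old <- NULL
  for (it in seq_len(max_iter)) {
    fit <- snlls_fit(model, data, V)                      # one full SNLLS run
    Sigma <- fit$Sigma
    if (!is.null(Sigma_old) && max(abs(Sigma - Sigma_old)) < tol) break
    Sigma_inv <- solve(Sigma)
    V <- 0.5 * t(D) %*% (Sigma_inv %x% Sigma_inv) %*% D   # Eq. 54
    Sigma_old <- Sigma
  }
  fit
}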

3.4. Conclusion

We generalized separable nonlinear least squares estimation to all linear structural equation models that can be specified in the RAM notation, particularly those including a mean structure. We explained this result with the help of trek rules and the non-transitivity of the covariances of the error terms, providing deeper insight into the algebraic relations between the parameters of SEMs. We further derived analytic gradients and explained why they are of central importance for a competitive implementation. Our simulation indicates that SNLLS leads to improvements in convergence rate and number of iterations. It remains for future research to investigate the computational costs empirically. We also showed why it is unlikely that undirected parameters enter the maximum likelihood objective linearly. Thus, another line of research could be concerned with the applicability of SNLLS to the mean parameters in maximum likelihood estimation and the relationship of SNLLS to other decomposition methods for maximum likelihood estimation (Pritikin et al., 2017, 2018). Further research might also examine whether SNLLS is applicable to multilevel models. SNLLS promises better convergence rates for least squares parameter estimation in SEM and, with an efficient implementation, also reduced computation times. This result is important in its own right but may also serve as a first step toward generating starting values for subsequent ML estimation.

Funding Information

Open Access funding enabled and organized by Projekt DEAL.

Declarations

Conflict of interest

We have no conflicts of interest to disclose.

Appendix A: Equality Constraints

Kreiberg et al. (2021) showed how to incorporate equality constraints in CFA models. Because their proof follows a different approach, we show how to incorporate equality constraints into our expressions. Since the SNLLS objective only depends on $\theta _{\varvec{\Lambda }}$, constraints in $\theta _{\varvec{\Omega }}$ and $\theta _\gamma$ can be difficult to implement. However, simple equality constraints (e.g., $\theta _j = \theta _i$) are feasible under SNLLS. Since $\sigma = \textbf{G}\, \theta _{\varvec{\Omega }, \gamma }$, we see that if two (or more) parameters in $\theta _{\varvec{\Omega }, \gamma }$ are equal, we can delete all but one occurrence from the parameter vector and add the relevant columns of $\textbf{G}$ together, e.g.,

(A1) a b c d e d = a d + b e + c d = ( a + c ) d + b e = a + c b d e \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\begin{aligned} \begin{pmatrix} a&b&c \end{pmatrix} \begin{pmatrix} d \\ e \\ d \end{pmatrix} = ad + be + cd = (a+c)d + be = \begin{pmatrix} a + c&b \end{pmatrix} \begin{pmatrix} d \\ e \end{pmatrix} \end{aligned}$$\end{document}

Or, put differently, if we allow the index tuples C \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$${\mathcal {C}}$$\end{document} and A \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$${\mathcal {A}}$$\end{document} to have sets of indices as entries, i.e., C i = { ( k , l ) N × N | θ Ω i = ω kl k l } \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$${\mathcal {C}}_i = \{(k, l) \in \mathbb {N}\times \mathbb {N}\; | \; \theta _{\varvec{\Omega }_i} = \omega _{kl} \wedge k \ge l\}$$\end{document} , we obtain

(A2) G σ = ( l , k ) c ( I - Λ ) il - 1 ( I - Λ ) jk - 1 + δ k l ( I - Λ ) ik - 1 ( I - Λ ) jl - 1 ( i , j ) D , c C \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\begin{aligned} \textbf{G}_\sigma = \left( \left[ \sum _{(l,k) \in c} (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right] _{(i, j) \in {\mathcal {D}}, \; c \in {\mathcal {C}}} \right) \end{aligned}$$\end{document}

An expression for G μ \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\textbf{G}_\mu $$\end{document} can be obtained in a similar way.
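
To make the column-merging step concrete, here is a minimal R sketch. The function name and the `groups` argument, which lists for each distinct parameter the columns of $\mathbf{G}$ that share its value, are illustrative and not part of any reference implementation.

```r
# A minimal sketch, assuming `G` is the matrix from Eq. 38 and `groups`
# lists, per distinct parameter, the columns of G tied to its value.
collapse_equal_columns <- function(G, groups) {
  # Sum the columns belonging to the same parameter (cf. Eq. A1),
  # keeping one column per distinct parameter.
  sapply(groups, function(cols) rowSums(G[, cols, drop = FALSE]))
}

# Toy example: columns 1 and 3 share one parameter, column 2 stands alone.
G <- matrix(c(1, 0, 2, 1, 3, 0), nrow = 2)           # 2 x 3
G_eq <- collapse_equal_columns(G, list(c(1, 3), 2))  # 2 x 2
```

Because merging only sums columns, it adds negligible cost per evaluation of $\mathbf{G}(\theta_{\boldsymbol{\Lambda}})$.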

Appendix B: Constants in $\boldsymbol{\Omega}$

To handle constants in $\boldsymbol{\Omega}$ different from zero, we introduce $c$ as the vector of constant nonzero entries in $\boldsymbol{\Omega}$ and $\mathcal{E}$ as the lower triangular indices of $c$ in $\boldsymbol{\Omega}$. Further, define

(B1) $\mathbf{G}_c := \left( \left[ (\mathbf{I} - \boldsymbol{\Lambda})^{-1}_{il} (\mathbf{I} - \boldsymbol{\Lambda})^{-1}_{jk} + \delta_{k \ne l}\, (\mathbf{I} - \boldsymbol{\Lambda})^{-1}_{ik} (\mathbf{I} - \boldsymbol{\Lambda})^{-1}_{jl} \right]_{(i, j) \in \mathcal{D},\; (l,k) \in \mathcal{E}} \right)$

with $\mathcal{D}$ defined as in Eq. 36, i.e., the indices of the original position of $\sigma_k$ in $\boldsymbol{\Sigma}$. This allows us to modify Eq. 38 to

(B2) $\sigma = \mathbf{G}(\theta_{\boldsymbol{\Lambda}})\, \theta_{\boldsymbol{\Omega}} + \mathbf{G}_c\, c$

and reformulate the least squares objective as

(B3) $F_{\text{LS}} = (s - \sigma)^T \mathbf{V} (s - \sigma)$
(B4) $\quad = \left( s - \left( \mathbf{G}(\theta_{\boldsymbol{\Lambda}})\theta_{\boldsymbol{\Omega}} + \mathbf{G}_c c \right) \right)^T \mathbf{V} \left( s - \left( \mathbf{G}(\theta_{\boldsymbol{\Lambda}})\theta_{\boldsymbol{\Omega}} + \mathbf{G}_c c \right) \right)$
(B5) $\quad = \left( \left( s - \mathbf{G}_c c \right) - \mathbf{G}(\theta_{\boldsymbol{\Lambda}})\theta_{\boldsymbol{\Omega}} \right)^T \mathbf{V} \left( \left( s - \mathbf{G}_c c \right) - \mathbf{G}(\theta_{\boldsymbol{\Lambda}})\theta_{\boldsymbol{\Omega}} \right)$
(B6) $\quad = \left\Vert s' - \mathbf{G}(\theta_{\boldsymbol{\Lambda}})\theta_{\boldsymbol{\Omega}} \right\Vert^2_{\mathbf{V}}$

with $s' = s - \mathbf{G}_c c$. For a fixed value of $\theta_{\boldsymbol{\Lambda}}$, we can now again solve for $\theta_{\boldsymbol{\Omega}}$.
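
As an illustration, the following is a minimal R sketch of this linear step, assuming $\mathbf{G}$, $\mathbf{G}_c$, $c$, $\mathbf{V}$, and $s$ have been constructed as in Eqs. 38 and B1; all names are illustrative.

```r
# A minimal sketch of the linear step implied by Eq. B6: for fixed
# theta_Lambda, shift s by the constant part and solve the weighted
# least squares problem for theta_Omega.
solve_theta_omega <- function(G, G_c, c_vec, V, s) {
  s_prime <- s - G_c %*% c_vec       # s' = s - G_c c
  GtV <- crossprod(G, V)             # G^T V
  solve(GtV %*% G, GtV %*% s_prime)  # (G^T V G)^{-1} G^T V s'
}
```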

Appendix C: Gradient of the SNLLS Objective

Let $a^T := s^T \mathbf{V} \mathbf{G} \left( \mathbf{G}^T \mathbf{V} \mathbf{G} \right)^{-1}$. We derive the differential as

(C1)
$$\begin{aligned}
\mathsf{d} F_{\text{SNLLS}} &= \mathsf{d} \left( s^T \mathbf{V} s - s^T \mathbf{V} \mathbf{G} (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \mathbf{G}^T \mathbf{V} s \right) \\
&= - \mathsf{d} \left( s^T \mathbf{V} \mathbf{G} (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \mathbf{G}^T \mathbf{V} s \right) \\
&= - s^T \mathbf{V}\, \mathsf{d}\mathbf{G}\, (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \mathbf{G}^T \mathbf{V} s - s^T \mathbf{V} \mathbf{G}\, \mathsf{d}(\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1}\, \mathbf{G}^T \mathbf{V} s - s^T \mathbf{V} \mathbf{G} (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1}\, \mathsf{d}\mathbf{G}^T\, \mathbf{V} s \\
&= -2\, s^T \mathbf{V}\, \mathsf{d}\mathbf{G}\, (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \mathbf{G}^T \mathbf{V} s + s^T \mathbf{V} \mathbf{G} (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1}\, \mathsf{d}(\mathbf{G}^T \mathbf{V} \mathbf{G})\, (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \mathbf{G}^T \mathbf{V} s \\
&= -2\, s^T \mathbf{V}\, \mathsf{d}\mathbf{G}\, a + s^T \mathbf{V} \mathbf{G} (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \left[ \mathsf{d}\mathbf{G}^T\, \mathbf{V} \mathbf{G} + \mathbf{G}^T \mathbf{V}\, \mathsf{d}\mathbf{G} \right] (\mathbf{G}^T \mathbf{V} \mathbf{G})^{-1} \mathbf{G}^T \mathbf{V} s \\
&= -2\, s^T \mathbf{V}\, \mathsf{d}\mathbf{G}\, a + a^T \left[ \mathsf{d}\mathbf{G}^T\, \mathbf{V} \mathbf{G} + \mathbf{G}^T \mathbf{V}\, \mathsf{d}\mathbf{G} \right] a \\
&= -2\, s^T \mathbf{V}\, \mathsf{d}\mathbf{G}\, a + 2\, a^T \mathbf{G}^T \mathbf{V}\, \mathsf{d}\mathbf{G}\, a \\
&= -2\, \left( a^T \otimes s^T \mathbf{V} \right) \mathsf{d}\,\mathrm{vec}\,\mathbf{G} + 2\, \left( a^T \otimes a^T \mathbf{G}^T \mathbf{V} \right) \mathsf{d}\,\mathrm{vec}\,\mathbf{G} \\
&= -2\, \mathrm{vec}\left( \mathbf{V} s a^T \right)^T \mathsf{d}\,\mathrm{vec}\,\mathbf{G} + 2\, \mathrm{vec}\left( \mathbf{V} \mathbf{G} a a^T \right)^T \mathsf{d}\,\mathrm{vec}\,\mathbf{G} \\
&= 2\, \mathrm{vec}\left( \mathbf{V} \mathbf{G} a a^T - \mathbf{V} s a^T \right)^T \mathsf{d}\,\mathrm{vec}\,\mathbf{G} \\
&= 2\, \mathrm{vec}\left( \left( \mathbf{V} \mathbf{G} a - \mathbf{V} s \right) a^T \right)^T \mathsf{d}\,\mathrm{vec}\,\mathbf{G} \\
&= \underbrace{2\, \mathrm{vec}\left( \left( \mathbf{V} \mathbf{G} a - \mathbf{V} s \right) a^T \right)^T \mathsf{D}\mathbf{G}}_{=\, \mathsf{D} F_{\text{SNLLS}}}\; \mathsf{d}\theta_{\boldsymbol{\Lambda}}
\end{aligned}$$
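
The final line of Eq. C1 translates directly into code. The following R sketch assumes `DG` holds the Jacobian $\mathsf{D}\mathbf{G}$ of $\mathrm{vec}\,\mathbf{G}$ with respect to $\theta_{\boldsymbol{\Lambda}}$, with rows in column-major (vec) order; all names are illustrative stand-ins for the quantities defined in the text.

```r
# A minimal sketch of the gradient D F_SNLLS from Eq. C1.
snlls_gradient <- function(G, V, s, DG) {
  a <- solve(crossprod(G, V %*% G), crossprod(G, V %*% s))  # a = (G'VG)^{-1} G'Vs
  r <- V %*% (G %*% a - s)                                  # VGa - Vs
  drop(2 * crossprod(as.vector(r %*% t(a)), DG))            # 2 vec((VGa - Vs)a')' DG
}
```

Note that `as.vector` stacks the matrix $(\mathbf{V}\mathbf{G}a - \mathbf{V}s)\,a^T$ column by column, matching the $\mathrm{vec}$ operator used in the derivation.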

Appendix D: Alternative Proof

This is the analogous formulation to the one given in Kreiberg et al. (2021) for CFA models. We see that the resulting expressions contain very large Kronecker products; for reasons of computational efficiency, we therefore favor the expressions given in the main text. Let $\mathbf{D}^+$ denote the Moore–Penrose inverse of the duplication matrix $\mathbf{D}_{m_{\text{obs}}}$ from Magnus and Neudecker (2019b) such that

(D1) $\sigma = \mathbf{D}^+ \mathrm{vec}(\boldsymbol{\Sigma})$

and let $\mathbf{L}$ be a matrix such that

(D2) $\mathrm{vec}(\boldsymbol{\Omega}) = \mathbf{L}\, \theta_{\boldsymbol{\Omega}}$

We can obtain $\mathbf{L} \in \mathbb{R}^{m^2 \times \dim(\theta_{\boldsymbol{\Omega}})}$ as

(D3) $\mathbf{L}_{ij} = \begin{cases} 1, & \text{if } i = (k-1)m + l \,\vee\, i = (l-1)m + k \text{ with } (k, l) = \mathcal{C}_j \\ 0, & \text{otherwise} \end{cases}$

and derive $\mathbf{G}(\theta_{\boldsymbol{\Lambda}})$ as

(D4) $\sigma(\theta) = \mathbf{D}^+ \mathrm{vec}(\boldsymbol{\Sigma})$
(D5) $\quad = \mathbf{D}^+ \mathrm{vec}\left( \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \boldsymbol{\Omega} (\mathbf{I} - \boldsymbol{\Lambda})^{-T} \mathbf{F}^T \right)$
(D6) $\quad = \mathbf{D}^+ \left( \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \mathrm{vec}(\boldsymbol{\Omega})$
(D7) $\quad = \underbrace{\mathbf{D}^+ \left( \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \mathbf{L}}_{=\, \mathbf{G}(\theta_{\boldsymbol{\Lambda}})}\; \theta_{\boldsymbol{\Omega}}$

We further define $\mathbf{P} := \mathbf{L}^T \otimes \mathbf{D}^+ (\mathbf{F} \otimes \mathbf{F})$ and $\mathbf{Q} := \mathbf{I}_m \otimes \mathbf{K}_m \otimes \mathbf{I}_m$, where $\mathbf{K}_m$ is the commutation matrix from Magnus and Neudecker (2019b), and derive $\mathsf{D}\mathbf{G}$ as

(D8)
$$\begin{aligned}
\mathsf{d}\,\mathrm{vec}\,\mathbf{G} &= \mathsf{d}\,\mathrm{vec}\left[ \mathbf{D}^+ \left( \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathbf{F} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \mathbf{L} \right] \\
&= \mathsf{d}\,\mathrm{vec}\left[ \mathbf{D}^+ (\mathbf{F} \otimes \mathbf{F}) \left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \mathbf{L} \right] \\
&= \left( \mathbf{L}^T \otimes \mathbf{D}^+ (\mathbf{F} \otimes \mathbf{F}) \right) \mathsf{d}\,\mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \\
&= \mathbf{P}\, \mathrm{vec}\left( \mathsf{d}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes (\mathbf{I} - \boldsymbol{\Lambda})^{-1} + (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathsf{d}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \\
&= \mathbf{P} \left[ \mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \mathsf{d}\boldsymbol{\Lambda} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) + \mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \mathsf{d}\boldsymbol{\Lambda} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \right] \\
&= \mathbf{P} \mathbf{Q} \left[ \mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \mathsf{d}\boldsymbol{\Lambda} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \otimes \mathrm{vec}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} + \mathrm{vec}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \mathsf{d}\boldsymbol{\Lambda} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \right] \\
&= \mathbf{P} \mathbf{Q} \left[ \left( \mathbf{I}_{m^2} \otimes \mathrm{vec}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \mathsf{d}\boldsymbol{\Lambda} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) + \left( \mathrm{vec}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathbf{I}_{m^2} \right) \mathrm{vec}\left( (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \mathsf{d}\boldsymbol{\Lambda} (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \right] \\
&= \underbrace{\mathbf{P} \mathbf{Q} \left[ \left( \mathbf{I}_{m^2} \otimes \mathrm{vec}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) + \left( \mathrm{vec}(\mathbf{I} - \boldsymbol{\Lambda})^{-1} \otimes \mathbf{I}_{m^2} \right) \right] \left( (\mathbf{I} - \boldsymbol{\Lambda})^{-T} \otimes (\mathbf{I} - \boldsymbol{\Lambda})^{-1} \right) \mathsf{D}\boldsymbol{\Lambda}}_{=\, \mathsf{D}\mathbf{G}}\; \mathsf{d}\theta_{\boldsymbol{\Lambda}}
\end{aligned}$$
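
For completeness, here is a small numeric sketch of Eq. D7 in R, useful as a correctness check against the expressions in the main text. `Fmat`, `Lambda`, and `L` are illustrative stand-ins for the filter matrix, the directed effects, and the matrix from Eq. D3; the duplication matrix is built by hand, and since it has full column rank, $\mathbf{D}^+ = (\mathbf{D}^T \mathbf{D})^{-1} \mathbf{D}^T$.

```r
# Duplication matrix D_n mapping vech(A) to vec(A) for symmetric A.
duplication_matrix <- function(n) {
  D <- matrix(0, n^2, n * (n + 1) / 2)
  k <- 0
  for (j in seq_len(n)) {
    for (i in j:n) {
      k <- k + 1
      D[(j - 1) * n + i, k] <- 1  # vec position of entry (i, j)
      D[(i - 1) * n + j, k] <- 1  # vec position of entry (j, i)
    }
  }
  D
}

# G(theta_Lambda) via the Kronecker-product form of Eq. D7.
G_kronecker <- function(Fmat, Lambda, L) {
  B <- Fmat %*% solve(diag(nrow(Lambda)) - Lambda)  # F (I - Lambda)^{-1}
  D <- duplication_matrix(nrow(Fmat))               # D_{m_obs}
  D_plus <- solve(crossprod(D), t(D))               # D^+ = (D'D)^{-1} D'
  D_plus %*% (B %x% B) %*% L                        # Eq. D7 (without theta_Omega)
}
```

As noted above, the intermediate product $\mathbf{B} \otimes \mathbf{B}$ has $m_{\text{obs}}^2 \times m^2$ entries, which is why the entrywise trek-rule expressions in the main text are preferable in practice.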


References

Boker, S. M., McArdle, J. J., & Neale, M. (2002). An algorithm for the hierarchical organization of path diagrams and calculation of components of expected covariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 174–194. https://doi.org/10.1207/S15328007SEM0902_2
Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons. https://doi.org/10.1002/9781118619179
Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72–141). Cambridge University Press. https://doi.org/10.1017/CBO9780511897375.003
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62–83. https://doi.org/10.1111/j.2044-8317.1984.tb00789.x
De Jonckere, J., & Rosseel, Y. (2022). Using bounded estimation to avoid nonconvergence in small sample structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 29(3), 412–427. https://doi.org/10.1080/10705511.2021.1982716
Drton, M. (2018). Algebraic problems in structural equation modeling. Advanced Studies in Pure Mathematics, 77, 35–86. https://doi.org/10.2969/aspm/07710035
Ernst, M. S. (2022). Separable nonlinear least squares estimation of structural equation models. Master's thesis, Humboldt-Universität zu Berlin.
Golub, G. H., & Pereyra, V. (1973). The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on Numerical Analysis, 10(2), 413–432. https://doi.org/10.1137/0710036
Golub, G. H., & Pereyra, V. (2003). Separable nonlinear least squares: The variable projection method and its applications. Inverse Problems, 19(2), R1–R26. https://doi.org/10.1088/0266-5611/19/2/201
Hägglund, G. (1982). Factor analysis by instrumental variables methods. Psychometrika, 47(2), 209–222. https://doi.org/10.1007/BF02296276
Kaufman, L. (1975). A variable projection method for solving separable nonlinear least squares problems. BIT Numerical Mathematics, 15(1), 49–57. https://doi.org/10.1007/BF01932995
Kepner, J., & Gilbert, J. (Eds.). (2011). Graph algorithms in the language of linear algebra. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898719918
Kreiberg, D., Marcoulides, K., & Olsson, U. H. (2021). A faster procedure for estimating CFA models applying minimum distance estimators with a fixed weight matrix. Structural Equation Modeling: A Multidisciplinary Journal, 28(5), 725–739. https://doi.org/10.1080/10705511.2020.1835484
Kreiberg, D., Söderström, T., & Yang-Wallentin, F. (2016). Errors-in-variables system identification using structural equation modeling. Automatica, 66, 218–230. https://doi.org/10.1016/j.automatica.2015.12.007
Lee, S. Y., & Jennrich, R. I. (1979). A study of algorithms for covariance structure analysis with specific comparisons using factor analysis. Psychometrika, 44(1), 99–113. https://doi.org/10.1007/BF02293789
Magnus, J. R., & Neudecker, H. (2019a). Differentials and differentiability. In Matrix differential calculus with applications in statistics and econometrics (pp. 87–110). John Wiley & Sons. https://doi.org/10.1002/9781119541219.ch5
Magnus, J. R., & Neudecker, H. (2019b). Miscellaneous matrix results. In Matrix differential calculus with applications in statistics and econometrics (pp. 47–70). John Wiley & Sons. https://doi.org/10.1002/9781119541219.ch3
McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the reticular action model for moment structures. British Journal of Mathematical and Statistical Psychology, 37(2), 234–251. https://doi.org/10.1111/j.2044-8317.1984.tb00802.x
Mulaik, S. A. (2009). Linear causal modeling with structural equations. Chapman and Hall/CRC. https://doi.org/10.1201/9781439800393
O'Leary, D. P., & Rust, B. W. (2013). Variable projection for nonlinear least squares problems. Computational Optimization and Applications, 54(3), 579–593. https://doi.org/10.1007/s10589-012-9492-9
Pritikin, J. N., Brick, T. R., & Neale, M. C. (2018). Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random. Behavior Research Methods, 50(2), 490–500. https://doi.org/10.3758/s13428-017-1011-6
Pritikin, J. N., Hunter, M. D., von Oertzen, T., Brick, T. R., & Boker, S. M. (2017). Many-level multilevel structural equation modeling: An efficient evaluation strategy. Structural Equation Modeling: A Multidisciplinary Journal, 24(5), 684–698. https://doi.org/10.1080/10705511.2017.1293542
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Ruhe, A., & Wedin, P. Å. (1980). Algorithms for separable nonlinear least squares problems. SIAM Review, 22(3), 318–337. https://doi.org/10.1137/1022057
von Oertzen, T., Brandmaier, A. M., & Tsang, S. (2015). Structural equation modeling with Ωnyx. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 148–161. https://doi.org/10.1080/10705511.2014.935842
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://www.ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A grammar of data manipulation. https://www.CRAN.R-project.org/package=dplyr
Figure 1. Graph of a bi-factor model with one general factor and two specific factors. Circles represent latent variables, and rectangles represent observed variables. Variances are omitted in this representation.

Figure 2. The structural equation model used to compare convergence properties of SNLLS and GLS estimation, with two latent variables, $\zeta_1$ and $\zeta_2$. Variances are omitted in this representation. The population values are the same as in De Jonckere and Rosseel (2022): $\lambda_1 = \lambda_4 = 1$, $\lambda_2 = \lambda_5 = 0.8$, $\lambda_3 = \lambda_6 = 0.6$, $\beta = 0.25$, and all error variances are set to 1.

Figure 3. Simulation results: number of converged replications out of 1000. GLS, generalized least squares; SNLLS, separable nonlinear least squares.

Figure 4. Simulation results: median number of iterations by sample size. GLS, generalized least squares; SNLLS, separable nonlinear least squares.

Figure 5. Graph of a simplistic example model with one latent variable, measured by two indicators. The model contains no unknown directed effects and only two observed variables to allow for an easily traceable computation of the inverse of the model-implied covariance matrix. All variances are treated as unknown parameters.

Supplementary material: Two supplementary files (Ernst et al. supplementary material 1 and 2) accompany this article.