
A Note on Ising Network Analysis with Missing Data

Published online by Cambridge University Press:  01 January 2025

Siliang Zhang
Affiliation:
East China Normal University
Yunxiao Chen*
Affiliation:
London School of Economics and Political Science
Correspondence should be made to Yunxiao Chen, Department of Statistics, London School of Economics and Political Science, Room 5.16 Columbia House, Houghton Street, London WC2A 2AE, UK. Email: [email protected]

Abstract

The Ising model has become a popular psychometric model for analyzing item response data. The statistical inference of the Ising model is typically carried out via a pseudo-likelihood, as the standard likelihood approach suffers from a high computational cost when there are many variables (i.e., items). Unfortunately, the presence of missing values can hinder the use of pseudo-likelihood, and a listwise deletion approach for missing data treatment may introduce a substantial bias into the estimation and sometimes yield misleading interpretations. This paper proposes a conditional Bayesian framework for Ising network analysis with missing data, which integrates a pseudo-likelihood approach with iterative data imputation. An asymptotic theory is established for the method. Furthermore, a computationally efficient Pólya–Gamma data augmentation procedure is proposed to streamline the sampling of model parameters. The method’s performance is shown through simulations and a real-world application to data on major depressive and generalized anxiety disorders from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC).

Type
Theory and Methods
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
© 2024 The Author(s)

Recent years have witnessed the emergence of network psychometrics (Borsboom, 2008; Marsman & Rhemtulla, 2022; van der Maas et al., 2006), a family of statistical graphical models and related inference procedures for analyzing and interpreting the dependence structure in psychometric data. These models embed psychometric items as nodes in an undirected or directed network (i.e., graph) and visualize their interrelationships through the network edges, which represent certain probabilistic conditional dependencies. Network psychometric methods concern the learning of the network structure. They have been developed under various settings, including undirected graphical models for cross-sectional data (Burger et al., 2022; Epskamp et al., 2018), directed networks for longitudinal data (Borsboom et al., 2021; Gile & Handcock, 2017; Ryan et al., 2022), and extended networks with latent variables for time series or panel data (Epskamp, 2020). These methods have received wide applications in education (Koponen et al., 2019; Siew, 2020; Simon de Blas et al., 2021; Sweet et al., 2013; Willcox & Huang, 2017), psychology (Borsboom et al., 2021; Burgess & Hitch, 1999; Epskamp et al., 2018; Fried et al., 2017; Van Der Maas et al., 2017), and the health sciences (Brunson & Laubenbacher, 2018; Kohler et al., 2022; Luke & Harris, 2007; Mkhitaryan et al., 2019).

Analyzing cross-sectional binary item response data with the Ising model (Ising, 1925) is common in network psychometric analysis. The analysis is typically based on a conditional likelihood (Besag, 1974), because the standard likelihood function is computationally infeasible when many variables are involved. In this direction, Bayesian and frequentist methods have been developed in which sparsity-inducing priors or penalties are combined with the conditional likelihood to learn a sparse network structure (Epskamp & Fried, 2018; Li et al., 2019; Marsman et al., 2022; Mazumder & Hastie, 2012; Van Borkulo et al., 2014; Yuan & Lin, 2007). Moreover, the Ising model is closely related to item response theory (IRT) models (Anderson & Yu, 2007; Holland, 1990). The log-multiplicative association models (Anderson & Yu, 2007), which are special cases of the Ising model, can be used as IRT models and yield very similar results. Furthermore, the Ising model and the conditional likelihood have been used for modeling the local dependence structure in locally dependent IRT models (Chen et al., 2018; Ip, 2002).

Due to its construction, the conditional likelihood does not naturally handle data with missing values, despite the ubiquity of missing data in psychometric applications. To deal with missing values in an Ising network analysis, listwise deletion (Fried et al., 2020; Haslbeck & Fried, 2017) and single imputation (e.g., Armour et al., 2017; Huisman, 2009; Lin et al., 2020) are typically performed, which is arguably not the best practice. In particular, it is well known that listwise deletion is statistically inefficient and requires the missing completely at random (MCAR) assumption (Little & Rubin, 2019) to ensure consistent estimation. Moreover, a naïve imputation procedure, such as mode imputation, is likely to introduce bias into the parameter estimates. A more principled imputation procedure is needed to ensure both statistical validity and computational efficiency.

In this note, we propose an iterative procedure for learning an Ising network. The proposed procedure combines iterative imputation via full conditional specification (FCS; Liu et al., 2014; van Buuren, 2018) with Bayesian estimation of the Ising network. We show that FCS leads to consistent estimation when the conditional models take logistic forms. In terms of computation, we propose a joint Pólya–Gamma augmentation procedure, extending the Pólya–Gamma augmentation for logistic regression (Polson et al., 2013), that allows us to efficiently sample the parameters of the Ising model. Simulations are conducted to compare the proposed procedure with estimation based on listwise deletion and single imputation. Finally, the proposed procedure and a complete-case analysis are applied to study the network of major depressive disorder (MDD) and generalized anxiety disorder (GAD) based on data from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC; Grant et al., 2003). In this analysis, data missingness is mainly due to two screening items for GAD: a respondent's responses to the remaining GAD items are missing if they answered "no" to both screening items. This missingness mechanism is missing at random (MAR; Little & Rubin, 2019). Complete-case analysis of missing data caused by screening items is known to be problematic in the network psychometrics literature (Borsboom et al., 2017; McBride et al., 2023). Our Bayesian estimate of the edge coefficient between the two screening items is negative based on the complete cases, which can be seen as a result of Berkson's paradox (De Ron et al., 2021). In contrast, the proposed method makes use of all the observed data entries and obtains a positive estimate of this edge coefficient. An identifiability result for the Ising model under this special missing data setting (given in the Appendix), the item content, and a simulation study mimicking this setting all suggest that the result given by the proposed method is sensible. A GitHub repository for the proposed algorithm is available at https://github.com/slzhang-fd/IsingNetMissing-replication.
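To make the screening mechanism concrete, the following minimal sketch (with fabricated data and hypothetical column positions, not the NESARC data) shows how a skip-out rule generates missingness that depends only on observed entries, i.e., MAR:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.binomial(1, 0.4, size=(1000, 10))      # fabricated complete responses
# Hypothetical skip-out rule: columns 0 and 1 play the role of the two
# screening items; answering "no" (0) to both skips the remaining items.
screened_out = (Y[:, 0] == 0) & (Y[:, 1] == 0)
Y_obs = Y.astype(float)
Y_obs[screened_out, 2:] = np.nan               # missingness depends only on
                                               # observed entries, hence MAR
```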

1. Proposed Method

1.1. Ising Model

Consider a respondent answering $J$ binary items. Let $\mathbf{Y} = (Y_1, \ldots, Y_J)^\top \in \{0,1\}^J$ be a binary random vector representing the respondent's responses. We say $\mathbf{Y}$ follows an Ising model if its probability mass function satisfies

(1) $$P(\mathbf{Y} = \mathbf{y} \mid \mathbf{S}) = \frac{1}{c(\mathbf{S})}\exp\left[\frac{1}{2}\mathbf{y}^\top \mathbf{S}\mathbf{y}\right] = \frac{1}{c(\mathbf{S})}\exp\left[\sum_{j=1}^{J} s_{jj}y_j/2 + \sum_{j=1}^{J-1}\sum_{k=j+1}^{J} s_{jk}y_j y_k\right],$$

where $\mathbf{S} = (s_{ij})_{J\times J}$ is a $J \times J$ symmetric matrix that contains the parameters of the Ising model and

$$c(\mathbf{S}) = \sum_{\mathbf{y} \in \{0,1\}^J}\exp\left[\sum_{j=1}^{J} s_{jj}y_j/2 + \sum_{j=1}^{J-1}\sum_{k=j+1}^{J} s_{jk}y_j y_k\right]$$

is a normalizing constant. The parameter matrix $\mathbf{S}$ encodes a network with the $J$ items as nodes. More specifically, an edge is present between nodes $i$ and $j$ if and only if the corresponding entry $s_{ij}$ is nonzero. If an edge exists between nodes $i$ and $j$, then $Y_i$ and $Y_j$ are conditionally dependent given the rest of the variables; otherwise, the two variables are conditionally independent.
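For small $J$, the model in (1) can be evaluated by brute force, which also makes the $2^J$ cost of $c(\mathbf{S})$ explicit. A minimal sketch (ours, not the authors' code):

```python
import itertools
import numpy as np

def ising_pmf(S):
    """Brute-force pmf of the Ising model (1) for small J.

    Enumerates all 2^J binary patterns to compute c(S), which is exactly
    the computation that becomes infeasible for large J."""
    J = S.shape[0]
    patterns = np.array(list(itertools.product([0, 1], repeat=J)), dtype=float)
    # y' S y / 2 equals sum_j s_jj y_j / 2 + sum_{j<k} s_jk y_j y_k for
    # binary y and symmetric S, since y_j^2 = y_j.
    log_weights = 0.5 * np.einsum('ij,jk,ik->i', patterns, S, patterns)
    weights = np.exp(log_weights)
    return patterns, weights / weights.sum()   # weights.sum() equals c(S)
```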

In Ising network analysis, the goal is to estimate the parameter matrix $\mathbf{S}$. The standard likelihood function is computationally intensive when $J$ is large, as it requires computing the normalizing constant $c(\mathbf{S})$, which involves a summation over all $2^J$ response patterns. To address this computational issue, Besag (1975) proposed a conditional likelihood, obtained by aggregating the conditional distributions of $Y_j$ given $\mathbf{Y}_{-j} = (Y_1, \ldots, Y_{j-1}, Y_{j+1}, \ldots, Y_J)^\top$, for $j = 1, \ldots, J$, where each conditional distribution takes a logistic regression form. More precisely, the conditional likelihood for one observation $\mathbf{y}$ is defined as

(2) $$p^*(\mathbf{y} \mid \mathbf{S}) = \prod_{j=1}^{J} p(y_j \mid \mathbf{y}_{-j}, \mathbf{S}) = \prod_{j=1}^{J}\frac{\exp\left[\left(s_{jj}/2 + \sum_{k\ne j} s_{jk}y_k\right)y_j\right]}{1+\exp\left(s_{jj}/2 + \sum_{k\ne j} s_{jk}y_k\right)}.$$

A disadvantage of the conditional likelihood is that it requires a fully observed dataset, because missing values cannot be straightforwardly marginalized out of (2). In what follows, we discuss how missing data can be handled within the conditional likelihood framework.
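As a reference point, a direct implementation of (2) is straightforward; the following minimal sketch (ours, not the authors' code) computes the log conditional likelihood for a fully observed data matrix, encoding the intercept $s_{jj}/2$ by placing $1/2$ in the own-item column of the pseudo-design:

```python
import numpy as np

def log_conditional_likelihood(S, Y):
    """Log conditional likelihood (2) summed over the rows of a fully
    observed N x J binary matrix Y; each factor is a logistic regression
    of item j on the remaining items."""
    total = 0.0
    X = Y.astype(float)
    for j in range(S.shape[0]):
        Xj = X.copy()
        Xj[:, j] = 0.5                  # the s_jj / 2 intercept term
        logits = Xj @ S[:, j]           # s_jj/2 + sum_{k != j} s_jk y_k
        total += np.sum(logits * X[:, j] - np.log1p(np.exp(logits)))
    return total
```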

1.2. Proposed Method

Consider a dataset with $N$ observations. Let $\Omega_j \subset \{1,\ldots,N\}$ be the subset of observations whose data on item $j$ are missing. For each observation $i$ and item $j$, $y_{ij}$ denotes the observed response if $i \notin \Omega_j$; otherwise, $y_{ij}$ is missing. Thus, the observed data consist of $\Omega_j$ and $y_{ij}$, for $i \in \{1,\ldots,N\}\setminus\Omega_j$ and $j = 1,\ldots,J$.

The proposed procedure iterates between two steps: (1) imputing the missing values $y_{ij}$, $i \in \Omega_j$, $j = 1,\ldots,J$, via a full conditional specification, and (2) sampling from the posterior distribution of $\mathbf{S}$ given the most recently imputed data. Let $t$ be the current iteration number, and let $\mathbf{y}_i^{(t-1)} = (y_{i1}^{(t-1)},\ldots,y_{iJ}^{(t-1)})^\top$, $i = 1,\ldots,N$, be the imputed data from the previous iteration, where $y_{ij}^{(t-1)} = y_{ij}$ for $i \notin \Omega_j$, and $y_{ij}^{(t-1)}$ is the value imputed in the $(t-1)$th iteration for $i \in \Omega_j$. For the $t$th iteration, the imputation and sampling steps are as follows.

Imputation. We initialize the imputation in the $t$th iteration with the previously imputed data set $(y_{i1}^{(t-1)},\ldots,y_{iJ}^{(t-1)})$, $i = 1,\ldots,N$. Then we loop over the items $j = 1,\ldots,J$. In step $j$ of the loop, we impute $y_{ij}$ for all $i \in \Omega_j$, given the most recently imputed data $(y_{i1}^{(t)},\ldots,y_{i,j-1}^{(t)},y_{ij}^{(t-1)},\ldots,y_{iJ}^{(t-1)})$, $i = 1,\ldots,N$. We then obtain the updated data set $(y_{i1}^{(t)},\ldots,y_{ij}^{(t)},y_{i,j+1}^{(t-1)},\ldots,y_{iJ}^{(t-1)})$ by incorporating the newly imputed values of $y_{ij}$.

The imputation of each variable $j$ is based on the conditional distribution of $Y_j$ given $\mathbf{Y}_{-j}$. Under the Ising model, this conditional distribution takes a logistic regression form. For computational reasons discussed in the sequel, we introduce an auxiliary parameter vector $\boldsymbol{\beta}_j = (\beta_{j1},\ldots,\beta_{jJ})^\top$ as the coefficients of this logistic regression, instead of directly using the $\mathbf{S}$ from the previous iteration to sample the missing $y_{ij}$s. Unlike the symmetry constraint $s_{ij} = s_{ji}$ on $\mathbf{S}$, no constraints are imposed on the $\boldsymbol{\beta}_j$, $j = 1,\ldots,J$, which makes the sampling computationally efficient; see the discussion in Sect. 1.4. The imputation of variable $j$ consists of the following two steps:

  1. Sample the auxiliary parameter vector $\boldsymbol{\beta}_j^{(t)}$ from the posterior distribution

    (3) $$p^{(t,j)}(\boldsymbol{\beta}_j) \propto \pi_j(\boldsymbol{\beta}_j)\prod_{i=1}^{N}\frac{\exp\left[\left(\beta_{jj}/2 + \sum_{k\ne j}\beta_{jk}y_{ik}^{(t,j-1)}\right)y_{ij}^{(t,j-1)}\right]}{1+\exp\left(\beta_{jj}/2 + \sum_{k\ne j}\beta_{jk}y_{ik}^{(t,j-1)}\right)},$$

    where $\pi_j(\boldsymbol{\beta}_j)$ is the prior distribution for the auxiliary parameters $\boldsymbol{\beta}_j$.
  2. Sample $y_{ij}^{(t)}$ for each $i \in \Omega_j$ from a Bernoulli distribution with success probability

    (4) $$\frac{\exp\left(\beta_{jj}^{(t)}/2 + \sum_{k\ne j}\beta_{jk}^{(t)}y_{ik}^{(t,j-1)}\right)}{1+\exp\left(\beta_{jj}^{(t)}/2 + \sum_{k\ne j}\beta_{jk}^{(t)}y_{ik}^{(t,j-1)}\right)}.$$

After these two steps, we obtain $(y_{i1}^{(t)},\ldots,y_{ij}^{(t)},y_{i,j+1}^{(t-1)},\ldots,y_{iJ}^{(t-1)})$ by incorporating the newly imputed values of $y_{ij}$, $i \in \Omega_j$. We emphasize that only the missing values are updated: for $i \notin \Omega_j$, $y_{ij}^{(t)}$ is always the observed value of $y_{ij}$. After the loop over all the items, the imputed data set $(y_{i1}^{(t)},\ldots,y_{iJ}^{(t)})$ is the output of this imputation step.
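As an illustration, the Bernoulli draw in (4) can be implemented as follows; this is a minimal sketch (ours) under the same pseudo-design convention as above, with a value of $1/2$ in the own-item column encoding the intercept $\beta_{jj}/2$:

```python
import numpy as np

def impute_item_j(Y, beta_j, missing_rows, j, rng):
    """Impute the missing entries of item j, following (4).

    Y is the current N x J imputed matrix, beta_j the current draw of the
    auxiliary parameters, and missing_rows the indices in Omega_j."""
    X = Y.astype(float)                        # copy of the current data
    X[:, j] = 0.5                              # beta_jj enters as beta_jj / 2
    p = 1.0 / (1.0 + np.exp(-(X[missing_rows] @ beta_j)))
    Y[missing_rows, j] = rng.binomial(1, p)    # only missing entries change
    return Y
```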

Sampling $\mathbf{S}$. Given the most recently imputed data $\mathbf{y}_i^{(t)}$, $i = 1,\ldots,N$, update $\mathbf{S}^{(t)}$ by sampling from the pseudo-posterior distribution

(5) $$p(\mathbf{S} \mid \mathbf{y}_1^{(t)},\ldots,\mathbf{y}_N^{(t)}) \propto \pi(\mathbf{S})\prod_{i=1}^{N} p^*(\mathbf{y}_i^{(t)} \mid \mathbf{S}),$$

where $\pi(\mathbf{S})$ is the prior distribution for the Ising parameter matrix $\mathbf{S}$, and recall that $\prod_{i=1}^{N} p^*(\mathbf{y}_i^{(t)} \mid \mathbf{S})$ is the conditional likelihood.

Figure 1 Flowchart of the updating rule for the proposed method.

Algorithm 1 Ising Network Analysis with Iterative Imputation

Figure 1 visualizes the steps of the proposed method. Note that it is unnecessary to sample the parameter matrix $\mathbf{S}$ during the burn-in period, or in every iteration after it; we therefore employ a thinning step after the burn-in period, which both decreases the computational cost and reduces the autocorrelation in the imputed data. The proposed algorithm is outlined in Algorithm 1. The final estimate of $\mathbf{S}$ is obtained by averaging all the $\mathbf{S}^{(t)}$ obtained after the burn-in period. The computational details, including the sampling of the auxiliary parameters and the Ising parameter matrix and a discussion of the computational complexity, are given in Sect. 1.4.

We remark that our method imputes the missing variables one by one for each observation. We choose this approach because imputing all the missing variables simultaneously is typically computationally infeasible, especially when some observation units have many missing values: simultaneous imputation requires evaluating the joint distribution of the missing variables given the observed ones, whose computational complexity grows exponentially with the number of missing values. In contrast, the proposed method relies only on univariate conditional distributions, which is computationally much cheaper. The proposed method also has several variants that should work well; these are discussed in Sect. 3. A skeleton of the overall procedure is sketched below.
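The following skeleton puts the pieces together. It assumes samplers `sample_beta_j` (e.g., the Pólya–Gamma update sketched in Sect. 1.4), `impute_item_j` (sketched above), and `sample_S` (a draw from the pseudo-posterior (5)); all three names are ours, introduced for illustration:

```python
import numpy as np

def ising_net_missing(Y0, miss_mask, n_burn, n_keep, thin, rng):
    """Skeleton of Algorithm 1: alternate FCS imputation and sampling of S.

    Y0 is the N x J data matrix with arbitrary placeholder values at the
    missing entries indicated by the boolean matrix miss_mask."""
    Y = Y0.copy()
    Y[miss_mask] = rng.binomial(1, 0.5, size=miss_mask.sum())  # crude start
    draws = []
    for t in range(n_burn + n_keep * thin):
        for j in range(Y.shape[1]):                     # imputation step
            beta_j = sample_beta_j(Y, j, rng)           # draw from (3)
            rows = np.where(miss_mask[:, j])[0]
            Y = impute_item_j(Y, beta_j, rows, j, rng)  # draw from (4)
        if t >= n_burn and (t - n_burn) % thin == 0:    # thinned, post burn-in
            draws.append(sample_S(Y, rng))              # draw from (5)
    return np.mean(draws, axis=0)                       # final estimate of S
```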

1.3. Statistical Consistency

As our method is not a standard Bayesian inference procedure, we provide an asymptotic theory under the frequentist setting to justify its validity. In particular, we show that the $\mathbf{S}$ sampled from the pseudo-posterior distribution converges to the true parameter matrix $\mathbf{S}_0$, under the assumptions that the Ising model is correctly specified and the data are MAR.

Consider one observation with complete data vector $\mathbf{Y} = (Y_1,\ldots,Y_J)^\top$. Further, let $\mathbf{Z} = (Z_1,\ldots,Z_J)^\top$ be a vector of missing indicators, where $Z_j = 1$ if $Y_j$ is observed and $Z_j = 0$ otherwise. We further let $\mathbf{Y}_{obs} = \{Y_j : Z_j = 1, j = 1,\ldots,J\}$ and $\mathbf{Y}_{mis} = \{Y_j : Z_j = 0, j = 1,\ldots,J\}$ be the observed and missing entries of $\mathbf{Y}$, respectively. Consider the joint distribution of the observable data $(\mathbf{Y}_{obs}, \mathbf{Z})$, which takes the form

(6) $$P(\mathbf{Y}_{obs} = \mathbf{y}_{obs}, \mathbf{Z} = \mathbf{z} \mid \mathbf{S}, \boldsymbol{\phi}) = \sum_{y_j : z_j = 0}\left(\exp\left(\mathbf{y}^\top\mathbf{S}\mathbf{y}/2\right)/c(\mathbf{S})\right) q(\mathbf{z} \mid \mathbf{y}, \boldsymbol{\phi}),$$

where $\exp\left(\mathbf{y}^\top\mathbf{S}\mathbf{y}/2\right)/c(\mathbf{S})$ is the probability of $\mathbf{Y} = \mathbf{y}$ under the Ising model, $q(\mathbf{z} \mid \mathbf{y}, \boldsymbol{\phi})$ denotes the conditional probability of $\mathbf{Z} = \mathbf{z}$ given $\mathbf{Y} = \mathbf{y}$, and $\boldsymbol{\phi}$ denotes the unknown parameters of this distribution. The MAR assumption, also known as the ignorable missingness assumption, means that the conditional distribution $q(\mathbf{z} \mid \mathbf{y}, \boldsymbol{\phi})$ depends on $\mathbf{y}$ only through the observed entries, i.e., $q(\mathbf{z} \mid \mathbf{y}, \boldsymbol{\phi}) = q(\mathbf{z} \mid \mathbf{y}_{obs}, \boldsymbol{\phi})$. In that case, (6) factorizes as

(7) $$P(\mathbf{Y}_{obs} = \mathbf{y}_{obs}, \mathbf{Z} = \mathbf{z} \mid \mathbf{S}, \boldsymbol{\phi}) = q(\mathbf{z} \mid \mathbf{y}_{obs}, \boldsymbol{\phi}) \times \left(\sum_{y_j : z_j = 0}\exp\left(\mathbf{y}^\top\mathbf{S}\mathbf{y}/2\right)/c(\mathbf{S})\right).$$

Consequently, the inference of $\mathbf{S}$ does not depend on the unknown distribution $q(\mathbf{z} \mid \mathbf{y}, \boldsymbol{\phi})$.

As shown in Liu et al. (2014), the MAR assumption, together with additional regularity conditions, ensures that the iterative imputation of the missing responses converges to the imputation distribution under a standard Bayesian procedure as the number of iterations and the sample size $N$ go to infinity. A key to this convergence result is the compatibility of the conditional models in the imputation step: the logistic regression models are compatible with the Ising model as a joint distribution. The validity of the imputed samples further ensures the consistency of the estimated Ising parameter matrix. We summarize this result in Theorem 1.

Theorem 1

Assume the following conditions hold: (1) the Markov chain for the missing data, generated by the iterative imputation algorithm (Algorithm 1), is positive Harris recurrent and thus admits a unique stationary distribution; (2) the missing data process is ignorable; and (3) a regularity condition holds for the prior distributions of the Ising model parameters and the auxiliary parameters, as detailed in the supplementary material. Let $\pi_N^*(\mathbf{S})$ be the posterior density of $\mathbf{S}$ implied by the stationary distribution of the proposed method. Given the true parameter matrix $\mathbf{S}_0$ of the Ising model and any $\varepsilon > 0$, $\pi_N^*(\mathbf{S})$ concentrates at $\mathbf{S}_0$:

(8) $$\int_{B_\varepsilon(\mathbf{S}_0)}\pi_N^*(\mathbf{S})\,d\mathbf{S} \rightarrow 1,$$

in probability as $N \rightarrow \infty$, where $B_\varepsilon(\mathbf{S}_0) = \{\mathbf{S} : \Vert\mathbf{S} - \mathbf{S}_0\Vert < \varepsilon\}$ is the open ball of radius $\varepsilon$ centered at $\mathbf{S}_0$.

We provide some intuition for this consistency result. Suppose the data are generated by an Ising model. The iterative imputation method ensures that the parameters of the logistic regressions are close to those implied by the true Ising model, so the conditional distributions used to impute the missing values are close to those under the true model. This in turn guarantees that the joint distribution of the imputed data given the observed data is close to that under the true Ising model, and consequently, the Ising model parameters learned from the imputed data are close to those of the true model.

1.4. Computational Details

In what follows, we discuss the specification of the prior distributions and the sampling of the auxiliary parameters $\boldsymbol{\beta}_j$ and the Ising model parameters $\mathbf{S}$.

Sampling $\boldsymbol{\beta}_j$. We set independent mean-zero normal priors on the entries of $\boldsymbol{\beta}_j$. For the intercept parameter $\beta_{jj}$, we use a weakly informative prior with variance 100. For the slope parameters $\beta_{jk}$, $k \ne j$, we set a more informative prior with variance 1, given that these parameters correspond to the off-diagonal entries of $\mathbf{S}$, which are sparse and typically do not take extreme values. The sampling of the auxiliary parameters $\boldsymbol{\beta}_j$ following (3) is essentially a standard Bayesian logistic regression problem. We carry it out with a Markov chain Monte Carlo (MCMC) sampler known as the Pólya–Gamma sampler (Polson et al., 2013).

To obtain $\boldsymbol{\beta}_j^{(t)}$, this sampler starts with $\boldsymbol{\beta}_j^{(t-1)}$ from the previous step and constructs an MCMC transition kernel via a data augmentation trick. More precisely, the following two steps are performed.

  1. Given $\boldsymbol{\beta}_j^{(t-1)}$, independently sample $N$ augmentation variables, each from a Pólya–Gamma distribution (Barndorff-Nielsen et al., Reference Barndorff-Nielsen, Kent and Sørensen1982).

  2. Given the $N$ augmentation variables, sample $\boldsymbol{\beta}_j^{(t)}$ from a $J$-variate normal distribution.

The details of these two steps, including the forms of the Pólya–Gamma distributions and the mean and covariance matrix of the $J$-variate normal distribution, are given in the supplementary material. We choose the Pólya–Gamma sampler because it is easy to construct and computationally efficient. In particular, it is much easier to implement than Metropolis–Hastings samplers, which often need tuning to achieve good performance.
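To make these two steps concrete, below is a minimal sketch of one Pólya–Gamma Gibbs update for a Bayesian logistic regression, assuming a numpy environment. The helper `rpg_approx`, which draws from the Pólya–Gamma distribution via a truncated sum-of-gammas representation, is our own illustrative approximation and not the exact sampler detailed in the supplementary material.

```python
import numpy as np

rng = np.random.default_rng(0)

def rpg_approx(b, c, K=200):
    """Approximate draw from PG(b, c) via the truncated sum-of-gammas
    representation, keeping K terms of the infinite sum."""
    k = np.arange(1, K + 1)
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return (g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)).sum() / (2 * np.pi ** 2)

def pg_logistic_step(X, y, beta, prior_var):
    """One Gibbs sweep for Bayesian logistic regression with a
    N(0, diag(prior_var)) prior: draw one augmentation variable per
    observation, then draw beta from the implied multivariate normal."""
    psi = X @ beta                                        # linear predictors
    omega = np.array([rpg_approx(1.0, c) for c in psi])   # PG(1, psi_i) draws
    V = np.linalg.inv(X.T @ (omega[:, None] * X) + np.diag(1.0 / prior_var))
    m = V @ (X.T @ (y - 0.5))                             # kappa_i = y_i - 1/2
    return rng.multivariate_normal(m, V)

# Toy usage with the priors specified above: variance 100 for the intercept
# (first column of X) and variance 1 for the slopes.
N, J = 500, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, J - 1))])
y = (rng.random(N) < 0.5).astype(float)
prior_var = np.array([100.0] + [1.0] * (J - 1))
beta_new = pg_logistic_step(X, y, np.zeros(J), prior_var)
```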

We comment on the computational complexity of the sampling of $\boldsymbol{\beta}_j$. The sampling from the Pólya–Gamma distribution has a complexity of $O(NJ)$, and the sampling from the $J$-variate normal distribution has a complexity of $O(NJ^2) + O(J^3)$. Consequently, a loop over all the $\boldsymbol{\beta}_j$, $j = 1, \ldots, J$, has a complexity of $O((N+J)J^3)$.

Sampling $\mathbf{S}$. Since $\mathbf{S}$ is a symmetric matrix, we reparametrize it by vectorizing its lower-triangular entries (including the diagonal entries). Specifically, the reparameterization is done by half-vectorization, denoted by $\boldsymbol{\alpha} = \operatorname{vech}(\mathbf{S}) = (s_{11}, \ldots, s_{J1}, s_{22}, \ldots, s_{J2}, \ldots, s_{JJ})^\top \in \mathbb{R}^{J(J+1)/2}$. It is easy to see that $\operatorname{vech}(\cdot)$ is a one-to-one mapping between $\mathbb{R}^{J(J+1)/2}$ and the set of $J \times J$ symmetric matrices. Therefore, we impose a prior distribution on $\boldsymbol{\alpha}$ and sample $\boldsymbol{\alpha}^{(t)}$ in the $t$th iteration when $\mathbf{S}$ is sampled. Then we let $\mathbf{S}^{(t)} = \operatorname{vech}^{-1}(\boldsymbol{\alpha}^{(t)})$.
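For reference, here is a minimal numpy sketch of this half-vectorization and its inverse under the ordering above; the function names are ours.

```python
import numpy as np

def vech(S):
    """Stack the lower-triangular part of a symmetric matrix, diagonal
    included, column by column: (s11, ..., sJ1, s22, ..., sJ2, ..., sJJ)."""
    J = S.shape[0]
    return np.concatenate([S[j:, j] for j in range(J)])

def vech_inv(alpha, J):
    """Rebuild the symmetric J x J matrix from its half-vectorization."""
    S = np.zeros((J, J))
    idx = 0
    for j in range(J):
        S[j:, j] = alpha[idx:idx + J - j]
        idx += J - j
    return S + S.T - np.diag(np.diag(S))
```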

Recall that a thinning step is performed, and $t_0$ is the gap between two samples of $\mathbf{S}$. Let $t$ be a multiple of $t_0$ and $\boldsymbol{\alpha}^{(t-t_0)} = \operatorname{vech}(\mathbf{S}^{(t-t_0)})$ be the previous value of $\boldsymbol{\alpha}$. The sampling of $\boldsymbol{\alpha}^{(t)}$ is also achieved by a Pólya–Gamma sampler, which involves the following two steps, similar to the sampling of $\boldsymbol{\beta}_j$.

  1. Given $\boldsymbol{\alpha}^{(t-t_0)}$, independently sample $NJ$ augmentation variables, each from a Pólya–Gamma distribution.

  2. Given the $NJ$ augmentation variables, sample $\boldsymbol{\alpha}^{(t)}$ from a $J(J+1)/2$-variate normal distribution.

The details of these two steps are given in the supplementary material. We note that the computational complexity of sampling the $NJ$ augmentation variables is $O(NJ^2)$, and that of sampling $\boldsymbol{\alpha}^{(t)}$ is $O(NJ^5) + O(J^6)$, resulting in an overall complexity of $O((N+J)J^5)$. Comparing the complexities of the imputation and the sampling of $\mathbf{S}$, we see that the latter is computationally much more intensive. This is why we impute data using the auxiliary parameters $\boldsymbol{\beta}_j$ rather than the Ising network parameters $\mathbf{S}$: the iterative imputation then mixes much faster in terms of computation time. In addition, we only sample $\mathbf{S}$ every $t_0$ iterations, for a reasonably large $t_0$, to avoid a high computational cost and to reduce the autocorrelation between the imputed datasets.
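As a back-of-the-envelope illustration of this gap, plugging in the dimensions of the NESARC application in Sect. 2.2 ($J = 15$, $N = 42{,}230$) and ignoring constants:

```python
# Per-sweep costs, constants ignored; only the ratio is meaningful.
N, J = 42230, 15
cost_beta_sweep = (N + J) * J ** 3   # all beta_j updates: O((N + J) J^3)
cost_S_update = (N + J) * J ** 5     # one joint update of S: O((N + J) J^5)
print(cost_S_update // cost_beta_sweep)  # J**2 = 225
```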

We remark that Marsman et al. (Reference Marsman, Huth, Waldorp and Ntzoufras2022) considered a similar Ising network analysis problem based on fully observed data, in which they proposed a Bayesian inference approach based on a spike-and-slab prior for learning $\mathbf{S}$. Their Bayesian inference is also based on a Pólya–Gamma sampler. However, they combined Gibbs sampling with a Pólya–Gamma sampler, updating one parameter in $\mathbf{S}$ at a time. This Gibbs scheme often mixes more slowly than the joint update of $\mathbf{S}$ in the proposed method and, thus, is computationally less efficient. The proposed Pólya–Gamma sampler may be integrated into the framework of Marsman et al. (Reference Marsman, Huth, Waldorp and Ntzoufras2022) to improve its computational efficiency.

2. Numerical Experiments

We illustrate the proposed method and demonstrate its performance via simulation studies and a real-world data application. In Sect. 2.1, we conduct two simulation studies, evaluating the proposed method under two MAR scenarios, one of which involves missingness due to screening items. A further simulation study, applying our method to a 15-node Ising model under an MCAR mechanism, is described in detail in the supplementary material.

2.1. Simulation

Study I. We generate data from an Ising model with $J = 6$ variables. Missing values are generated under an MAR setting that is not MCAR. The proposed method is then compared with Bayesian inference based on (1) listwise deletion and (2) a single imputation, where the single imputation uses the imputed data from the $T$th iteration of Algorithm 1, recalling that $T_0$ is the burn-in size.

We configure the true parameter matrix $\mathbf{S}_0$ as follows. Since $\mathbf{S}_0$ is a symmetric matrix, we only need to specify its upper-triangular entries and its diagonal entries. For the upper-triangular entries (i.e., $s_{jl}$, $j < l$), we randomly set 50% of them to zero to introduce a moderately sparse setting. The nonzero parameters are then generated by sampling from a uniform distribution over the set $[-1, -0.4] \cup [0.4, 1]$. The intercept parameters $s_{jj}$, $j = 1, \ldots, J$, are set to zero. The true parameter values are given in the supplementary material. Missing data are simulated by masking particular elements under an MAR mechanism. In particular, we set $z_{i6} = 1$, so that the sixth variable is always observed, and we allow the missingness probabilities of the first five variables (i.e., the probabilities of $z_{ij} = 0$, $j = 1, \ldots, 5$) to depend on the value of $y_{i6}$. The specific settings of $p(z_{ij} = 0 \mid y_{i6})$, $j = 1, \ldots, 5$, are detailed in the supplementary material. Data are generated following the aforementioned Ising model and MAR mechanism for four sample sizes, $N = 1000$, 2000, 4000, and 8000. For each sample size, 50 independent replications are created.
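The following numpy sketch illustrates this data-generating configuration. The exact parameter values and missingness probabilities are those of the supplementary material; the `p_miss` values below are hypothetical stand-ins of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 6

# True parameter matrix S0: roughly 50% of the upper-triangular entries are
# zero, the rest drawn uniformly from [-1, -0.4] U [0.4, 1]; the intercepts
# s_jj stay at zero.
upper = [(j, l) for j in range(J) for l in range(j + 1, J)]
S0 = np.zeros((J, J))
for k in rng.choice(len(upper), size=len(upper) // 2, replace=False):
    j, l = upper[k]
    S0[j, l] = S0[l, j] = rng.uniform(0.4, 1.0) * rng.choice([-1.0, 1.0])

# MAR mechanism: item 6 is always observed; items 1-5 are missing with a
# probability depending on y_i6 (the probabilities here are hypothetical).
def observed_indicator(y, p_miss=(0.1, 0.4)):
    z = np.ones(J)
    z[:5] = (rng.random(5) >= p_miss[int(y[5])]).astype(float)
    return z
```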

Three methods are compared: the proposed method, Bayesian inference with a single imputation, and Bayesian inference based on complete cases from listwise deletion. The Bayesian inference for complete data is performed by sampling parameters from the posterior implied by the pseudo-likelihood and a normal prior, which is a special case of the proposed method without the iterative imputation steps. All methods share the same initial values, $s_{jl}^{(0)} \sim U(-0.1, 0.1)$, $1 \le j \le l \le J$. For the proposed method, we set the number of MCMC iterations to $T = 5000$ and the burn-in size to $T_0 = 1000$, with a thinning parameter $k_0 = 10$. This setup yields a total of 400 MCMC samples for the Ising parameter matrix $\mathbf{S}$. The same MCMC length and burn-in configuration are used for parameter inference in the single imputation and complete-case analyses.

Figure 2 gives the plots of the mean squared errors (MSEs) of the estimated edge parameters and intercept parameters under different sample sizes and for different methods. The MSE for each parameter $s_{jl}$ is defined as

(9) $$\frac{1}{50} \sum_{k=1}^{50} (\hat{s}_{k,jl} - s_{0,jl})^2.$$

Here, $\hat{s}_{k,jl}$ denotes the estimated value from the $k$th replication, while $s_{0,jl}$ refers to the true value. Each box in panel (a) corresponds to the 15 edge parameters, and each box in panel (b) corresponds to the 6 intercept parameters. We notice that the listwise deletion procedure introduces biases into the edge and intercept estimation, resulting in MSEs for certain parameters that do not decay toward zero as the sample size grows. Additionally, both the proposed method and the single imputation method offer accurate parameter estimation, with MSEs decaying toward zero as the sample size increases. Notably, the proposed method is substantially more accurate than the single imputation method, suggesting that aggregating over multiple imputed datasets improves the estimation accuracy. Furthermore, for smaller sample sizes, the complete-case analysis seems to yield slightly more accurate estimates of the edge parameters than the single imputation method. Across the four sample sizes, the median computational times for obtaining the results of the proposed method were 33, 50, 88, and 185 s, respectively.

Figure 2. a Boxplots of MSEs for edge parameters $s_{jl}$. b Boxplots of MSEs for intercept parameters $s_{jj}$.

Study II: Missing due to screening items. Missingness due to screening items is commonly encountered in practice, posing challenges to network analysis (Borsboom et al., Reference Borsboom, Fried, Epskamp, Waldorp, van Borkulo, van der Maas and Cramer2017; McBride et al., Reference McBride, van Bezooijen, Aggen, Kendler and Fried2023). This occurs, for example, in surveys where initial screening questions determine respondents' eligibility or relevance for subsequent questions. If respondents indicate a lack of relevant experience (i.e., their answers to the screening items are all negative), they are not prompted to answer certain follow-up questions, making the missingness of these responses depend on their answers to the screening questions and, thus, MAR. Our real data example in Sect. 2.2 involves two screening items, which results in a large proportion of missing data.

Table 1 MSEs and biases for edge parameters.

We consider a simulation setting involving two screening items to evaluate the proposed method's performance. Similar to Study I, we consider a setting with six items, the first two of which are the screening items. The full data are generated under an Ising model whose parameters are given in the supplementary material; the corresponding network has six positive edges, including one between the two screening items. The responses to the screening items are always observed. When an observation's responses to the screening items are both zero, the responses to the remaining four items are regarded as missing, as sketched below.
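A minimal sketch of this masking rule, assuming the responses are stored in an $N \times 6$ numpy array with the two screening items in the first two columns:

```python
import numpy as np

def observed_mask(Y):
    """True where a response is observed: the screening items (columns 0-1)
    are always observed, and the remaining four items are missing whenever
    both screening responses are zero."""
    z = np.ones(Y.shape, dtype=bool)
    skipped = (Y[:, 0] == 0) & (Y[:, 1] == 0)
    z[skipped, 2:] = False
    return z
```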

We consider a single sample size $N = 8000$ and generate 50 independent datasets. We apply the proposed method, the single imputation method, and the complete-case analysis. For each estimation procedure, we set the number of MCMC iterations $T = 5000$, the burn-in size $T_0 = 1000$, and the thinning parameter $k_0 = 10$. These methods are compared in terms of the MSEs and biases of the parameter estimates.

Table 1 presents the results. For all the edge parameters except $s_{12}$, the three estimation methods work well, though the single imputation method is slightly less accurate, as indicated by its slightly larger MSEs. However, the complete-case estimate of $s_{12}$, the edge between the two screening items, is substantially negatively biased, while the imputation-based methods remain accurate, with the proposed method having a smaller MSE than the single imputation method. This result confirms that running a complete-case analysis on data involving screening items is problematic, while the imputation-based methods, especially the proposed method, yield valid results.

We now discuss this result. The negative bias for $s_{12}$ in the complete-case analysis is due to a selection bias, typically referred to as Berkson's paradox (De Ron et al., Reference De Ron, Fried and Epskamp2021). The complete-case analysis excludes all the response vectors with negative responses to both screening items. Consequently, a positive response on one screening item strongly suggests a negative response on the other, regardless of the responses to the rest of the items. This results in a falsely negative conditional association between the two screening items. In fact, one can show theoretically that the frequentist estimate of $s_{12}$ based on the maximum pseudo-likelihood is negative infinity; the finite estimate of $s_{12}$ in Table 1 is due to the shrinkage effect of the prior distribution that we impose. On the other hand, the proposed method makes use of the observed frequency of the (0, 0) response pattern for the two screening items, in addition to the frequencies of the fully observed response vectors. As shown by the identifiability result in the Appendix, these frequencies are sufficient for identifying all the parameters of the Ising model.
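To see the selection effect in its starkest form, the following toy simulation of ours applies the complete-case selection to two independent binary items: the (0, 0) cell is removed entirely, so the sample odds ratio is exactly zero and its logarithm diverges to $-\infty$, mirroring the behavior of the pseudo-likelihood estimate of $s_{12}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
y1 = rng.random(N) < 0.3          # two independent binary "screening" items
y2 = rng.random(N) < 0.3
keep = y1 | y2                    # complete cases: at least one positive

n11 = np.sum(y1[keep] & y2[keep])
n10 = np.sum(y1[keep] & ~y2[keep])
n01 = np.sum(~y1[keep] & y2[keep])
n00 = np.sum(~y1[keep] & ~y2[keep])    # zero by construction

print(n00, (n11 * n00) / (n10 * n01))  # 0 and 0.0: log odds ratio is -inf
```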

2.2. A Real Data Application

We analyze the dataset from the 2001–2002 National Epidemiological Survey on Alcohol and Related Conditions (NESARC), which offers valuable insights into alcohol consumption and associated issues in the US population (Grant et al., 2003). The dataset consists of 43,093 civilian non-institutionalized individuals aged 18 and older. In this analysis, we focus on two specific sections of the survey that concern two highly prevalent mental health disorders: major depressive disorder (MDD) and generalized anxiety disorder (GAD). Because MDD and GAD have high symptom overlap (Hettema, Reference Hettema2008) and often co-occur (Hasin et al., Reference Hasin, Goodwin, Stinson and Grant2005), it is important to analyze the symptoms of the two disorders jointly and to study their separation. In particular, Blanco et al. (Reference Blanco, Rubio, Wall, Secades-Villa, Beesdo-Baum and Wang2014) performed factor analysis on the same data and found that the two disorders have distinct latent structures. We reanalyze the data, hoping to gain additional insights from the network perspective.

Table 2 Descriptions of MDD and GAD items and their missing rates.

Following Blanco et al. (Reference Blanco, Rubio, Wall, Secades-Villa, Beesdo-Baum and Wang2014), we consider data with nine items measuring MDD and six items measuring GAD. These items are designed according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) (American Psychiatric Association, 2000), and ask the participants whether they have recently experienced certain symptoms; see Table 2 for short descriptions. After eliminating samples with entirely absent values across the 15 items, 42,230 cases remain in the dataset. Note that any "Unknown" responses in the original data are converted into missing values. The dataset exhibits a significant degree of missingness, with only 2,412 complete cases for the 15 items, representing approximately 6% of the total cases. The missing rate for each item is given in Table 2. Importantly, items D1 and D2 function as screening items and, thus, have very low missing rates. Respondents did not need to answer items D3–D9 if their responses to both D1 and D2 were "No" or "Unknown," resulting in high missing rates for these items. This pattern suggests that the missing data in this study are not MCAR. The GAD items A1–A6 also have a screening item, which results in the high missing rates of A1 through A6. Following the treatment in Blanco et al. (Reference Blanco, Rubio, Wall, Secades-Villa, Beesdo-Baum and Wang2014), these screening items are not included in the current analysis.

Figure 3 Estimated network structure for MDD and GAD. a Complete-case analysis. b Proposed method.

We apply the proposed method and the complete-case analysis to the data. For each method, 10 MCMC chains with random starting values are used, each having 10,000 MCMC iterations and a burn-in size of 5000. The Gelman–Rubin statistics are always below 1.018, confirming the satisfactory convergence of all 120 parameters for both methods. The estimated network structures for the MDD and GAD items are presented in Fig. 3, where an edge is drawn between two variables when the absolute value of the estimated parameter is greater than 0.5. We emphasize that this threshold is applied only for visualization purposes, not for edge selection; consequently, the edges in Fig. 3 should be interpreted as edges with large estimated parameters rather than truly nonzero edges. The nine MDD items are shown as blue nodes at the bottom, and the six GAD items are shown as orange nodes at the top. The edges are colored blue and orange, representing positive and negative parameter estimates, respectively, and the line thickness of an edge indicates its magnitude.

A clear difference between the two methods is the edge between D1 "depressed mood most of the day, nearly every day" and D2 "markedly diminished interest or pleasure in all, or almost all, activities most of the day, nearly every day," the two screening questions in the survey that all the participants responded to. The estimated parameter for this edge has a large absolute value under each of the two methods, but it is negative in the complete-case analysis and positive according to the proposed method. As revealed by the result of Study II in Sect. 2.1, the negative estimate of the edge between the screening items given by the complete-case analysis is spurious. Considering the content of these items, we believe that the estimate from the proposed method is more sensible. Other than this edge, the structures of the two networks tend to be similar, with some differences. In particular, the complete-case analysis yields more edges than the proposed method; for example, the edges A4–A5, A1–D5, D1–D6, D1–D7, D1–D8, and D8–D9 appear in the estimated network from the complete-case analysis but not in that of the proposed method, while only two edges, A3–A5 and D3–D4, are present in the network estimated by the proposed method but absent from the network of the complete-case analysis. We believe this is due to the higher estimation variance of the complete-case analysis caused by its relatively small sample size.

Finally, our analysis shows that the symptoms of each mental health disorder tend to be densely connected with each other in the Ising network, while the symptoms of the two disorders are only loosely, though positively, connected with each other. The edges between the two mental health disorders identify the overlapping symptoms, including "D4: Insomnia or hypersomnia" and "A6: Sleep disturbance," "A2: Easily fatigued" and "D6: Fatigue/loss of energy," and "A3: Difficulty concentrating" and "D8: Diminished concentration." These results suggest that MDD and GAD are two well-separated mental health disorders, despite their high symptom overlap and frequent co-occurrence. This confirms the conclusion of Blanco et al. (Reference Blanco, Rubio, Wall, Secades-Villa, Beesdo-Baum and Wang2014) that GAD and MDD are closely related but distinct nosological entities.

3. Concluding Remarks

In this paper, we propose a new method for Ising network analysis in the presence of missing data. The proposed method integrates iterative imputation into a Bayesian inference procedure based on conditional likelihood. An asymptotic theory is established that guarantees the consistency of the proposed estimator. Furthermore, a Pólya–Gamma machinery is proposed for the sampling of Ising model parameters, which yields efficient computation. The power of the proposed method is further shown via simulations and a real data application. An R package has been developed that will be made publicly available upon the acceptance of the paper.

The current work has several limitations that call for future theoretical and methodological developments. First, this paper concentrates on parameter estimation for the Ising model in the presence of missing data; the problem of edge selection (Borsboom, Reference Borsboom2022; Marsman et al., Reference Marsman, Huth, Waldorp and Ntzoufras2022; Noghrehchi et al., Reference Noghrehchi, Stoklosa, Penev and Warton2021; Ročková, Reference Ročková2018) requires future investigation. There are several possible directions. One direction is to view it as a multiple testing problem and develop procedures that control certain familywise error rates or the false discovery rate for the selection of edges. To do so, one needs a way to quantify the uncertainty of the proposed estimator. This is non-trivial, as the proposed method is not a standard Bayesian procedure, and we still lack a theoretical understanding of its asymptotic distribution; in particular, it is unclear whether the Bernstein–von Mises theorem that connects Bayesian and frequentist estimation holds under the current setting. Another direction is to view it as a model selection problem. In this direction, we can use sparsity-inducing priors to better explore the Ising network structure when it is sparse. We believe that the proposed method, including the iterative imputation and the Pólya–Gamma machinery, can be adapted by replacing the normal prior with the spike-and-slab prior considered in Marsman et al. (Reference Marsman, Huth, Waldorp and Ntzoufras2022); this adaptation can be done by adding some Gibbs sampling steps. In addition, it is of interest to develop an information criterion that is computationally efficient while statistically consistent. This may be achieved by computing an information criterion, such as the Bayesian information criterion, for each imputed dataset and then aggregating them across multiple imputations.

Finally, the proposed method has several variants that may be useful for problems of different scales. For problems of a relatively small scale (i.e., when $J$ is small), we may perform data imputation using the sampled $\mathbf{S}$ instead of the auxiliary parameters $\boldsymbol{\beta}_j$. This choice makes the algorithm computationally more intensive, as the sampling of $\mathbf{S}$ has a high computational complexity. On the other hand, it may make the estimator statistically more efficient, as it avoids estimating the auxiliary parameters $\boldsymbol{\beta}_j$, whose dimension is higher than that of $\mathbf{S}$. For very large-scale problems, one may estimate the Ising model parameters based only on the auxiliary parameters $\boldsymbol{\beta}_j$; for example, we may estimate $s_{ij}$ by averaging the values of $(\beta_{ij} + \beta_{ji})/2$ over the iterations. This estimator is computationally more efficient than the proposed one, as it avoids sampling $\mathbf{S}$ given the imputed datasets. It should still be consistent but may be statistically slightly less efficient than the proposed estimator.
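As a minimal sketch of this last variant, assuming the post-burn-in draws of the auxiliary slope matrices are stored in a numpy array `B` of shape (number of draws, $J$, $J$), with $\beta_{ij}$ in row $i$, column $j$ (a storage convention we adopt for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
J, n_draws = 6, 400
B = rng.normal(size=(n_draws, J, J))   # toy stand-in for the stored draws

# Estimate s_ij by averaging (beta_ij + beta_ji) / 2 over the iterations.
S_hat = 0.5 * (B + B.transpose(0, 2, 1)).mean(axis=0)
```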

Acknowledgements

We thank the editor, associate editor, and three anonymous referees for their careful review and valuable comments. Siliang Zhang’s research is partially supported by Shanghai Science and Technology Committee Rising-Star Program (22YF1411100) and National Natural Science Foundation of China (12301373).

Appendix: Identifiability of Ising Model with Two Screening Items

We investigate the identifiability of the Ising model parameters when there are two screening items.

Proposition 1

Consider an Ising model with $J \ge 3$, true parameters $\mathbf{S}_0 = (s_{ij}^0)_{J \times J}$, and the first two items being the screening items. We define $p_0(\mathbf{y}) := P(\mathbf{Y} = \mathbf{y} \mid \mathbf{S}_0)$ for any $\mathbf{y} \in \mathcal{A} = \{(x_1, \ldots, x_J)^\top \in \{0,1\}^J : x_1 = 1 \text{ or } x_2 = 1\}$, and $p_0(0,0) := P(Y_1 = 0, Y_2 = 0 \mid \mathbf{S}_0)$ under the Ising model. Then there does not exist an Ising parameter matrix $\mathbf{S} \ne \mathbf{S}_0$ such that $p_0(\mathbf{y}) = P(\mathbf{Y} = \mathbf{y} \mid \mathbf{S})$ for any $\mathbf{y} \in \mathcal{A}$ and $p_0(0,0) = P(Y_1 = 0, Y_2 = 0 \mid \mathbf{S})$ under the Ising model.

Proof

We first prove the statement for $J \ge 4$. Suppose that $p_0(\mathbf{y}) = P(\mathbf{Y} = \mathbf{y} \mid \mathbf{S})$ for any $\mathbf{y} \in \mathcal{A}$ and $p_0(0,0) = P(Y_1 = 0, Y_2 = 0 \mid \mathbf{S})$. We will prove that $\mathbf{S} = \mathbf{S}_0$.

We start by considering items 1, 2, 3, and 4. We define the set $\mathcal{A}_{3,4} = \{(x_1, \ldots, x_J)^\top \in \mathcal{A} : x_5 = \cdots = x_J = 0\}$, which has 12 elements. Using $\mathbf{y}_d = (1, 0, 0, 0, 0, \ldots, 0)^\top \in \mathcal{A}_{3,4}$ as the baseline pattern, for any $\mathbf{y} \in \mathcal{A}_{3,4}$ such that $\mathbf{y} \ne \mathbf{y}_d$, we have

$$\log\left[\frac{\exp(\frac{1}{2}\mathbf{y}^\top \mathbf{S}\mathbf{y})/c(\mathbf{S})}{\exp(\frac{1}{2}\mathbf{y}_d^\top \mathbf{S}\mathbf{y}_d)/c(\mathbf{S})}\right] = \frac{1}{2}\mathbf{y}^\top \mathbf{S}\mathbf{y} - \frac{1}{2}\mathbf{y}_d^\top \mathbf{S}\mathbf{y}_d = \frac{1}{2}\mathbf{y}^\top \mathbf{S}_0\mathbf{y} - \frac{1}{2}\mathbf{y}_d^\top \mathbf{S}_0\mathbf{y}_d.$$

This gives us 11 linear equations for the 10 parameters $s_{ij}$, $i, j \le 4$. Simplifying these equations, we obtain (1) two linear equations for $(s_{11}, s_{12}, s_{22})$,

(10) $$s_{11} - s_{22} = s_{11}^0 - s_{22}^0, \qquad 2s_{12} + s_{22} = 2s_{12}^0 + s_{22}^0,$$

and (2) $s_{1i} = s_{1i}^0$, $s_{2i} = s_{2i}^0$, and $s_{ij} = s_{ij}^0$ for all $i, j = 3, 4$.
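As a numerical sanity check on this counting, the following short numpy script of ours encodes the 11 log-probability-ratio equations for the patterns in $\mathcal{A}_{3,4}$ as linear functions of the 10 parameters and verifies that the coefficient matrix has rank 9: everything is identified except the one-dimensional direction left free by (10).

```python
import itertools
import numpy as np

# vech ordering of the 10 free parameters of a symmetric 4 x 4 matrix S.
pairs = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1),
         (2, 1), (3, 1), (2, 2), (3, 2), (3, 3)]

def coeffs(y):
    # (1/2) y' S y = sum_i (1/2) s_ii y_i + sum_{i>j} s_ij y_i y_j
    return np.array([0.5 * y[i] if i == j else y[i] * y[j]
                     for (i, j) in pairs], dtype=float)

y_d = (1, 0, 0, 0)
A34 = [y for y in itertools.product([0, 1], repeat=4)
       if (y[0] == 1 or y[1] == 1) and y != y_d]
M = np.array([coeffs(y) - coeffs(y_d) for y in A34])
print(len(A34), np.linalg.matrix_rank(M))  # 11 equations, rank 9
```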

We repeat the above calculation for any four-item set involving items 1 and 2. By choosing any item pair i , j > 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i,j > 2$$\end{document} , i j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i\ne j$$\end{document} and repeating the above calculation with patterns in the set A i , j = { ( x 1 , , x J ) A : x l = 0 , l 1 , 2 , i , j } \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal A_{i,j} = \{(x_1,\ldots , x_J)^\top \in {\mathcal {A}}: x_l = 0, l \ne 1, 2, i, j\}$$\end{document} , we have s 1 i = s 1 i 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{1i} = s_{1i}^0$$\end{document} , s 2 i = s 2 i 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{2i} = s_{2i}^0$$\end{document} , s ij = s ij 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{ij} = s_{ij}^0$$\end{document} for all i , j > 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i,j > 2$$\end{document} , i j \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i\ne j$$\end{document} . With the above argument and (10), we only need to show s 11 = s 11 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{11} = s_{11}^0$$\end{document} to prove S = S 0 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\textbf{S}}}= {{\textbf{S}}}_0$$\end{document} . 
To prove $s_{11} = s_{11}^0$, we use $p_0(0,0)/p_0(\boldsymbol{y}_d) = P(Y_1 = 0, Y_2 = 0 \mid \mathbf{S}) / P(\mathbf{Y} = \boldsymbol{y}_d \mid \mathbf{S})$ together with the fact, proved above, that $s_{ij} = s_{ij}^0$ for all $i, j > 2$, $i \ne j$. We have

$$\sum_{\boldsymbol{y} \in \mathcal{A}_0} \exp\left(-\frac{1}{2} s_{11}^0 + \frac{1}{2} \boldsymbol{y}^\top \mathbf{S}_0 \boldsymbol{y}\right) = \sum_{\boldsymbol{y} \in \mathcal{A}_0} \exp\left(-\frac{1}{2} s_{11} + \frac{1}{2} \boldsymbol{y}^\top \mathbf{S}_0 \boldsymbol{y}\right),$$

where $\mathcal{A}_0 = \{(x_1, \ldots, x_J)^\top \in \{0,1\}^J : x_1 = x_2 = 0\}$. As the right-hand side of the above equation is a strictly decreasing function of $s_{11}$, we know that $s_{11} = s_{11}^0$ is the only solution to the above equation. This proves the $J \ge 4$ case.
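To make this step concrete, the following is a minimal numerical sketch of the uniqueness argument for a $J = 4$ example. It uses Python with NumPy, plus SciPy's `brentq` root finder as an assumed dependency; the matrix `S0` is an arbitrary illustrative choice, not a quantity from the paper.

```python
import numpy as np
from itertools import product
from scipy.optimize import brentq

rng = np.random.default_rng(0)
J = 4
# An arbitrary symmetric "true" parameter matrix S0 (diagonal = intercepts).
S0 = rng.normal(scale=0.3, size=(J, J))
S0 = (S0 + S0.T) / 2

# A0 = {y in {0,1}^J : y1 = y2 = 0}; for y in A0, y' S0 y does not involve s11.
A0 = [np.array(y) for y in product([0, 1], repeat=J) if y[0] == 0 and y[1] == 0]

def rhs(s11):
    """Right-hand side of the displayed equation, as a function of s11."""
    return sum(np.exp(-0.5 * s11 + 0.5 * y @ S0 @ y) for y in A0)

lhs = rhs(S0[0, 0])  # left-hand side: the same expression evaluated at s11^0
root = brentq(lambda s: rhs(s) - lhs, -10.0, 10.0)
print(np.isclose(root, S0[0, 0]))  # True: s11 = s11^0 is the unique solution
```

Uniqueness is immediate here because the right-hand side factors as $\exp(-s_{11}/2)$ times a positive constant free of $s_{11}$, so it is strictly decreasing in $s_{11}$.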

We now move on to the case when $J = 3$. We consider $\mathcal{A} = \{(x_1, x_2, x_3)^\top \in \{0,1\}^3 : x_1 = 1 \text{ or } x_2 = 1\}$ and $\boldsymbol{y}_d = (1, 0, 0)^\top$. Using $\boldsymbol{y}_d$ as the baseline, for any $\boldsymbol{y} \in \mathcal{A}$, $\boldsymbol{y} \ne \boldsymbol{y}_d$, we construct five linear equations given by $\log(p_0(\boldsymbol{y})/p_0(\boldsymbol{y}_d)) = \log(P(\mathbf{Y} = \boldsymbol{y} \mid \mathbf{S}) / P(\mathbf{Y} = \boldsymbol{y}_d \mid \mathbf{S}))$. From these equations, we obtain: (1) two linear equations for $(s_{11}, s_{12}, s_{22})$,

$$\begin{aligned} s_{11} - s_{22} &= s_{11}^0 - s_{22}^0, \\ 2 s_{12} + s_{22} &= 2 s_{12}^0 + s_{22}^0, \end{aligned} \tag{11}$$

and (2) $s_{13} = s_{13}^0$, $s_{23} = s_{23}^0$, $s_{33} = s_{33}^0$. Again, with the above equations, $\mathbf{S} = \mathbf{S}_0$ if $s_{11} = s_{11}^0$. To show $s_{11} = s_{11}^0$, we use $p_0(0,0)/p_0(\boldsymbol{y}_d) = P(Y_1 = 0, Y_2 = 0 \mid \mathbf{S}) / P(\mathbf{Y} = \boldsymbol{y}_d \mid \mathbf{S})$ and $s_{33} = s_{33}^0$. We have

$$\exp\left(-\frac{1}{2} s_{11}^0\right) + \exp\left(-\frac{1}{2} s_{11}^0 + \frac{1}{2} s_{33}^0\right) = \exp\left(-\frac{1}{2} s_{11}\right) + \exp\left(-\frac{1}{2} s_{11} + \frac{1}{2} s_{33}^0\right).$$

As the right-hand side is a strictly decreasing function of $s_{11}$, we know that $s_{11} = s_{11}^0$ is the only solution to the above equation. This proves the $J = 3$ case, which completes the proof.
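As a sanity check on the $J = 3$ argument, the sketch below (with illustrative parameter values only, and SciPy's `brentq` again an assumed dependency) solves the displayed equation for $s_{11}$ and then backs out $s_{22}$ and $s_{12}$ from the two linear equations in (11).

```python
import numpy as np
from scipy.optimize import brentq

s11_0, s12_0, s22_0, s33_0 = 0.5, 0.3, -0.4, 0.6  # illustrative true values

def h(s11):
    """Right-hand side of the displayed equation; strictly decreasing in s11."""
    return np.exp(-0.5 * s11) + np.exp(-0.5 * s11 + 0.5 * s33_0)

s11 = brentq(lambda s: h(s) - h(s11_0), -10.0, 10.0)  # recovers s11^0

# Equation (11): s11 - s22 and 2*s12 + s22 are pinned down by the data.
s22 = s11 - (s11_0 - s22_0)
s12 = 0.5 * ((2 * s12_0 + s22_0) - s22)
print(np.allclose([s11, s12, s22], [s11_0, s12_0, s22_0]))  # True
```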

$\square$

Following the same proof strategy as above, it can be further shown that $s_{11}$, $s_{12}$, and $s_{22}$ are not identified in the complete-case analysis, while the rest of the parameters are. This is consistent with the result of Simulation Study II, where the other model parameters can be accurately estimated while the estimates of $s_{11}$, $s_{12}$, and $s_{22}$ are substantially different from the corresponding true parameters.
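This non-identifiability is easy to verify directly in the $J = 3$ case: perturbing $(s_{11}, s_{12}, s_{22})$ along the one-dimensional direction left free by (11) leaves the complete-case pattern distribution on $\mathcal{A}$ unchanged. A minimal sketch follows; the parameter values are illustrative, not from the paper.

```python
import numpy as np
from itertools import product

def complete_case_probs(S):
    """P(Y = y | Y in A, S) for J = 3, where A = {y : y1 = 1 or y2 = 1}."""
    patterns = [np.array(y) for y in product([0, 1], repeat=3)
                if y[0] == 1 or y[1] == 1]
    w = np.array([np.exp(0.5 * y @ S @ y) for y in patterns])
    return w / w.sum()

S0 = np.array([[0.5, 0.3, 0.2],
               [0.3, -0.4, 0.1],
               [0.2, 0.1, 0.6]])

# Shift (s11, s12, s22) while holding s11 - s22 and 2*s12 + s22 fixed:
# s11 -> s11 + t, s22 -> s22 + t, s12 -> s12 - t/2.
t = 0.7
S1 = S0.copy()
S1[[0, 1], [0, 1]] += t
S1[0, 1] -= t / 2
S1[1, 0] -= t / 2

# Both matrices induce the same distribution over complete cases.
print(np.allclose(complete_case_probs(S0), complete_case_probs(S1)))  # True
```

Equivalently, every pattern in $\mathcal{A}$ has its log-weight shifted by the same constant $t/2$, which cancels in the conditional probabilities.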

Footnotes

1. All simulations were performed on Intel-based systems with the following configuration: Ubuntu 22.04.3 LTS operating system; Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90 GHz; Python version 3.11.2; NumPy version 1.23.5.


Figure 1. Flowchart of the updating rule for the proposed method.

Algorithm 1. Ising Network Analysis with Iterative Imputation.

Figure 2. (a) Boxplots of MSEs for edge parameters $s_{jl}$. (b) Boxplots of MSEs for intercept parameters $s_{jj}$.

Table 1. MSEs and biases for edge parameters.

Table 2. Descriptions of MDD and GAD items and their missing rates.

Figure 3. Estimated network structure for MDD and GAD. (a) Complete-case analysis. (b) Proposed method.