
Bayesian Adaptive Lasso for Detecting Item–Trait Relationship and Differential Item Functioning in Multidimensional Item Response Theory Models

Published online by Cambridge University Press:  01 January 2025

Na Shan*
Affiliation:
Northeast Normal University
Ping-Feng Xu
Affiliation:
Northeast Normal University; Shanghai Zhangjiang Institute of Mathematics
*
Correspondence should be made to Na Shan, School of Psychology & Key Laboratory of Applied Statistics of MOE, Northeast Normal University, 5268 Renmin Street, Changchun, Jilin, China. Email: [email protected]

Abstract

In multidimensional tests, the identification of latent traits measured by each item is crucial. In addition to the item–trait relationship, differential item functioning (DIF) is routinely evaluated to ensure valid comparisons among different groups. These two problems have been investigated separately in the literature. This paper uses a unified framework for detecting the item–trait relationship and DIF in multidimensional item response theory (MIRT) models. By incorporating DIF effects in MIRT models, both problems can be cast as variable selection for latent/observed variables and their interactions. A Bayesian adaptive Lasso procedure is developed for this variable selection, in which the item–trait relationship and DIF effects are obtained simultaneously. Simulation studies demonstrate the performance of our method for parameter estimation, recovery of the item–trait relationship and detection of DIF effects. An application is presented using data from the Eysenck Personality Questionnaire.

Type
Original Research
Copyright
© 2024 The Author(s), under exclusive licence to The Psychometric Society

1. Introduction

In modern psychological and educational tests, multiple latent traits are often assessed collectively from a bundle of item responses. To model the probability of an item response as a function of an individual's multiple latent traits and item characteristics, a variety of multidimensional item response theory (MIRT) models have been proposed (Reckase, 2009). Most MIRT models are confirmatory, i.e., the latent traits associated with each item are pre-specified by prior knowledge (Janssen and De Boeck, 1999; McKinley, 1989). Various estimation methods have been developed for confirmatory MIRT models, including marginal maximum likelihood estimation (Bock et al., 1988) and Bayesian estimation (Béguin and Glas, 2001). However, if the item–trait relationship in the confirmatory analysis is misspecified, lack of model fit and erroneous parameter estimates will result (da Silva et al., 2019; Jin and Wang, 2014).

A conventional approach to exploring the item–trait relationship is exploratory item factor analysis (IFA; Bock et al., 1988), which is data driven and can avoid the problems caused by an erroneous item–trait specification. Exploratory IFA aims to identify the optimal number of latent traits as well as the entire item–trait relationship. Nevertheless, exploratory IFA is not without drawbacks. Since little prior knowledge of, or constraints on, the null relations among items and latent traits are utilized in exploratory IFA, the resulting estimates may include redundant parameters. Previous studies have shown that unnecessary model parameters can yield less efficient estimators and lower the generalizability of exploratory IFA (Browne and Cudeck, 1989; Huang et al., 2017).

The confirmatory and exploratory approaches lie at two ends of the input of item–trait relationship in MIRT models. To allow more flexibility along this continuum, latent variable selection using regularization approaches has been developed on the basis of the confirmatory analysis. Sun et al. (2016) proposed a sparse estimation of the item–trait relationship in MIRT models by using the expectation–maximization (EM) algorithm to maximize the $L_1$ penalized log-likelihood. Chen (2020) used the Bayesian Lasso to estimate within-item dimensionality (loading) and residual structure in MIRT models under a partially confirmatory framework. Further developments of latent variable selection in MIRT models can be found in Xu et al. (2022) and Zhang and Chen (2022). Under the same identifiability conditions as Sun et al. (2016), Xu et al. (2022) optimized the $L_0$ penalized log-likelihood by updating the model (i.e., the item–trait relationship) and the model parameters simultaneously in each iteration, which improved the estimation accuracy of the item–trait relationship. Zhang and Chen (2022) gave a quasi-Newton stochastic proximal algorithm for maximizing an objective function based on a marginal likelihood/pseudo-likelihood, possibly with constraints and/or penalties on parameters; their method can enhance the computational efficiency of the $L_1$ penalized log-likelihood approach proposed by Sun et al. (2016).

The latent variable selection methods in MIRT models can identify the sparsity of the item–trait relationship, but the above studies do not incorporate individual characteristics, such as gender and age. In heterogeneous populations, differential item functioning (DIF) is routinely examined to judge whether item responses are related to individual characteristics. Generally, DIF refers to the condition in which persons from different groups with the same latent traits have unequal probabilities of endorsing an item. As a result of DIF, a biased item provides either a constant advantage for a particular group (i.e., uniform DIF) or an advantage varying in magnitude and/or direction across the latent trait continuum (i.e., non-uniform DIF). If either type of DIF is present but not correctly addressed, biased estimates and specious treatment differences will arise, and the fairness of the test is threatened (Bauer, 2017; Millsap and Everson, 1993; Teresi et al., 2008).

In multidimensional tests with a confirmatory item–trait relationship, several approaches have been proposed for DIF detection, mostly multidimensional extensions of unidimensional DIF detection approaches, such as multidimensional SIBTEST (Stout et al., 1997), multidimensional differential functioning of items and tests (Oshima et al., 1997), logistic regression (Mazor et al., 1998), the item response theory likelihood ratio (IRT-LR) test (Suh and Cho, 2014) and the multiple indicators multiple causes (MIMIC) model (Lee et al., 2017). These approaches have in common that a test statistic is computed for each item separately, and an item is flagged as exhibiting DIF if its test statistic exceeds a critical threshold. When DIF test statistics are computed separately for each item, problems such as multiple testing and a contaminated anchor set may arise (Kim and Oshima, 2013; Woods, 2009).

In recent years, regularization methods have been proposed for DIF detection, in which DIF effects are examined simultaneously for all items on the basis of a statistical model (e.g., an IRT model). Magis et al. (2015) used the Lasso (least absolute shrinkage and selection operator; Tibshirani, 1996) to identify DIF in a logistic regression model and found that the Lasso method outperformed the logistic regression and Mantel–Haenszel methods in terms of false positive and true positive rates for small samples. Tutz and Schauberger (2015) and Schauberger and Mair (2020) both introduced multiple DIF-inducing covariates and computed penalized maximum likelihood estimators to simultaneously detect DIF effects from different covariates in Rasch models and generalized partial credit models, respectively. Belzak and Bauer (2020) investigated Lasso regularization for identifying DIF in two-parameter logistic (2PL) models and found that Lasso regularization gave better control of Type I error than the likelihood ratio test when DIF was pervasive and the sample size was large. Furthermore, Bayesian regularization methods with a variety of penalized priors have been investigated for DIF detection in moderated nonlinear factor analysis models, with Lasso and spike-and-slab priors found to outperform the other priors (Bauer et al., 2020; Brandt et al., 2023; Chen et al., 2022). The above regularization approaches are all aimed at unidimensional DIF detection.

For identifying DIF in simple-structure multidimensional 2PL models, Wang et al. (2023) found that the adaptive Lasso outperformed the Lasso, and both regularization methods performed better than the likelihood ratio test in most conditions. In Wang et al. (2023)'s study, the simple structure means that each item measures exactly one latent trait, and the item–trait structure is confirmatory and known in advance. In practical applications, however, some items may be related to more than one latent trait. As shown in Asparouhov and Muthén (2009), when nonzero cross-loadings are misspecified as zero in confirmatory factor analysis (CFA), substantial bias arises in the remaining parameter estimates (e.g., overestimated factor correlations), along with poor confidence interval coverage. To add modeling flexibility and reduce the bias of parameter estimates caused by misspecified factor loadings in a confirmatory measurement model, Asparouhov and Muthén (2009) introduced exploratory structural equation modeling (ESEM). In an ESEM model, an exploratory factor analysis (EFA) measurement model is used in place of a CFA measurement model in a structural equation model. Examples of ESEM models include, but are not limited to, multiple-group EFA with measurement invariance testing and test-retest (longitudinal) EFA.

In an MIRT model, when the item–trait relationship is not correctly specified (e.g., small cross-loadings are misspecified as zero to obtain a simple structure), what impact does this have on subsequent parameter estimation and DIF detection? Consider a simple example with two latent traits, each measured by five test items. The item discriminations were set with two cross-loadings of 0.3 for each latent trait. Two groups of persons were investigated, and small uniform DIF effects were assumed for items 4 and 8. More details about the example are given in Section “A heuristic simple example”. We found that eliminating all small cross-loadings and using a confirmatory item–trait structure resulted in substantial bias in the estimates of discriminations, DIF parameters and the trait correlation, as well as poor confidence interval coverage. Two exploratory methods for identifying the item–trait structure were also examined in the example. One first identified the item–trait structure by the EML1 method of Sun et al. (2016) and then used that structure as confirmatory in the subsequent DIF detection; the other was our proposed method for simultaneously detecting the item–trait relationship and DIF effects. Our proposed method had the smallest bias and the highest credible interval coverage for the estimates of discriminations, DIF parameters and the trait correlation, as shown in Table 1.

Given the effectiveness of Bayesian regularization methods for analyzing complex models and data types in psychological and behavioral studies (Brandt et al., 2023; Chen et al., 2021, 2022; Feng et al., 2017; Pan et al., 2017), we propose a Bayesian adaptive Lasso approach for simultaneously detecting the item–trait relationship and DIF effects in MIRT models. By incorporating DIF-inducing covariates in MIRT models, the detection of the item–trait relationship and DIF effects can be solved jointly as variable selection for latent/observed variables and their interactions. The contribution of this study is twofold. First, we explore the simultaneous detection of the item–trait relationship and DIF effects in the context of MIRT models. Compared to Wang et al. (2023)'s study, the main difference is that they used simple-structure multidimensional 2PL models, where the item–trait relationship was confirmatory and no cross-loadings were allowed. Our proposed method explores the item–trait relationship for non-anchor items, which can load on more than one latent trait. Second, we use the Bayesian adaptive Lasso (a type of Bayesian regularization) to estimate item discriminations in addition to DIF parameters. In Belzak and Bauer (2020) and Chen et al. (2022), no regularization was used for the baseline item discriminations, since they focused on unidimensional factor models for DIF analysis. In addition, we study DIF effects for both categorical and metric covariates, extending the types of DIF-inducing covariates investigated in unidimensional factor models.

The rest of the article is organized as follows. First, the two-parameter compensatory MIRT model incorporating DIF effects is introduced. Then, we describe Bayesian estimation with the adaptive Lasso for the proposed model. Next, a comprehensive simulation study is conducted and a real data analysis is reported. Finally, we conclude the article with a discussion.

2. MIRT Models Incorporating DIF Effects

In multidimensional tests that intentionally measure two or more latent traits, MIRT models are often used to model the response probability of an item as a function of item characteristics and an individual's multiple latent traits (Reckase, 2009). Consider a test containing J items and K latent traits, administered to N persons who respond to all J items. In this paper, all responses are dichotomous. Let $y_{ij}$ be the response of person i to item j, with $y_{ij}=1$ denoting a correct response and $y_{ij}=0$ otherwise. Following the notation of Wang et al. (2023), a two-parameter compensatory MIRT model incorporating DIF effects can be described as:

(1) $$p(y_{ij}=1 \mid \boldsymbol{\theta}_{i}) = F\left(\boldsymbol{a}_{j}^{T}\boldsymbol{\theta}_{i} + d_{j} + \boldsymbol{x}_{i}^{T}\boldsymbol{\beta}_{j} + \boldsymbol{x}_{i}^{T}\boldsymbol{\gamma}_{j}\boldsymbol{\theta}_{i}\right),$$

where $p(y_{ij}=1 \mid \boldsymbol{\theta}_{i})$ is the probability of a correct response for person i to item j, $\boldsymbol{\theta}_{i}=(\theta_{i1},\ldots,\theta_{iK})^{T}$ is a K-dimensional vector of latent traits for person i, $F:\mathbb{R}\rightarrow[0,1]$ is a pre-specified non-decreasing function, $\boldsymbol{a}_{j}=(a_{j1},\ldots,a_{jK})^{T}$ is a K-dimensional vector of discriminations for item j, $d_{j}$ is the intercept of item j, $\boldsymbol{x}_{i}=(x_{i1},\ldots,x_{iP})^{T}$ is a P-dimensional covariate vector for person i that can contain both categorical variables (e.g., gender) and metric variables (e.g., age), $\boldsymbol{\beta}_{j}=(\beta_{j1},\ldots,\beta_{jP})^{T}$ is a P-dimensional vector of regression coefficients representing the main effect of each covariate on item j, and $\boldsymbol{\gamma}_{j}=(\gamma_{jpk})$ is a P-by-K matrix of regression coefficients whose element $\gamma_{jpk}$ denotes the interaction effect of the pth covariate and the kth latent trait on item j. To illustrate DIF effects on an item, take gender as a covariate. If the $\beta$ coefficient of gender is nonzero, this represents a consistent advantage for males or females (i.e., uniform DIF) on that item; if the $\gamma$ coefficient of the interaction between gender and a latent trait is nonzero, this implies a varying advantage across the latent trait continuum for males or females (i.e., non-uniform DIF).
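As a concrete illustration of Eq. (1), the following sketch evaluates the response probability with a logistic link for F. The function name, shapes, and all numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def response_prob(theta, a, d, x, beta, gamma):
    """Correct-response probability under the MIRT model with DIF effects
    (Eq. 1), taking F to be the logistic function. Shapes: theta (K,),
    a (K,), d scalar, x (P,), beta (P,), gamma (P, K)."""
    # a'theta + d: baseline 2PL part; x'beta: uniform DIF;
    # x' gamma theta: non-uniform DIF (covariate-by-trait interactions)
    z = a @ theta + d + x @ beta + x @ (gamma @ theta)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values: K = 2 traits, P = 1 covariate (group indicator)
theta = np.array([0.5, -0.3])
a = np.array([1.2, 0.3])          # item loads mainly on trait 1
beta = np.array([0.4])            # uniform DIF effect for the focal group
gamma = np.zeros((1, 2))          # no non-uniform DIF here

p_focal = response_prob(theta, a, -0.2, np.array([1.0]), beta, gamma)
p_ref = response_prob(theta, a, -0.2, np.array([0.0]), beta, gamma)
```

With identical latent traits, the focal-group probability exceeds the reference-group probability, reflecting the uniform DIF effect on the logit scale.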

In Eq. (1), item j is related to latent trait k if $a_{jk} \ne 0$. The latent trait vector $\boldsymbol{\theta}_{i}$ follows a multivariate normal distribution, $\boldsymbol{\theta}_{i} \sim \mathrm{MVN}(\boldsymbol{\alpha}_{i}, \boldsymbol{\Psi}_{i})$, where both the mean vector and the covariance matrix are person-specific. Following the models given by Bauer (2017), the mean of each latent trait k ($k=1,\ldots,K$) can be represented as

(2) $$\alpha_{ki} = \alpha_{k0} + \boldsymbol{\Upsilon}_{k}^{'}\boldsymbol{x}_{i},$$

where $\alpha_{k0}$ is the baseline mean when $\boldsymbol{x}_{i}=\boldsymbol{0}$, and $\boldsymbol{\Upsilon}_{k}$ is a P-dimensional vector that captures the linear dependence on $\boldsymbol{x}_{i}$. The covariance matrix $\boldsymbol{\Psi}_{i}$ can be rewritten as $\boldsymbol{\Psi}_{i}=\boldsymbol{\Delta}_{i}\boldsymbol{\Omega}_{i}\boldsymbol{\Delta}_{i}$, where $\boldsymbol{\Omega}_{i}$ is the correlation matrix and $\boldsymbol{\Delta}_{i}$ is a diagonal matrix of standard deviations. The standard deviation of each latent trait k ($k=1,\ldots,K$) can be expressed as a log-linear function of $\boldsymbol{x}_{i}$ (Chen et al., 2022):

(3) $$\Delta_{(kk)i} = \Delta_{(kk0)}\exp\left(\boldsymbol{\eta}_{(kk)}^{'}\boldsymbol{x}_{i}\right),$$

where $\Delta_{(kk0)}$ is the baseline standard deviation when $\boldsymbol{x}_{i}=\boldsymbol{0}$, and $\boldsymbol{\eta}_{(kk)}$ is a P-dimensional vector indicating the differences in the standard deviation as a function of $\boldsymbol{x}_{i}$. For each off-diagonal correlation in $\boldsymbol{\Omega}_{i}$, its Fisher's z-transformation can be modeled as a linear moderation function of $\boldsymbol{x}_{i}$; details can be found in Bauer (2017). In the following, $\boldsymbol{\Omega}_{i}$ is assumed to be constant across persons for simplicity, i.e., $\boldsymbol{\Omega}_{i}=\boldsymbol{\Omega}$. Any nonzero elements in $\boldsymbol{\Upsilon}_{k}$ or $\boldsymbol{\eta}_{(kk)}$ indicate differences in the distribution of the individual latent traits. Such differences, also called impact, may exist regardless of whether DIF is present.
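The moderated latent-trait distribution of Eqs. (2) and (3) can be sketched as follows, with the correlation matrix held constant across persons as assumed above. The function and all numbers are illustrative, not from the paper.

```python
import numpy as np

def trait_mean_cov(x, Upsilon, Delta0, eta, Omega, alpha0=None):
    """Person-specific latent-trait mean (Eq. 2) and covariance
    Psi_i = Delta_i Omega Delta_i, with SDs moderated log-linearly (Eq. 3).
    Shapes: x (P,), Upsilon (K, P), Delta0 (K,), eta (K, P), Omega (K, K)."""
    K = Upsilon.shape[0]
    if alpha0 is None:
        alpha0 = np.zeros(K)       # baseline means fixed at 0 (identifiability)
    alpha = alpha0 + Upsilon @ x               # Eq. (2), stacked over k
    Delta = np.diag(Delta0 * np.exp(eta @ x))  # Eq. (3), stacked over k
    Psi = Delta @ Omega @ Delta
    return alpha, Psi

# Illustrative impact: the covariate shifts trait 1's mean and inflates its SD
x = np.array([1.0])
Upsilon = np.array([[0.3], [0.0]])
Delta0 = np.array([1.0, 1.0])
eta = np.array([[0.2], [0.0]])
Omega = np.array([[1.0, 0.5], [0.5, 1.0]])
alpha, Psi = trait_mean_cov(x, Upsilon, Delta0, eta, Omega)
```

Here only the first trait shows impact: its mean shifts by 0.3 and its standard deviation is multiplied by exp(0.2), while the correlation structure is untouched.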

To identify our model defined in Eqs. (1)–(3), some assumptions need to be satisfied. Extending Sun et al. (2016)'s conditions for latent variable selection and Wang et al. (2023)'s conditions for multidimensional DIF detection, the identifiability conditions of our model are as follows:

  (1) the N-by-(1+P) matrix with rows $(1,\boldsymbol{x}_{1}^{T}),\ldots,(1,\boldsymbol{x}_{N}^{T})$ is full rank;

  (2) $\boldsymbol{\theta}_{i}$ has mean vector $\boldsymbol{0}$ when $\boldsymbol{x}_{i}=\boldsymbol{0}$, i.e., $\alpha_{k0}=0$ for $k=1,\ldots,K$;

  3. (3) There are $K$ DIF-free (anchor) items, each loading on a different dimension with a unit loading.

Condition (1) is in line with Wang et al. (2023) for multidimensional DIF detection. Condition (2) and the fixed loadings in condition (3) constrain the scale of the baseline latent traits. Following Wang et al. (2023), Eq. (1) is identifiable when there are $K$ DIF-free items, one for each dimension. Furthermore, latent variable selection in MIRT models requires $K$ items that each load on a single, distinct dimension (Sun et al., 2016). For the third condition, without loss of generality, we assume that the first $K$ items are DIF-free and that each loads on one of the $K$ dimensions with a unit loading, i.e., $a_{jj}=1$ and $a_{jl}=0$ for $1\le j \ne l\le K$. Under these identifiability conditions, there are $(J-K)K$ item discriminations and $J$ item intercepts to estimate.
In addition, $(J-K)P+(J-K)KP$ parameters are introduced beyond the conventional MIRT model, representing the possible uniform and non-uniform DIF effects of the covariates on the non-anchor items.
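As a quick check of the dimension counts above, the following Python sketch tallies the free parameters; the function name and the example values of $J$, $K$, and $P$ are our own illustrative choices, not from the paper.

```python
def parameter_counts(J, K, P):
    # Free parameters under the identifiability conditions:
    #   (J-K)*K discriminations plus J intercepts in the baseline MIRT model,
    #   plus (J-K)*P uniform and (J-K)*K*P non-uniform DIF effects.
    mirt = (J - K) * K + J
    dif = (J - K) * P + (J - K) * K * P
    return mirt, dif
```

For example, with $J=10$ items, $K=2$ traits, and a single covariate ($P=1$), there are 26 baseline parameters and 24 additional DIF parameters.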

3. Model Estimation by Bayesian Adaptive Lasso

Regularization methods are well developed in statistics and machine learning (Hastie et al., 2009; Wellner and Zhang, 2012; Tibshirani et al., 2021). Tibshirani (1996) introduced the well-known Lasso estimates for linear regression, which are least squares estimates under an $L_1$ norm penalty. The $L_1$ penalty shrinks more weakly related coefficients to zero faster and results in sparse estimates. A Bayesian version of the Lasso was later proposed by Park and Casella (2008). From a Bayesian perspective, the Lasso estimates can be interpreted as the posterior modes under a Laplace prior assigned to all coefficients. Because the Lasso procedure imposes the same penalty on all coefficients, it may introduce appreciable bias into the resulting estimates. To solve this problem, Zou (2006) developed the adaptive Lasso procedure, which uses adaptive weights to penalize different coefficients. The adaptive Lasso imposes relatively higher penalties on zero coefficients and lower penalties on nonzero coefficients, so it shrinks zero coefficients more efficiently and estimates nonzero coefficients more accurately than the Lasso does. The Bayesian adaptive Lasso was proposed by Leng et al. (2014), with independent Laplace priors imposed on different coefficients.
Furthermore, many other regularization methods have been studied with sparsity as a primary driving force (Fan and Li, 2001; Polson and Sokolov, 2019; Tibshirani et al., 2021; Zhang, 2010).

Recently, the idea of regularization has been introduced to psychometrics, clinical psychology, psychiatry, and related fields (Dwyer et al., 2018; Epskamp and Fried, 2018). In addition to the regularization methods used in latent variable selection and DIF detection, regularization, especially Bayesian regularization, has been successfully developed in structural equation modeling (Chen et al., 2021; Huang, 2018; Jacobucci et al., 2016; Pan et al., 2017; Serang et al., 2017). Compared with frequentist regularization, Bayesian regularization is highly efficient and easy to implement for complex models and data types (Alhamzawi et al., 2012; Feng et al., 2017). Given these advantages, we use the Bayesian adaptive Lasso for the simultaneous detection of item–trait relationship and DIF effects in MIRT models.

3.1. Bayesian Adaptive Lasso

In the framework of frequentist statistics, regularization is a general approach to reducing the complexity of a model for meaningful interpretation. By adding a penalty term to the usual likelihood, regularization approaches can shrink unimportant model parameters to exactly zero. Suppose the observed data are denoted by ${\varvec{y}}$, and the set of parameters in a model $M$ is denoted by $\varvec{\beta }$ with elements $\beta _k$ $(k=1,\cdots ,r)$. The adaptive Lasso approach uses the following objective function:

$$PL(\varvec{\beta }|M)=\log p({\varvec{y}}|\varvec{\beta },M)+\sum _{k=1}^r\lambda _k|\beta _k|=LL(\varvec{\beta }|M)+\sum _{k=1}^r\lambda _k|\beta _k|,$$

where $PL(\varvec{\beta }|M)$ and $LL(\varvec{\beta }|M)$ are, respectively, the penalized and the usual log-likelihoods based on model $M$, and $\lambda _k\ge 0$ is the penalty parameter for $\beta _k$. A larger $\lambda _k$ imposes a heavier penalty on $\beta _k$. By using adaptive weights to penalize different coefficients, the adaptive Lasso shrinks zero coefficients more efficiently and estimates nonzero coefficients more accurately than the Lasso (Zou, 2006).
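To make the objective concrete, here is a minimal Python sketch of a penalized log-likelihood with per-coefficient penalty weights, applied to a toy logistic model; the function names and data are illustrative assumptions, not the MIRT likelihood used in the paper.

```python
import math

def loglik(beta, X, y):
    # Bernoulli log-likelihood LL(beta) for a simple logistic model.
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-eta))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def adaptive_lasso_objective(beta, lam, X, y):
    # PL(beta) = LL(beta) + sum_k lambda_k * |beta_k|,
    # with one penalty parameter lambda_k per coefficient (adaptive weights).
    return loglik(beta, X, y) + sum(lk * abs(bk) for lk, bk in zip(lam, beta))
```

Setting every `lam[k]` to the same value recovers the ordinary Lasso objective; the adaptive version lets each coefficient carry its own penalty.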

The crucial quantity in Bayesian statistics is the posterior distribution $p(\varvec{\beta }|{\varvec{y}},M) \propto p({\varvec{y}}|\varvec{\beta },M) \times p(\varvec{\beta }|M)$, where $p(\varvec{\beta }|M)$ is the prior distribution. Compared with a frequentist approach, the prior $p(\varvec{\beta }|M)$ is the key connection to a regularization approach such as the adaptive Lasso. Following Leng et al. (2014), the adaptive Lasso estimates can be interpreted under the Bayesian framework when the $\beta _k$s are assigned independent Laplace priors $\frac{\lambda _k}{2}e^{-\lambda _k|\beta _k|}$. For a small value of $\lambda _{k}$, the Laplace distribution is wide and little shrinkage is imposed. As $\lambda _{k}$ increases, the probability density becomes more concentrated around zero, leading to a stronger penalty (Pan et al., 2017). Moreover, the Bayesian framework provides a flexible way of estimating the penalty parameters, since hyperpriors can be placed on the $\lambda _k$s. Specifically, the $\lambda _k$s in the Bayesian adaptive Lasso are assigned Gamma priors $\lambda _{k} \sim \textrm{Gamma}\left( \alpha _{k0}, \delta _{k0}\right)$ $(k=1,\cdots ,r)$, where $\alpha _{k0}$ and $\delta _{k0}$ are hyperparameters with pre-assigned values. Following the suggestions of previous studies (Brandt et al., 2023; Chen et al., 2022; Feng et al., 2017), dispersed hyperpriors are often adopted.
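As a quick numerical illustration of how $\lambda _k$ controls shrinkage, the Python sketch below evaluates the Laplace prior density; the function name is our own, and this is only a didactic check, not part of the estimation procedure.

```python
import math

def laplace_pdf(beta, lam):
    # Laplace (double-exponential) prior density: (lam / 2) * exp(-lam * |beta|).
    # Its negative log is lam * |beta| plus a constant, i.e., the Lasso penalty.
    return 0.5 * lam * math.exp(-lam * abs(beta))
```

A larger `lam` puts more density at zero and less in the tails, which corresponds to a heavier penalty on the coefficient.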

3.2. Bayesian Model Implementation

To implement the Bayesian adaptive Lasso for identifying item–trait relationship and DIF effects, independent Laplace priors are assigned to the discrimination and DIF parameters of the last $J-K$ items. For the other parameters, commonly used priors are adopted for convenience. The priors and hyperpriors are given below.

For each element of the discrimination vectors ${\varvec{a}}_{K+1},\cdots ,{\varvec{a}}_{J}$, independent Laplace priors are assigned and expressed as:

$$p({\varvec{a}}_{K+1},\cdots ,{\varvec{a}}_{J})\propto \exp \left( -\sum _{j=K+1}^{J}\sum _{k=1}^{K}\lambda _{ajk}|a_{jk}|\right),$$

where $\lambda _{ajk}$ is the penalty parameter for $a_{jk}$.

For each intercept $d_{j}$ $(j=1,\cdots ,J)$, a normal prior is adopted:

$$d_{j}\sim N(\mu _{dj0}, \sigma ^2_{dj0}),$$

where $\mu _{dj0}$ and $\sigma ^2_{dj0}$ are hyperparameters with pre-assigned values, denoting the mean and variance of the normal distribution.

For each uniform DIF parameter in $\varvec{\beta }_1,\cdots ,\varvec{\beta }_J$, independent Laplace priors can be expressed as:

$$p(\varvec{\beta }_1,\cdots ,\varvec{\beta }_J)\propto \exp \left( -\sum _{j=1}^{J}\sum _{p=1}^{P}\lambda _{\beta jp}|\beta _{jp}|\right),$$

where $\lambda _{\beta jp}$ is the penalty parameter for $\beta _{jp}$.

For each non-uniform DIF parameter in $\varvec{\gamma }_1,\cdots ,\varvec{\gamma }_J$, independent Laplace priors can be expressed as:

$$p(\varvec{\gamma }_1,\cdots ,\varvec{\gamma }_J)\propto \exp \left( -\sum _{j=1}^{J}\sum _{p=1}^{P}\sum _{k=1}^{K}\lambda _{\gamma jpk}|\gamma _{jpk}|\right),$$

where $\lambda _{\gamma jpk}$ is the penalty parameter for $\gamma _{jpk}$.

For the penalty parameters $\lambda _{ajk}$, $\lambda _{\beta jp}$ and $\lambda _{\gamma jpk}$, Gamma priors can be assigned as:

$$\begin{aligned} \lambda _{ajk}^2&\sim \textrm{Gamma}\left( \alpha _{ajk0}, \delta _{ajk0}\right),\\ \lambda _{\beta jp}^2&\sim \textrm{Gamma}\left( \alpha _{\beta jp0}, \delta _{\beta jp0}\right),\\ \lambda _{\gamma jpk}^2&\sim \textrm{Gamma}\left( \alpha _{\gamma jpk0}, \delta _{\gamma jpk0}\right), \end{aligned}$$

where $\alpha _{ajk0}$, $\delta _{ajk0}$, $\alpha _{\beta jp0}$, $\delta _{\beta jp0}$, $\alpha _{\gamma jpk0}$ and $\delta _{\gamma jpk0}$ are hyperparameters with pre-assigned values.

For each $\Delta _{(kk0)}$ $(k=1,\cdots ,K)$, the half-Cauchy prior is assigned as (Gelman, 2006):

$$\Delta _{(kk0)} \sim C^{+}(0, \iota _{k0}),$$

where $\iota _{k0}$ is the hyperparameter of the half-Cauchy distribution.

The LKJ correlation distribution is used as the prior for $\varvec{\Omega }$, with density (Lewandowski et al., 2009)

$$\textrm{LkjCholesky}(\varvec{\Omega })\propto \det (\varvec{\Omega })^{\nu -1},$$

where $\det (\cdot )$ denotes the determinant and $\nu $ is the shape parameter.

For each element $\Upsilon _{kp}$ of $\varvec{\Upsilon }_{k}$, a normal prior is adopted:

$$\Upsilon _{kp} \sim N(\mu _{\Upsilon _{kp}0}, \sigma ^2_{\Upsilon _{kp}0}),$$

where $\mu _{\Upsilon _{kp}0}$ and $\sigma ^2_{\Upsilon _{kp}0}$ are hyperparameters with pre-assigned values.

For each element $\eta _{(kk)p}$ of $\varvec{\eta }_{(kk)}$ $(k=1,\cdots ,K)$, a normal prior is adopted:

$$\eta _{(kk)p} \sim N(\mu _{\eta _{(kk)p}0}, \sigma ^2_{\eta _{(kk)p}0}),$$

where $\mu _{\eta _{(kk)p}0}$ and $\sigma ^2_{\eta _{(kk)p}0}$ are hyperparameters with pre-assigned values.

With the priors and hyperpriors given above, posterior inference can be conducted by sampling from the joint posterior distribution, and the posterior means are used to estimate the unknown parameters. Although the joint posterior distribution is intractable in general, Bayesian inference can be feasibly implemented in available Bayesian software packages such as Stan (Carpenter et al., 2017) or JAGS (Plummer, 2017). In our study, the rstan package (Carpenter et al., 2017; Stan Development Team, 2023) in R (R Core Team, 2022) was used to implement the Bayesian adaptive Lasso estimation. Because posterior means do not shrink any parameter to exactly zero, a variable selection criterion must be applied to determine the significance of the unknown parameters. Following Brandt et al. (2023), the 95% posterior credible intervals (CIs) were used in this paper.
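The CI-based selection rule can be sketched in a few lines of Python; this is a simplified illustration (a crude quantile on sorted draws rather than rstan's posterior summaries), with function names of our own choosing.

```python
def credible_interval(draws, level=0.95):
    # Equal-tailed credible interval from a list of posterior draws.
    s = sorted(draws)
    alpha = (1.0 - level) / 2.0
    lo = s[int(alpha * (len(s) - 1))]
    hi = s[int((1.0 - alpha) * (len(s) - 1))]
    return lo, hi

def select_parameter(draws, level=0.95):
    # Flag a parameter as nonzero when its credible interval excludes zero.
    lo, hi = credible_interval(draws, level)
    return lo > 0.0 or hi < 0.0
```

A parameter whose 95% CI contains zero is treated as zero, i.e., the corresponding loading or DIF effect is deselected.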

4. A Simple Heuristic Example

In this section, a simple hypothetical example is provided to illustrate the motivation of our study. Consider two latent traits, each measured by five test items. Two groups of persons were investigated and coded by a binary covariate, with 0 for the reference group and 1 for the focal group. The mean vector of the latent traits was set to $(0,0)^{'}$ in the reference group and to $(0.5,-0.5)^{'}$ in the focal group. For both groups, the variances of the latent traits were 1 and the correlation between the latent traits was 0.5. The item intercepts were all set to 0 for simplicity, and the item discriminations were given as

$$\begin{pmatrix} 1 &{}\quad 0 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad 0.3 &{}\quad 0.3\\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad 0.3 &{}\quad 0.3 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 \end{pmatrix}.$$

We assumed that items 4 and 8 had uniform DIF effects with $\beta_{41}=0.3$ and $\beta_{81}=0.3$. The sample size was 500, divided evenly into the two groups. Data were generated with 50 replications.
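For concreteness, the data-generating process of this example can be sketched as follows. This is a Python sketch (the authors' code is in R), and it assumes a compensatory two-parameter logistic MIRT model with the uniform DIF effect entering the logit for the focal group; all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500                              # 250 persons per group
group = np.repeat([0, 1], N // 2)    # 0 = reference, 1 = focal

# Group-specific latent trait means; shared correlation 0.5
means = np.array([[0.0, 0.0], [0.5, -0.5]])
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
theta = np.array([rng.multivariate_normal(means[g], cov) for g in group])

# 2 x 10 discrimination matrix from the example (columns = items)
A = np.array([[1, 0, 1, 1, 1, 1, 0, 0, 0.3, 0.3],
              [0, 1, 0, 0, 0.3, 0.3, 1, 1, 1, 1]], dtype=float)
d = np.zeros(10)                     # all intercepts 0
beta = np.zeros(10)
beta[[3, 7]] = 0.3                   # uniform DIF on items 4 and 8

logit = theta @ A + d + np.outer(group, beta)   # N x 10 logits
p = 1.0 / (1.0 + np.exp(-logit))
y = rng.binomial(1, p)               # binary item responses
```

The focal group's responses to items 4 and 8 are shifted by 0.3 on the logit scale beyond what its latent trait means alone would produce, which is exactly what the DIF-detection procedure should recover.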

Table 1 Mean absolute bias (CI coverage) of parameter estimates in the simple example

Our proposed model was compared with two alternatives. The first used a confirmatory simple-structure MIRT model for DIF detection, with small cross-loadings fixed to 0; the second first identified the item–trait structure by the EML1 method of Sun et al. (Reference Sun, Chen, Liu, Ying and Xin2016) and then treated that structure as confirmatory for DIF detection. In all three models, except for the item discriminations, the other model parameters were estimated under the same Bayesian priors.

To evaluate the performance of parameter estimation, the mean absolute bias and CI coverage were computed for each model, as shown in Table 1. The former is the average of the absolute values of bias across converged replications and parameters of interest. The latter is the proportion, across converged replications and parameters of interest, of equal-tailed 95% CIs that covered the true parameter values. From these results, we found that eliminating small cross-loadings from the item–trait structure resulted in substantial bias in the estimates of item discriminations, DIF parameters and trait correlation, as well as poor CI coverage. When the item–trait structure was first identified by the EML1 method, most mean absolute biases decreased and the CI coverage improved. Our proposed model performed best among the three, with the smallest bias and highest CI coverage for item discriminations, DIF parameters and trait correlation.
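The two evaluation criteria can be computed as below (a Python sketch of the definitions just given; the function name `bias_and_coverage` and the toy numbers are ours). Rows index converged replications and columns index parameters of interest:

```python
import numpy as np

def bias_and_coverage(true_vals, estimates, ci_lo, ci_hi):
    """Mean absolute bias and equal-tailed CI coverage, averaged
    over converged replications (rows) and parameters (columns)."""
    true_vals = np.asarray(true_vals, dtype=float)
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(ci_lo, dtype=float)
    hi = np.asarray(ci_hi, dtype=float)
    mab = float(np.mean(np.abs(est - true_vals)))
    cover = float(np.mean((lo <= true_vals) & (true_vals <= hi)))
    return mab, cover

# Two parameters (true values 1.0 and 0.5) over two replications
true_vals = np.array([1.0, 0.5])
est = np.array([[1.1, 0.4], [0.9, 0.6]])
lo = np.array([[0.8, 0.2], [0.7, 0.1]])
hi = np.array([[1.4, 0.7], [1.2, 0.8]])
print(bias_and_coverage(true_vals, est, lo, hi))  # ~ (0.1, 1.0)
```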

5. Simulation Studies

Two simulation studies were conducted to evaluate the empirical performance of the Bayesian adaptive Lasso under uniform DIF (study 1) and non-uniform DIF (study 2) conditions. In both studies, the model defined in Eqs. (1)–(3) was used. The total number of items J was fixed at 15 and the number of latent traits K at 2. Table 2 gives the two discriminations for each item, chosen to reflect a common range of discrimination values, and the item intercepts were generated from the standard normal distribution (Wang et al., Reference Wang, Zhu and Xu2023). Four covariates $x_{i1}$, $x_{i2}$, $x_{i3}$ and $x_{i4}$ were considered. $x_{i1}$ and $x_{i2}$, having DIF effects on some items, were independently generated from the standard normal distribution and the Bernoulli distribution with success probability 0.5, respectively. $x_{i3}$ and $x_{i4}$, having no DIF effects on any item, were jointly generated from a multivariate normal distribution with mean vector $\varvec{0}$ and a correlation matrix with off-diagonal elements 0.5. The baseline means of latent traits were $\alpha_{10}=\alpha_{20}=0$ for identification, and the mean impacts were set at $\varvec{\Upsilon}_{1}=(0,0.5,0,0)'$ and $\varvec{\Upsilon}_{2}=(0,-0.5,0,0)'$, so that latent mean differences were related only to the second covariate. The baseline standard deviations were set at $\Delta_{(110)}=\Delta_{(220)}=1$, and no standard deviation impacts were imposed, i.e., $\varvec{\eta}_{(11)}=\varvec{\eta}_{(22)}=(0,0,0,0)'$. The correlation between the two latent traits, $\Omega_{(12)}$, was set at 0.5, reflecting a moderate degree of correlation.
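The covariate-generation step just described can be sketched as follows (a Python sketch of the design, not the authors' R code; the seed and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1000

x1 = rng.normal(0.0, 1.0, N)           # standard normal, has DIF
x2 = rng.binomial(1, 0.5, N)           # Bernoulli(0.5), has DIF
# x3, x4: jointly normal with correlation 0.5, no DIF on any item
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
x34 = rng.multivariate_normal(np.zeros(2), cov, size=N)
X = np.column_stack([x1, x2, x34])     # N x 4 covariate matrix
```

Mixing a continuous, a binary and two correlated continuous covariates lets the simulations probe DIF detection under realistically heterogeneous covariate types.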

Table 2 Simulated true item parameters

Three factors were manipulated: (a) the sample size N, (b) the percentage of DIF items, and (c) the magnitude of DIF. Two levels of sample size were evaluated, $N=500$ and $N=1000$, in line with previous studies (Sun et al., Reference Sun, Chen, Liu, Ying and Xin2016; Xu et al., Reference Xu, Shang, Zheng, Shan and Tang2022). Two percentages of DIF items ($20\%$ and $60\%$) and two levels of the magnitude of DIF (small and large) were considered; these choices were similar to the study of Wang et al. (Reference Wang, Zhu and Xu2023).

We evaluated the performance of our method in terms of (1) the accuracy of parameter estimation, (2) the correct rate (CR), false positive rate (FPR) and false negative rate (FNR) for latent variable selection, and (3) the true positive rate (TPR) and FPR for DIF detection. The results were computed on the basis of 50 replications for each condition. DIF effects were kept constant across replications within a given condition, which avoids mixing within- and between-condition variability of DIF effects (Belzak and Bauer, Reference Belzak and Bauer2020; Wang et al., Reference Wang, Zhu and Xu2023). For the accuracy of parameter estimation, the mean-squared error (MSE) for each parameter is computed as

$$\textrm{MSE}(\kappa) = \frac{1}{Z} \sum_{z=1}^{Z} \left({\hat{\kappa}}^{z}-\kappa \right)^{2},$$

where ${\hat{\kappa}}^{z}$ denotes the estimate of $\kappa$ in the zth converged replication, and Z is the number of converged replications. To summarize the simulation results, MSE is reported by parameter type below; for example, the MSE for item discriminations is the average MSE over all estimated item discrimination parameters. The CR for latent variable selection is defined by the recovery of the unknown elements in the incidence matrix $\Xi = (\xi_{jk})$, where $\xi_{jk} = I(a_{jk} \ne 0)$, and it is given as

$$\textrm{CR}=\frac{1}{Z(J-K)K}\sum_{z=1}^{Z}\sum_{j=K+1}^{J}\sum_{k=1}^{K} I\left({\hat{\xi}}_{jk}^{z}=\xi_{jk}\right),$$

where ${\hat{\xi}}_{jk}^{z}$ is the estimate of the true $\xi_{jk}$ in the zth converged replication. The FPR for latent variable selection is the proportion of truly zero incidence relations that are incorrectly detected as nonzero, and the FNR is the proportion of truly nonzero incidence relations that are incorrectly detected as zero, both computed over converged replications. For DIF detection, to avoid mixing the impact of different covariates on DIF, TPR and FPR are calculated in terms of item–covariate combinations (Chen et al., Reference Chen, Bauer, Belzak and Brandt2022; Schauberger and Mair, Reference Schauberger and Mair2020). Specifically, the TPR (FPR) is the proportion of item–covariate combinations truly having (not having) DIF in which a covariate is detected as having significant uniform or non-uniform DIF parameters for an item, computed across all converged replications.
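These evaluation criteria can be sketched as follows (an illustrative Python translation of the computations; the function names `mse` and `selection_rates` are ours). Here `xi_hats` stacks the estimated 0/1 incidence matrices over converged replications:

```python
import numpy as np

def mse(estimates, true_value):
    """MSE of per-replication estimates of a single parameter,
    as in the formula above."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_value) ** 2))

def selection_rates(xi_true, xi_hats):
    """CR, FPR and FNR for recovering a binary incidence matrix:
    CR counts all correctly recovered entries, FPR the true zeros
    flagged nonzero, FNR the true nonzeros flagged zero."""
    xi_true = np.asarray(xi_true, dtype=bool)
    hats = np.asarray(xi_hats, dtype=bool)      # shape: Z x J x K
    cr = float(np.mean(hats == xi_true))
    fpr = float(np.mean(hats[:, ~xi_true]))     # false nonzeros
    fnr = float(np.mean(~hats[:, xi_true]))     # missed nonzeros
    return cr, fpr, fnr

# Discrimination estimates over 4 replications, true value 1.0
print(mse([0.9, 1.1, 1.0, 1.2], 1.0))           # ~0.015

xi = np.array([[1, 0], [0, 1], [1, 1]])          # true incidence matrix
hat = xi.copy()
hat[0, 1] = 1                                    # one false positive
print(selection_rates(xi, [xi, hat]))            # CR 11/12, FPR 0.25, FNR 0
```

The TPR/FPR for DIF detection follow the same pattern, with the incidence matrix replaced by an item-by-covariate indicator of significant DIF parameters.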

In the simulation studies, data generation and parameter estimation were all implemented in the R statistical programming software. The R code is available at https://github.com/Shann285/LdDIFMIRT. All R code was run on the Windows 10 64-bit platform with an Intel(R) Core(TM) i9-9900 CPU at 3.10 GHz and 32 GB of memory. Our Bayesian models were fitted with 3 chains of Hamiltonian Markov chain Monte Carlo (MCMC) samples using the R package rstan. Each Hamiltonian MCMC chain had 4000 iterations, with the first 2000 iterations as a burn-in period. Convergence of the chains was monitored by requiring zero divergent transitions in the sampling process and "Rhat" indices less than 1.05. The convergence rates varied depending on the data and prior assignments, and can be improved by increasing the number of iterations and thinning (Chen et al., Reference Chen, Bauer, Belzak and Brandt2022).
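As a rough illustration of the "Rhat" criterion, the split potential scale reduction factor can be computed as below. This is a simplified Python sketch of the classic split-R̂ (split each chain in half, compare between- and within-chain variance); Stan's current implementation additionally applies rank-normalization and folding, so values will not match rstan's output exactly:

```python
import numpy as np

def split_rhat(chains):
    """Simplified split-Rhat for an m x n array of m chains of
    length n (n even): split each chain in half, then compare
    between- and within-chain variance."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    halves = chains.reshape(2 * m, n // 2)     # row-major: first/second half
    means = halves.mean(axis=1)
    w = halves.var(axis=1, ddof=1).mean()      # within-chain variance
    b = (n // 2) * means.var(ddof=1)           # between-chain variance
    var_plus = (n // 2 - 1) / (n // 2) * w + b / (n // 2)
    return float(np.sqrt(var_plus / w))

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(3, 2000))   # well-mixed chains
print(split_rhat(mixed) < 1.05)                # True
bad = mixed + np.array([[0.0], [3.0], [6.0]])  # chains stuck at offsets
print(split_rhat(bad))                         # far above 1.05
```

Chains exploring the same distribution give R̂ near 1; chains stuck in different regions inflate the between-chain variance and push R̂ well above the 1.05 threshold used here.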

5.1. Simulation Study 1

In this study, only uniform DIF effects were considered, i.e., all $\gamma$ coefficients were fixed at 0. The $\beta$ coefficients were set as 0.3 and 0.6 for small and large magnitude of DIF, respectively. Specifically, $\beta_{41}$, $\beta_{82}$, $\beta_{13,1}$ and $\beta_{13,2}$ were equal to 0.3 (or 0.6) for the $20\%$ DIF condition, and $\beta_{31}$, $\beta_{41}$, $\beta_{51}$, $\beta_{72}$, $\beta_{82}$, $\beta_{92}$, $\beta_{12,1}$, $\beta_{12,2}$, $\beta_{13,1}$, $\beta_{13,2}$, $\beta_{14,1}$ and $\beta_{14,2}$ were equal to 0.3 (or 0.6) for the $60\%$ DIF condition. These choices were similar to those used in Wang et al. (Reference Wang, Zhu and Xu2023).

Following the suggestions of the existing literature (Feng et al., Reference Feng, Wu and Song2017; Pan et al., Reference Pan, Ip and Dubé2017; Chen et al., Reference Chen, Bauer, Belzak and Brandt2022; Brandt et al., 2023), the prior and hyperprior distributions were chosen as follows: the normal priors used the normal distribution $N(0, 2^2)$, the hyperpriors for the penalty parameters were the Gamma distribution $\textrm{Gamma}(9,3)$, the half-Cauchy distribution was $C^{+}(0,2.5)$, and the LKJ correlation distribution was set with $\nu=2$. The initial values were generated similarly to those used in Chen et al. (Reference Chen, Bauer, Belzak and Brandt2022). DIF in an item due to a specific covariate was assumed if the 95% CI for the respective element of $\beta$ did not include zero.

The Bayesian adaptive Lasso for uniform DIF detection achieved favorable convergence rates, all above 95%. Although non-convergence might have been remedied with further adjustments, we did not pursue this and simply used the converged replications for the results. The running times varied across conditions, mainly depending on the sample size: for $N=500$, the average CPU time was less than 1000 s per condition, and it increased to more than 2000 s for $N=1000$. The specific average CPU times are shown in Table 8 of Appendix A.

Figure 1 shows the MSE, as a combination of squared bias and variance, for estimating item discriminations a, item intercepts d, uniform DIF parameters $\beta$, mean impacts $\Upsilon_k$, baseline standard deviations $\Delta_{(kk0)}$, standard deviation impacts $\eta_{(kk)}$, and the correlation between latent traits $\Omega_{(12)}$. The MSEs for each parameter estimate are provided in Table 9 of Appendix A. We found that most model parameters could be recovered well. When the DIF percentage was $60\%$, the bias of most estimates increased; in contrast, the magnitude of DIF had little influence on the estimates. The estimates of mean impacts, baseline standard deviations, standard deviation impacts and the trait correlation changed little across different percentages and magnitudes of DIF effects. As sample size increased, most MSEs decreased. The CI coverage rates were calculated to evaluate the uncertainty of the estimates of the population parameters, as shown in Fig. 2. The coverage rates under all conditions were above 80%, similar to the results of Brandt et al. (2023) and Chen et al. (Reference Chen, Bauer, Belzak and Brandt2022).

Figure 1 MSEs of the model parameter estimates in study 1.

Figure 2 CI coverage for different parameters in study 1.

Table 3 summarizes the results for recovering the incidence matrix and detecting DIF effects over the 50 independent datasets in each simulated condition. For the incidence matrix $\Xi$, the CRs, FPRs and FNRs were calculated in each condition. The CRs were all above 0.98, and the FPRs and FNRs did not exceed 0.05; the percentage and magnitude of DIF had little impact on the recovery of $\Xi$. For DIF detection, TPRs and FPRs are shown at the bottom of Table 3. Consistent with previous research (Belzak and Bauer, Reference Belzak and Bauer2020; Schauberger and Mair, Reference Schauberger and Mair2020), DIF effects of small magnitude were difficult to detect. The TPRs decreased when the DIF percentage was $60\%$, and all TPRs grew as sample size increased. Our method produced acceptable FPRs under all study conditions, in concert with previous findings that regularization methods have good control of type I errors (Brandt et al., 2023; Chen et al., Reference Chen, Bauer, Belzak and Brandt2022; Wang et al., Reference Wang, Zhu and Xu2023).

Table 3 Results of latent variable selection and DIF detection in study 1

5.2. Simulation Study 2

In the second study, non-uniform DIF effects were evaluated for DIF items. The item parameters were the same as in simulation study 1. For the DIF parameters, $\beta$ were set as 0.3 and 0.6 for small and large magnitude DIF, and $\gamma$ were set as $-0.3$ and $-0.6$ for small and large magnitude DIF, respectively.
For the $20\%$ DIF condition, $\beta_{41}$, $\beta_{82}$, $\beta_{13,1}$ and $\beta_{13,2}$ were equal to 0.3 (or 0.6), and $\gamma_{411}$, $\gamma_{822}$, $\gamma_{13,11}$ and $\gamma_{13,22}$ were equal to $-0.3$ (or $-0.6$).
For the 60 % \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$60\%$$\end{document} DIF condition, β 31 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{31}$$\end{document} , β 41 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{41}$$\end{document} , β 51 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{51}$$\end{document} , β 72 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{72}$$\end{document} , β 82 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{82}$$\end{document} , β 92 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{92}$$\end{document} , β 12 , 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} 
\usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{12,1}$$\end{document} , β 12 , 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{12,2}$$\end{document} , β 13 , 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{13,1}$$\end{document} , β 13 , 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{13,2}$$\end{document} , β 14 , 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{14,1}$$\end{document} and β 14 , 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{14,2}$$\end{document} were equal to 0.3 (or 0.6), and γ 311 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{311}$$\end{document} , γ 411 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} 
\setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{411}$$\end{document} , γ 511 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{511}$$\end{document} , γ 722 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{722}$$\end{document} , γ 822 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{822}$$\end{document} , γ 922 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{922}$$\end{document} , γ 12 , 11 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{12,11}$$\end{document} , γ 12 , 22 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{12,22}$$\end{document} , γ 13 , 11 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{13,11}$$\end{document} , γ 13 , 22 
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{13,22}$$\end{document} , γ 14 , 11 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{14,11}$$\end{document} and γ 14 , 22 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{14,22}$$\end{document} were equal to - 0.3 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.3$$\end{document} (or - 0.6 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.6$$\end{document} ). The above choices were similar to the study of Wang et al. (Reference Wang, Zhu and Xu2023).
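The data-generating design above can be sketched in code. This is a minimal illustration only: it assumes a two-parameter logistic MIRT model in which the covariate main effects $\beta$ shift the item intercepts (uniform DIF) and the interaction coefficients $\gamma$ moderate the discriminations through covariate-by-trait products (non-uniform DIF), consistent with how $\beta$ and $\gamma$ are used above; the paper's exact model is given by its Eqs. (1)–(3), and all names and values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_mirt_dif(N=500, J=15, K=2, beta=None, gamma=None):
    """Simulate binary responses from a 2PL-type MIRT model with DIF.

    Assumed logit for person i, item j:
        a_j' theta_i + d_j + x_i' beta_j + theta_i' Gamma_j x_i
    where beta_j (uniform DIF) shifts the intercept and Gamma_j
    (non-uniform DIF) moderates the trait effects via covariate x_i.
    """
    theta = rng.standard_normal((N, K))        # latent traits
    x = rng.binomial(1, 0.5, size=(N, 1))      # one binary covariate
    a = rng.uniform(0.5, 2.0, size=(J, K))     # discriminations
    d = rng.uniform(-1.0, 1.0, size=J)         # intercepts
    beta = np.zeros((J, 1)) if beta is None else beta
    gamma = np.zeros((J, K, 1)) if gamma is None else gamma
    # linear predictor: main effects + uniform DIF + non-uniform DIF
    eta = theta @ a.T + d + x @ beta.T
    eta += np.einsum('ik,jkp,ip->ij', theta, gamma, x)
    prob = 1.0 / (1.0 + np.exp(-eta))
    return rng.binomial(1, prob), theta, x

# e.g., item 4 gets uniform DIF 0.3 and non-uniform DIF -0.3 on trait 1,
# analogous to (beta_41, gamma_411) in the 20% DIF condition
beta = np.zeros((15, 1)); beta[3, 0] = 0.3
gamma = np.zeros((15, 2, 1)); gamma[3, 0, 0] = -0.3
y, theta, x = simulate_mirt_dif(beta=beta, gamma=gamma)
```

One replication of a condition is then one call to `simulate_mirt_dif` with the condition's $\beta$ and $\gamma$ pattern filled in.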

Different from the uniform DIF models considered in study 1, the non-uniform DIF models had unknown $\gamma$ coefficients for the non-anchor items. Because the non-uniform DIF models contained more DIF parameters than the uniform DIF models, a stronger penalty was used to achieve adequate convergence rates: the hyperpriors of the penalty parameters were set to $\textrm{Gamma}(27,3)$. The other priors and initial values were the same as those used in study 1.
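The effect of such a hyperprior can be illustrated with a prior-predictive sketch, assuming the standard Bayesian adaptive Lasso hierarchy (a Laplace prior expressed as a scale mixture of normals, with a Gamma hyperprior on each squared penalty parameter $\lambda_j^2$); this is the common construction in the Bayesian adaptive Lasso literature and not necessarily the authors' exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard Bayesian adaptive Lasso hierarchy (scale mixture of normals):
#   lambda_j^2 ~ Gamma(a, b)            (hyperprior on each penalty)
#   tau_j^2 | lambda_j ~ Exp(lambda_j^2 / 2)
#   beta_j | tau_j ~ N(0, tau_j^2)
# Marginally, beta_j | lambda_j is Laplace(0, 1/lambda_j), so a hyperprior
# concentrated on larger lambda_j implies stronger shrinkage toward zero.

def sample_induced_prior(a, b, size=100_000):
    lam2 = rng.gamma(shape=a, scale=1.0 / b, size=size)   # lambda_j^2 draws
    tau2 = rng.exponential(scale=2.0 / lam2)              # local scales
    return rng.normal(0.0, np.sqrt(tau2))                 # implied beta draws

weak = sample_induced_prior(2, 1)     # E[lambda^2] = 2 -> diffuse prior
strong = sample_induced_prior(27, 3)  # E[lambda^2] = 9 -> tighter prior

# Gamma(27, 3) puts the penalty near lambda ~ 3, concentrating the induced
# prior on the DIF coefficients more sharply around zero.
print(np.std(weak), np.std(strong))
```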

The Bayesian adaptive Lasso for the non-uniform DIF models achieved reasonable convergence rates, ranging from 84% to 98% with an average above 92%. Convergence below 90% occurred under the small DIF magnitude and high DIF percentage conditions, consistent with Chen et al. (Reference Chen, Bauer, Belzak and Brandt2022). Only the converged replications were used to evaluate the estimation results. The running times for the non-uniform DIF models all exceeded 3500 s: for sample size $N=500$, the average CPU time was about one hour, and for $N=1000$, nearly two hours. The average CPU times for the non-uniform DIF conditions are also shown in Table 8 of Appendix A.

Figure 3 shows the MSEs of the estimated parameters under the non-uniform DIF conditions; the MSEs for each parameter estimate are available at https://github.com/Shann285/LdDIFMIRT. The MSEs of the item discriminations were larger than those of the other parameters and than the corresponding MSEs in study 1, reflecting the increased uncertainty due to the unknown $\gamma$ coefficients. The bias of most estimates increased when the DIF percentage was 60%. The DIF magnitude had no obvious impact on the estimates when the DIF percentage was low, but could lead to larger bias when DIF was pervasive. Most MSEs decreased as the sample size increased. The CI coverage rates for all conditions in study 2 were above 80%, as shown in Fig. 4.

Figure 3 MSEs of the model parameter estimates in study 2.

Figure 4 CI coverage for different parameters in study 2.

Table 4 presents the results for the incidence matrix and DIF detection under the non-uniform DIF conditions. For recovering the incidence matrix, the overall CRs were above 94% and the FPRs did not exceed 0.02, but the FNRs were slightly above 0.05 when the DIF percentage was 60%. Small-magnitude DIF effects were more difficult to detect under the non-uniform DIF conditions, possibly because of the increased number of model parameters. In contrast, the TPRs for the large DIF conditions exceeded those in study 1, indicating that large $\gamma$ coefficients helped identify DIF items. As in study 1, the TPRs decreased when DIF was pervasive, and all TPRs grew as the sample size increased. Control of the FPRs was acceptable, only slightly exceeding 0.05 at the 60% DIF percentage.
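The selection metrics used throughout (CR, TPR, FPR, FNR) can be computed from the true and estimated binary patterns as sketched below; these are the usual definitions for a selection problem and may differ in detail from the paper's.

```python
import numpy as np

def selection_rates(true_mat, est_mat):
    """Correct rate, true/false positive rate, and false negative rate
    for a binary selection problem, e.g., recovering the item-trait
    incidence matrix or flagging DIF parameters as nonzero."""
    true_mat = np.asarray(true_mat, dtype=bool)
    est_mat = np.asarray(est_mat, dtype=bool)
    cr = np.mean(true_mat == est_mat)        # overall entrywise agreement
    tpr = est_mat[true_mat].mean()           # power: true nonzeros found
    fpr = est_mat[~true_mat].mean()          # true zeros wrongly selected
    fnr = 1.0 - tpr                          # true nonzeros missed
    return cr, tpr, fpr, fnr

# toy 3-item, 2-trait incidence matrices (illustrative values)
true_q = np.array([[1, 0], [0, 1], [1, 1]])
est_q  = np.array([[1, 0], [1, 1], [1, 0]])
cr, tpr, fpr, fnr = selection_rates(true_q, est_q)
# 4 of 6 entries agree; 3 of 4 true nonzeros recovered; 1 of 2 zeros flagged
print(cr, tpr, fpr, fnr)  # -> 0.666..., 0.75, 0.5, 0.25
```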

Table 4 Results of latent variable selection and DIF detection in study 2

6. Real Data Analysis

A real data set from the Eysenck Personality Questionnaire (EPQ) given in Eysenck and Barrett (Reference Eysenck and Barrett2013) was used to further illustrate the performance of our method. Three factors, Psychoticism (P), Extraversion (E) and Neuroticism (N), were initially investigated by Xu et al. (Reference Xu, Shang, Zheng, Shan and Tang2022) using these data. Because of the psychometric weaknesses of the P scale of the EPQ, our analysis focused only on Extraversion (E) and Neuroticism (N). In line with Xu et al. (Reference Xu, Shang, Zheng, Shan and Tang2022), two items in E were deleted because their corrected item–total correlations were less than 0.2 (Kline, 1986). As a result, 42 items were selected: 19 items corresponding to E and 23 items corresponding to N. The original design of the EPQ is confirmatory, with each item associated with only one factor. The items used and their original indices are listed in Table 5.

Table 5 The Eysenck Personality Questionnaire with items for E and N

"R" denotes the negatively worded items in the original questionnaire.

Two covariates were considered for detecting DIF effects: age, a continuous variable giving the person's age, and gender, a binary categorical variable. Only the ages 18, 19, 20 and 21 were included, since the numbers of persons in the other age groups were small. After eliminating persons with missing data, our analysis was based on 843 individuals from Canada. The model defined in Eqs. (1)–(3) with $K = 2$ and $P = 2$ was applied to these data. In addition, the person-specific correlation between the latent traits was modeled as

$$\begin{aligned} \varvec{\Omega }_{(12)i}=\frac{\exp (2(\omega _{(12)0}+\varvec{\omega }_{(12)}^{'}{\varvec{x}}_{i}))-1}{\exp (2(\omega _{(12)0}+\varvec{\omega }_{(12)}^{'}{\varvec{x}}_{i}))+1}, \end{aligned}$$

which indicates that the Fisher z-transformation of $\varvec{\Omega}_{(12)i}$ is a linear function of ${\varvec{x}}_{i}$ (Bauer, Reference Bauer2017). The priors of these model parameters were also normal, $N(0, 2^2)$, and the other priors and initial values for the real data analysis were similar to those used in the simulation studies. For model identification, following Xu et al. (Reference Xu, Shang, Zheng, Shan and Tang2022), items 1 and 20 were designated as anchors for E and N, respectively, and were assumed to be DIF-free. Both uniform and non-uniform DIF models were fitted. For each model, three chains of Hamiltonian MCMC samples were run, each with 5000 iterations, of which the first 2500 were discarded as burn-in. The convergence of the chains was checked.
We report only the results for uniform DIF detection, because the vast majority of the $\gamma$ coefficients in the non-uniform DIF model were not significant. The running time for the uniform DIF detection was nearly two hours.
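The person-specific correlation model above is simply the inverse Fisher z-transformation, i.e., $\varvec{\Omega}_{(12)i}=\tanh(\omega_{(12)0}+\varvec{\omega}_{(12)}'{\varvec{x}}_{i})$, which keeps each correlation in $(-1, 1)$ while its z-transform stays linear in the covariates. A small numerical check, with hypothetical (not estimated) coefficient values:

```python
import numpy as np

def person_correlation(omega0, omega, x):
    """Person-specific latent-trait correlation via the inverse Fisher
    z-transformation: (exp(2z) - 1) / (exp(2z) + 1) with z linear in x."""
    z = omega0 + x @ omega                    # linear predictor per person
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

omega0 = 0.4                                  # hypothetical intercept
omega = np.array([0.1, -0.2])                 # hypothetical covariate effects
x = np.array([[18.0, 1.0],                    # e.g., (age, gender) rows
              [21.0, 0.0]])
corr = person_correlation(omega0, omega, x)

# identical to tanh, and always a valid correlation
assert np.allclose(corr, np.tanh(omega0 + x @ omega))
assert np.all(np.abs(corr) < 1)
```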

Table 6 shows the estimated item discriminations and intercepts after rescaling the baseline latent trait variances to unity. Most items remained associated with a single trait. More items were associated with both latent traits than in Xu et al. (Reference Xu, Shang, Zheng, Shan and Tang2022), and most of the cross-loadings were sensible. For example, item 5 (E21) was also related to neuroticism, in agreement with Sun et al. (Reference Sun, Chen, Liu, Ying and Xin2016) and Xu et al. (Reference Xu, Shang, Zheng, Shan and Tang2022). Item 9 (E36) and item 15 (E60) were also related to neuroticism, in line with Xu et al. (Reference Xu, Shang, Zheng, Shan and Tang2022), and item 27 (N31) was also related to extraversion, consistent with Sun et al. (Reference Sun, Chen, Liu, Ying and Xin2016). Moreover, item 39 (N77, 'Do you often feel lonely?') was newly found to be related to both extraversion and neuroticism, in accordance with Buecker et al.'s (Reference Buecker, Maes, Denissen and Luhmann2020) findings that extraversion and neuroticism are significantly related to loneliness, and that the average lonely person is more introverted and neurotic than the average non-lonely person. Finally, the mean impact of gender on the trait N was $-0.243$, indicating that the average neuroticism in males was significantly lower than that in females; the other impacts were not significant.

Table 6 The estimated item discriminations and intercepts for the real data

* denotes the significance of the item discriminations.

For DIF detection, our results were compared with the commonly used IRT-LR test (Suh and Cho, Reference Suh and Cho2014), with both age and gender considered as grouping covariates. The IRT-LR test was implemented with the R package mirt, and items 1 and 20 were again assigned as anchor items. The results of the Bayesian adaptive Lasso and IRT-LR for DIF detection are provided in Table 7. Most DIF items identified by the Bayesian adaptive Lasso were also identified by IRT-LR. In addition, IRT-LR identified more DIF items, especially for gender; as pointed out in previous studies (Belzak and Bauer, Reference Belzak and Bauer2020; Wang et al., Reference Wang, Zhu and Xu2023), IRT-LR yields high false positive rates when DIF is pervasive.

Table 7 Comparisons of DIF detection for BaLasso and IRT-LR in the real data

✓ and ✗ denote the items identified as DIF free and DIF for the corresponding covariates, respectively.

7. Discussion

Regularization methods for latent variable selection or DIF detection came into use about a decade ago. For either purpose, regularization methods often outperform the corresponding conventional methods. In frequentist statistics, the success of regularization depends on choosing the regularization (penalty) parameters, and criteria such as the Bayesian information criterion (BIC) and cross-validation (CV) can be used to select the optimal regularization parameters for model fitting. From a Bayesian perspective, the regularization parameters can instead be treated as random and assigned appropriate prior distributions. By incorporating DIF-inducing covariates in MIRT models, we propose a Bayesian adaptive Lasso approach for simultaneously detecting item–trait relationship and DIF effects.

Our simulation studies showed that the proposed method produces good parameter estimates and performs well in recovering the item–trait relationship. For uniform DIF detection, our method had acceptable TPRs and good control of the FPRs; these results are similar to those of Bauer et al. (Reference Bauer, Belzak and Cole2020) and Wang et al. (Reference Wang, Zhu and Xu2023). For non-uniform DIF detection, the FPRs were slightly inflated relative to the uniform DIF conditions, since the non-uniform DIF models include more parameters to be estimated. Moreover, Tables 3 and 4 both show slightly increased FPRs as the sample size increased. Although this phenomenon parallels some existing research (Belzak and Bauer, Reference Belzak and Bauer2020; Brandt et al., 2023; Chen et al., Reference Chen, Bauer, Belzak and Brandt2022; Wang et al., Reference Wang, Zhu and Xu2023), the model (variable) selection consistency of latent variable models, especially item response theory models, needs further theoretical investigation. In addition, the DIF effects of multiple covariates are detected simultaneously in a multidimensional latent trait model, which helps alleviate problems caused by multicollinearity, for which applying a single-covariate method repeatedly is not appropriate.

It is also meaningful to investigate how our method performs when no DIF effects exist. Using the same data-generating settings as in our simulation studies but with zero DIF effects, 50 replications were generated with sample size $N=500$. With the same priors and initial values as in the simulation studies, the uniform and non-uniform DIF models were fitted. The convergence rates were 98% and 94% for the uniform and non-uniform DIF models, respectively. The MSE compositions and CI coverage rates, shown in Fig. 5 of Appendix A, indicated good recovery of the model parameters. The CRs, FPRs and FNRs for the incidence matrix were satisfactory: 0.982, 0.038 and 0.009 for the uniform DIF model, and 0.976, 0.003 and 0.033 for the non-uniform DIF model. Both models produced well-controlled FPRs for DIF detection: 0.020 and 0.030 for the uniform and non-uniform DIF models, respectively.

The current study has some limitations and can be improved in several respects. First, our models are complex, especially under the non-uniform DIF conditions; their computational costs with the Bayesian adaptive Lasso are high and the running times are long, so improving the computational efficiency of our procedures will be important. Second, to distinguish different latent traits and place different persons on a common metric, we need to designate K DIF-free items, each loading on one dimension. These constraints rest on empirical knowledge of the items and may affect the estimation results; when the DIF percentage is high, finding the right anchor items may not be easy (Wang et al., Reference Wang, Zhu and Xu2023). Third, our method can easily be extended to allow for missing data. Standard DIF detection methods are sensitive to missing data, and their results depend on the imputation method used; a Bayesian method, by contrast, can handle missing data by sampling from the posterior distribution, so no imputation is needed. Fourth, other penalty functions or regularized priors can be studied. Several nonconvex penalties, such as SCAD (smoothly clipped absolute deviation; Fan & Li, Reference Fan and Li2001) and MCP (minimax concave penalty; Zhang, Reference Zhang2010), are well known, but their performance for simultaneously detecting item–trait relationship and DIF effects has not been studied in depth. Furthermore, regularized priors built from different types of mixture distributions deserve thorough investigation. Finally, because of the indeterminacy of the item–trait relationship, interactions involving latent traits are complicated; further work is needed to distinguish the incidence of the latent traits from the discriminatory power of the covariates.

Acknowledgements

We thank the editor, associate editor, and three anonymous referees for their careful review and valuable comments. This research is partially supported by the National Natural Science Foundation of China (No. 11871013) and the Natural Science Foundation of Jilin Province (No. 20210101152JC).

Author Contributions

Na Shan contributed to conceptualization, methodology, writing—original draft, and writing—review and editing. Ping-Feng Xu contributed to supervision and methodology.

Data Availability

Data sharing is not applicable to this paper as no new data were created or analyzed in this study.

Declarations

Conflict of interest

All authors declare no conflict of interest.

APPENDIX

Appendix A. Additional tables and figures.

Table 8 Average CPU times in seconds for all conditions in studies 1 and 2

Figure 5 CI coverage and MSE decompositions for the simulation with no DIF effects and sample size 500.

Table 9 MSEs for each parameter estimate in study 1

Footnotes

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

References

Alhamzawi, R., Yu, K., & Benoit, D. F. (2012). Bayesian adaptive Lasso quantile regression. Statistical Modelling, 12(3), 279–297.
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438.
Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22(3), 507–526.
Bauer, D. J., Belzak, W. C. M., & Cole, V. T. (2020). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 43–55.
Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561.
Belzak, W. C. M., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25(6), 673–690.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12(3), 261–280.
Brandt, H., Chen, S. M., & Bauer, D. J. (2023). Bayesian penalty methods for evaluating measurement invariance in moderated nonlinear factor analysis. Psychological Methods. https://doi.org/10.1037/met0000552
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24(4), 445–455.
Buecker, S., Maes, M., Denissen, J. J. A., & Luhmann, M. (2020). Loneliness and the big five personality traits: A meta-analysis. European Journal of Personality, 34, 8–28.
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01
Chen, J. S. (2020). A partially confirmatory approach to the multidimensional item response theory with the Bayesian Lasso. Psychometrika, 85(3), 738–774.
Chen, J. S., Guo, Z. H., Zhang, L. J., & Pan, J. H. (2021). A partially confirmatory approach to scale development with the Bayesian Lasso. Psychological Methods, 26(2), 210–235.
Chen, S. M., Bauer, D. J., Belzak, W. M., & Brandt, H. (2022). Advantages of spike and slab priors for detecting differential item functioning relative to other Bayesian regularizing priors and frequentist Lasso. Structural Equation Modeling: A Multidisciplinary Journal, 29(1), 122–139.
da Silva, M. A., Liu, R., Huggins-Manley, A. C., & Bazán, J. L. (2019). Incorporating the Q-Matrix into multidimensional item response theory models. Educational and Psychological Measurement, 79(4), 665–687.
Dwyer, D. B., Falkai, P., & Koutsouleris, N. (2018). Machine learning approaches for clinical psychology and psychiatry. Annual Review of Clinical Psychology, 14, 91–118.
Epskamp, S., & Fried, E. I. (2018). A tutorial on regularized partial correlation networks. Psychological Methods, 23(4), 617–634.
Eysenck, S., & Barrett, P. (2013). Re-introduction to cross-cultural studies of the EPQ. Personality and Individual Differences, 54(4), 485–489.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
Feng, X. N., Wu, H. T., & Song, X. Y. (2017). Bayesian adaptive Lasso for ordinal regression with latent variables. Sociological Methods and Research, 46(4), 926–953.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–534.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Huang, P. H. (2018). A penalized likelihood method for multi-group structural equation modelling. British Journal of Mathematical and Statistical Psychology, 71(3), 499–522.
Huang, P. H., Chen, H., & Weng, L. J. (2017). A penalized likelihood method for structural equation modeling. Psychometrika, 82(2), 329–354.
Jacobucci, R., Grimm, K. J., & McArdle, J. J. (2016). Regularized structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 555–566.
Janssen, R., & De Boeck, P. (1999). Confirmatory analyses of componential test structure using multidimensional item response theory. Multivariate Behavioral Research, 34(2), 245–268.
Jin, K. Y., & Wang, W. C. (2014). Item response theory models for performance decline during testing. Journal of Educational Measurement, 51(2), 178–200.
Kim, J., & Oshima, T. C. (2013). Effect of multiple testing adjustment in differential item functioning detection. Educational and Psychological Measurement, 73(3), 458–470.
Kline, P. (1986). A handbook of test construction: Introduction to psychometric design. Methuen.
Lee, S., Bulut, O., & Suh, Y. (2017). Multidimensional extension of multiple indicators multiple causes models to detect DIF. Educational and Psychological Measurement, 77(4), 545–569.
Leng, C., Tran, M. N., Nott, D. (2014). Bayesian adaptive Lasso. Annals of the Institute of Statistical Mathematics, 66(2), 221244.CrossRefGoogle Scholar
Lewandowski, D., Kurowicka, D., Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 19892001.CrossRefGoogle Scholar
Magis, D., Tuerlinckx, F., De Boeck, P. (2015). Detection of differential item functioning using the Lasso approach. Journal of Educational and Behavioral Statistics, 40(2), 111135.CrossRefGoogle Scholar
Mazor, K. M., Hambleton, R. K., Clauser, B. E. (1998). Multidimensional DIF analyses: The effects of matching on unidimensional subtest scores. Applied Psychological Measurement, 22(4), 357367.CrossRefGoogle Scholar
Mckinley, R. (1989). Confirmatory analysis of test structure using multidimensional item response theory. Technical Report No. RR-89-31. Educational Testing Service.CrossRefGoogle Scholar
Millsap, R. E., Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297334.CrossRefGoogle Scholar
Oshima, T. C., Raju, N. S., Flowers, C. P. (1997). Development and demonstration of multidimensional IRT-based internal measures of differential functioning of items and tests. Journal of Educational Measurement, 34(3), 253272.CrossRefGoogle Scholar
Pan, J., Ip, E. H., Dubé, L. (2017). An alternative to post hoc model modification in confirmatory factor analysis: The Bayesian Lasso. Psychological Methods, 22(4), 687704.CrossRefGoogle ScholarPubMed
Park, T., Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103(482), 681686.CrossRefGoogle Scholar
Plummer, M. (2017). Jags version 4.3.0 user manual. https://sourceforge.net/projects/mcmc-jags/files/Manuals/4.x/.Google Scholar
Polson, N. G., Sokolov, V. (2019). Bayesian regularization: From Tikhonov to horseshoe. WIREs Computational Statistics, 11.CrossRefGoogle Scholar
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for statistical computing, Vienna, Austria. https://www.R-project.org.Google Scholar
Reckase, M. D. (2009). Multidimensional item response theory, Springer.CrossRefGoogle Scholar
Schauberger, G., Mair, P. (2020). A regularization approach for the detection of differential item functioning in generalized partial credit models. Behavior Research Methods, 52(1), 279294.CrossRefGoogle ScholarPubMed
Serang, S., Jacobucci, R., Brimhall, K. C., Grimm, K. J. (2017). Exploratory mediation analysis via regularization. Structural Equation Modeling: A Multidisciplinary Journal, 24(5), 733744.CrossRefGoogle ScholarPubMed
Stan Development Team. (2023). RStan: The R interface to Stan [R package version 2.21.8]. http://mc-stan.org/.Google Scholar
Stout, W., Li, H., Nandakumar, R., Bolt, D. (1997). MULTISIB - a procedure to investigate DIF when a test is intentionally multidimensional. Applied Psychological Measurement, 21(3), 195213.CrossRefGoogle Scholar
Suh, Y., & Cho, S. J. (2014). Chi-square difference tests for detecting differential functioning in a multidimensional IRT model: A Monte Carlo study. Applied Psychological Measurement, 38(5), 359–375.
Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent variable selection for multidimensional item response theory models via L1 regularization. Psychometrika, 81(4), 921–939.
Teresi, J. A., Ramirez, M., Lai, J., & Silver, S. (2008). Occurrences and sources of differential item functioning (DIF) in patient-reported outcome measures: Description of DIF methods, and review of measures of depression, quality of life and general health. Psychology Science, 50(4), 538–612.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Tibshirani, R., Friedman, J., Hastie, T., Narasimhan, B., Simon, N., & Qian, J. (2021). glmnet: Lasso and elastic-net regularized generalized linear models. https://www.rdocumentation.org/packages/glmnet/versions/4.1-3
Tutz, G., & Schauberger, G. (2015). A penalty approach to differential item functioning in Rasch models. Psychometrika, 80(1), 21–43.
Wang, C., Zhu, R. Y., & Xu, G. J. (2023). Using Lasso and adaptive Lasso to identify DIF in multidimensional 2PL models. Multivariate Behavioral Research, 58(2), 387–407.
Wellner, J., & Zhang, T. (2012). Introduction to the special issue on sparsity and regularization methods. Statistical Science, 27(4), 447–449.
Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33(1), 42–57.
Xu, P. F., Shang, L., Zheng, Q. Z., Shan, N., & Tang, M. L. (2022). Latent variable selection in multidimensional item response theory models using the expectation model selection algorithm. British Journal of Mathematical and Statistical Psychology, 75(2), 363–394.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2), 894–942.
Zhang, S., & Chen, Y. (2022). Computation for latent variable model estimation: A unified stochastic proximal framework. Psychometrika, 87(4), 1473–1502.
Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Table 1 Mean absolute bias (CI coverage) of parameter estimates in the simple example
Table 2 Simulated true item parameters
Figure 1 MSEs of the model parameter estimates in study 1.
Figure 2 CI coverage for different parameters in study 1.
Table 3 Results of latent variable selection and DIF detection in study 1
Figure 3 MSEs of the model parameter estimates in study 2.
Figure 4 CI coverage for different parameters in study 2.
Table 4 Results of latent variable selection and DIF detection in study 2
Table 5 The Eysenck Personality Questionnaire with items for E and N
Table 6 The estimated item discriminations and intercepts for the real data
Table 7 Comparisons of DIF detection for BaLasso and IRT-LR in the real data
Table 8 Average CPU times in seconds for all conditions in studies 1 and 2
Figure 5 CI coverage and MSE decompositions for the simulation with no DIF effects and sample size 500.
Table 9 MSEs for each parameter estimate in study 1