Differential Item Functioning via Robust Scaling

Peter F. Halpin

doi:10.1007/s11336-024-09957-6


Published online by Cambridge University Press: 01 January 2025


Peter F. Halpin*, University of North Carolina at Chapel Hill
*Correspondence should be made to Peter F. Halpin, University of North Carolina at Chapel Hill, 100 E Cameron Ave, Office 1070G, Chapel Hill, NC 27514, USA. Email: [email protected]


Abstract

This paper proposes a method for assessing differential item functioning (DIF) in item response theory (IRT) models. The method does not require pre-specification of anchor items, which is its main virtue. It is developed in two main steps: first by showing how DIF can be re-formulated as a problem of outlier detection in IRT-based scaling and then tackling the latter using methods from robust statistics. The proposal is a redescending M-estimator of IRT scaling parameters that is tuned to flag items with DIF at the desired asymptotic type I error rate. Theoretical results describe the efficiency of the estimator in the absence of DIF and its robustness in the presence of DIF. Simulation studies show that the proposed method compares favorably to currently available approaches for DIF detection, and a real data example illustrates its application in a research context where pre-specification of anchor items is infeasible. The focus of the paper is the two-parameter logistic model in two independent groups, with extensions to other settings considered in the conclusion.

Keywords

item response theory; differential item functioning; test scaling and equating; robust statistics

Type: Theory & Methods
Information: Psychometrika, Volume 89, Issue 3, September 2024, pp. 796-821

DOI: https://doi.org/10.1007/s11336-024-09957-6
Copyright: Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society

The topic of this paper is differential item functioning (DIF) in item response theory (IRT) models. The motivating application is the measurement of human development in cross-cultural contexts, which often involves translation and adaptation of existing assessments, or the development of new assessments, for use in new populations. In this context, the usual assumptions made in DIF analysis are not viable. For example, it cannot be assumed that DIF is limited to only a small proportion of items on an assessment or that a subset of items without DIF (“anchors”) can be reliably identified ahead of time. Consequently, this paper seeks to develop a method for DIF analysis that can be used in the absence of these assumptions. In particular, the paper has two main goals.

The first goal is to consider how identification constraints on the distribution of the IRT latent trait, referred to colloquially as “scaling”, are related to procedures used to assess DIF. This amounts to (yet another) discussion of the circular nature of DIF (Angoff, 1982), with the overall argument being that IRT-based scaling and DIF are two sides of the same problem. In particular, DIF with respect to a grouping variable is formally similar to IRT-based scaling with the common items non-equivalent groups (CINEG) design (Kolen and Brennan, 2014, chap. 6). Items with DIF translate into outliers in the CINEG design. The latter problem has received some attention in the scaling literature (He et al., 2015; He and Cui, 2020; Stocking and Lord, 1983), although the potential advantages of this approach for DIF analysis seem to have gone largely unnoticed. The second goal of this paper is to elaborate on these advantages.

Framing DIF in terms of outlier detection allows for the theory of robust statistics to be brought to bear on the problem. The general strategy taken in this paper is to approach IRT-based scaling with the CINEG design from the perspective of M-estimation of a location parameter (Huber and Ronchetti, 2009, chap. 4). In this context, the item parameter estimates play the role of data points whose location we wish to estimate. Whereas standard M-estimation theory involves asymptotics in the number of data points (i.e., items), the results developed in this paper invoke asymptotics in the number of respondents in the IRT model while treating the number of items as fixed. Taking this approach, the asymptotic distribution of a relatively wide class of M-estimators of IRT-based scaling parameters is obtained. These results are then used to construct a highly robust redescending M-estimator that can be tuned to flag items with DIF at the desired asymptotic type I error (false positive) rate.

It is shown that the finite sample breakdown point (FSBP) of the proposed estimator depends on only the choice of starting value. This is unlike more typical M-estimation problems in which one must also consider the breakdown of an ancillary estimate of the scale (variance) of data points (Huber and Ronchetti, 2009, chap. 6). As a consequence, the proposed estimator can be constructed to achieve the theoretical maximum FSBP for any translation equivariant estimator, which is 1/2 (Huber and Ronchetti, 2009, §11.2). This means that the proposed estimator remains bounded whenever fewer than 1/2 of the items on a test exhibit DIF. Theoretical guarantees about FSBP are quite weak, so these results are complemented by data simulations illustrating how the proposed procedure, as well as some currently available DIF detection methods, perform “on the way to breakdown.” A second simulation focuses on statistical power, and the simulations are followed by a real data example from cross-cultural human development in which the pre-specification of anchor items is infeasible.

The proposed methodology is referred to as the Robust DIF (R-DIF) procedure. Its main advantages are that (a) anchor items need not be identified ahead of time, and (b) theoretical and simulation-based results provide some assurances about its performance when fewer than 1/2 of the items on an assessment exhibit DIF. It can be implemented as a post-estimation procedure following separate calibrations of a focal IRT model in the populations of interest, and it does not require multi-group models or iteratively fitting models with different parameter restrictions. Standard computational procedures can be used for estimation (e.g., iteratively reweighted least squares), and have been implemented in an R (R Core Team, 2022) package accompanying this paper, robustDIF, which is briefly described.

The focus of the paper is the two-parameter logistic (2PL) model in two independent groups. The main developments address DIF in the item difficulty parameter, which simplifies presentation. Extensions to the item discrimination parameter are then shown to follow directly from the main results. The next section reviews the literature with the purpose of making connections between DIF, IRT-based scaling, and robust statistics. While many of these issues extend beyond the context of IRT, the focus of the review is IRT-based methods.

1. The Circular Nature of DIF: Redux

DIF involves two interrelated problems. The first and more obvious problem is to infer whether item parameters differ as a function of one or more external variables. One way to do this is Lord’s (1980) test, which is considered here for illustrative purposes. Let $p_{gi}$ denote the probability of a respondent in group $g = 0, 1$ endorsing a binary item $i = 1, \dots, m$. Specify the 2PL IRT model as:

$$\text{logit}(p_{gi}) = a_{gi} (\eta_{g} - b_{gi}) \quad \text{with} \quad \eta_{g} \sim N(\mu_{g}, \sigma_{g}^{2}), \tag{1}$$

where $a_{gi}$ is the item discrimination parameter, $b_{gi}$ is the item difficulty parameter, and $\eta_{g}$ is the latent trait. Then Lord’s test for the item difficulty parameters can be written as

$$z_{i} = \frac{\hat{b}_{1i} - \hat{b}_{0i}}{\sqrt{\text{Var}(\hat{b}_{1i}) + \text{Var}(\hat{b}_{0i})}}.$$

When $\hat{b}_{gi}$ is the maximum likelihood estimate (MLE) of $b_{gi}$, $z_{i}$ is a Wald test.
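As a minimal illustration of the statistic above, the following Python sketch computes Lord's $z_{i}$ from two difficulty estimates and their sampling variances. The function name `lord_z` and the numerical inputs are hypothetical, chosen only for illustration; in practice the estimates and variances would come from separate IRT calibrations.

```python
import math


def lord_z(b1_hat, b0_hat, var_b1, var_b0):
    """Wald statistic for a between-group difference in item difficulty."""
    return (b1_hat - b0_hat) / math.sqrt(var_b1 + var_b0)


# Hypothetical estimates for a single item (illustrative values only).
z = lord_z(b1_hat=0.60, b0_hat=0.25, var_b1=0.010, var_b0=0.015)

# Two-sided p-value from the standard normal reference distribution.
p_value = math.erfc(abs(z) / math.sqrt(2.0))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The comparison is meaningful only if both estimates are on a common scale, which is the issue taken up next.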

The second problem has to do with the identification of IRT models. In particular, the 2PL IRT model is identified only up to an affine transformation of the latent trait (see van der Linden, 2016, §2.2.3). Thus, using the transformed values $\eta_{g}^{*} = A \eta_{g} + B$, $b_{gi}^{*} = A b_{gi} + B$, and $a_{gi}^{*} = a_{gi} / A$ in Eq. (1) leaves $\text{logit}(p_{gi})$ unchanged. A common way to address this problem is by setting $\mu_{g}$ and $\sigma_{g}^{2}$ to fixed values, which is referred to as scaling the latent trait.
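The invariance can be checked numerically. The sketch below uses hypothetical helper names (`logit_2pl`, `affine_transform`) and arbitrary parameter values, and confirms that the logit in Eq. (1) is unchanged by the affine transformation.

```python
import math


def logit_2pl(a, b, eta):
    """Right-hand side of Eq. (1): a * (eta - b)."""
    return a * (eta - b)


def affine_transform(a, b, eta, A, B):
    """Apply eta* = A*eta + B, b* = A*b + B, a* = a/A."""
    return a / A, A * b + B, A * eta + B


# Arbitrary illustrative values.
a, b, eta = 1.3, 0.4, -0.7
a_star, b_star, eta_star = affine_transform(a, b, eta, A=2.5, B=-1.0)

original = logit_2pl(a, b, eta)
transformed = logit_2pl(a_star, b_star, eta_star)
print(math.isclose(original, transformed))
```

Algebraically, $(a/A)\{(A\eta + B) - (Ab + B)\} = a(\eta - b)$, so the data cannot distinguish between the two parameterizations.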

Each of these two problems has implications for the other. Let us first consider the implications of the identification problem for testing the item parameters. In the DIF literature, differences in the distribution of the latent trait across groups are referred to as “impact” (Angoff, 1993). When the latent trait is scaled separately in each group, this requires that we ignore impact. In particular, assuming that the latent trait has the same distribution in both groups is equivalent to transforming the latent variable as

$$\eta_{1}^{*} = \sigma_{0} \left( \frac{\eta_{1} - \mu_{1}}{\sigma_{1}} \right) + \mu_{0}.$$

The corresponding transformation of the item difficulty parameters is

$$b_{1i}^{*} = \sigma_{0} \left( \frac{b_{1i} - \mu_{1}}{\sigma_{1}} \right) + \mu_{0}, \tag{2}$$

and plugging $b_{1i}^{*}$ into Lord’s test gives

$$z_{i}^{*} = \frac{\hat{b}_{1i}^{*} - \hat{b}_{0i}}{\sqrt{\text{Var}(\hat{b}_{1i}^{*}) + \text{Var}(\hat{b}_{0i})}} = \frac{\frac{\sigma_{0}}{\sigma_{1}} \left( \hat{b}_{1i} - \mu_{1} \right) + \mu_{0} - \hat{b}_{0i}}{\sqrt{\frac{\sigma_{0}^{2}}{\sigma_{1}^{2}} \text{Var}(\hat{b}_{1i}) + \text{Var}(\hat{b}_{0i})}}. \tag{3}$$

Equation (3) shows how the identification problem affects Lord’s test: When impact is ignored, Lord’s test is biased by the mean and variance of the latent trait in both groups. Stated more generally, comparing item parameters over groups requires solving the scaling problem.
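To make the bias concrete, the following sketch evaluates the right-hand side of Eq. (3) for a single item whose difficulty is identical in both groups. The function name and all numerical values are hypothetical. With no impact the statistic is zero, but a difference in latent means alone produces a large value even though the item has no DIF.

```python
import math


def lord_z_ignoring_impact(b1_hat, b0_hat, var_b1, var_b0,
                           mu0, sigma0, mu1, sigma1):
    """Right-hand side of Eq. (3): Lord's test after forcing equal scaling."""
    ratio = sigma0 / sigma1
    numerator = ratio * (b1_hat - mu1) + mu0 - b0_hat
    denominator = math.sqrt(ratio ** 2 * var_b1 + var_b0)
    return numerator / denominator


# Same difficulty in both groups (no DIF); illustrative variances.
common = dict(b1_hat=0.0, b0_hat=0.0, var_b1=0.01, var_b0=0.01,
              mu0=0.0, sigma0=1.0, sigma1=1.0)

z_no_impact = lord_z_ignoring_impact(mu1=0.0, **common)
z_with_impact = lord_z_ignoring_impact(mu1=0.5, **common)
print(f"no impact: {z_no_impact:.3f}, impact of 0.5: {z_with_impact:.3f}")
```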

In the context of two independent groups, the usual way to solve the scaling problem is to (a) arbitrarily scale the latent trait in only one of the groups, and then (b) assume that some (or all) of the item parameters are equal over groups. Part (a) is warranted because, as noted, the 2PL model is identified only up to an affine transformation of the latent trait. Part (b) then suffices to scale the latent trait in the second group, for example, by setting $b_{1i} = b_{0i}$ for at least two items and then solving for $\mu_{1}$ and $\sigma_{1}$ using Eq. (2). In the literature on IRT-based scaling, this two-part approach is referred to as the CINEG design (see Kolen and Brennan, 2014, §6.3.1). In the literature on DIF, part (b) of the scaling problem is referred to as choosing an “anchor set” of items (Kopf et al., 2015a). Choosing anchor items brings us back to the problem of comparing item parameters over groups, whence the circular nature of DIF (Angoff, 1982).
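Part (b) can be sketched for the simplest case of exactly two anchor items. The code below is one reading of the text and not a procedure taken from the paper: it assumes that $b_{1i}^{*}$ in Eq. (2) is the difficulty obtained when the comparison group is calibrated as if it had the reference-group distribution, substitutes $b_{1i} = b_{0i}$ for the two anchors, and solves the resulting pair of linear equations. The function name and values are hypothetical.

```python
def solve_scaling_from_two_anchors(b0, b1_star, mu0=0.0, sigma0=1.0):
    """Solve Eq. (2) for (mu1, sigma1) given two anchors with b_1i = b_0i.

    b0      : anchor difficulties in the reference group (length 2)
    b1_star : the same anchors calibrated in the comparison group
              with impact ignored (length 2)
    """
    # Eq. (2) with b_1i = b_0i: b*_i = (sigma0 / sigma1) * (b0_i - mu1) + mu0.
    ratio = (b1_star[0] - b1_star[1]) / (b0[0] - b0[1])  # sigma0 / sigma1
    sigma1 = sigma0 / ratio
    mu1 = b0[0] - (b1_star[0] - mu0) / ratio
    return mu1, sigma1


# Build anchors from known values, then recover those values.
true_mu1, true_sigma1 = 0.5, 2.0
b0 = [-1.0, 1.0]
b1_star = [(b - true_mu1) / true_sigma1 for b in b0]

mu1, sigma1 = solve_scaling_from_two_anchors(b0, b1_star)
print(mu1, sigma1)
```

The solution depends entirely on the two chosen items being free of DIF, which is the circularity discussed in this section.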

Traditional approaches to DIF analysis sought to circumvent this issue by proceeding iteratively, first assuming an anchor set, then testing DIF on each item, then updating the anchor set, and so on. The two-stage “purification” and “refinement” approach (Dorans and Holland, 1993) is perhaps the best-known example of this strategy. While this two-stage approach can work well in some settings, it does not control type I error rates when a moderate proportion of items (e.g., $\geq 1/4$) are biased in the same direction (e.g., Kopf et al., 2015b). While many alternative strategies for selecting anchors have been proposed, these rely mainly on heuristic arguments about the size of the anchor set and criteria for selecting anchors (e.g., Kopf et al., 2015a, b). In general, approaches based on anchor item selection are unsatisfying from a theoretical perspective because subsequent tests of DIF proceed as if the anchors were known a priori.

Bechger and Maris (2015) proposed a test of DIF that is invariant under affine transformation of the latent trait. This approach avoids the logical circularity of traditional methods, but results in pairwise comparisons over items, rather than a test of individual items, which is a practical shortcoming. Yuan et al. (2021) proposed a Monte-Carlo test that replaces pairwise comparisons over items with comparison to a single reference point, although the latter is taken as the average of anchor items whose selection remains largely heuristic. Other recent research has used regularization methods to simultaneously estimate item parameters and scaling parameters, while imposing sparsity on quantities that govern DIF (Belzak and Bauer, 2020; Magis et al., 2015; Schauberger and Mair, 2020). Although the use of regularization for variable selection in regression-based models is well established, the theoretical motivation for using regularization to address the scaling of IRT models is less clear. The simulation studies presented in this paper suggest that the performance of regularization-based approaches is not qualitatively different from traditional DIF methods.

In what follows I propose an alternative approach to DIF analysis. The overall idea is to tackle the scaling problem directly using robust statistics. Early work on IRT-based scaling considered, and dismissed, approaches based on robust statistics (Stocking and Lord, 1983, Appendix). He (2013), He et al. (2015), and He and Cui (2020) revisited the topic of robust scaling, and this work is a source of inspiration for the present research. In particular, He (2013) considered outlier detection and omission in the context of IRT-based scaling, but dismissed this approach with the rationale of preserving content coverage in the anchor set. Recently, an independent line of research by Wang et al. (2022) has addressed the use of robust regression in DIF analysis. The present paper focuses on the related problem of IRT-based scaling via M-estimation of a location parameter, contributing a relatively general asymptotic test of DIF as well as theoretical results on the robustness of the proposed R-DIF procedure. Other recent work has applied robust methods to the choice of cut-off values for item fit indices (von Davier and Bezirhan, 2022), but has been developed under the assumption that only a small proportion of items may exhibit DIF.

Recent research has also emphasized the connection between scaling and DIF (Doebler, 2019; Stenhaug et al., 2021; Strobl et al., 2021), although these approaches have not made use of robust methods to address the scaling problem. Another related approach is the alignment procedure (Asparouhov and Muthén, 2014; Robitzsch and Lüdtke, 2023), in which the configural model is estimated as a first step, and then a loss function is used to minimize the degree to which item parameters vary over groups. R-DIF is also a post-estimation procedure, but the goal is not to minimize DIF; rather, the goal is to obtain a robust estimate of IRT scaling parameters and use this as a basis for testing for DIF.

Although the theory of M-estimation is well established (Huber and Ronchetti, 2009), there are some peculiar aspects of the IRT-based scaling problem that do not feature in the more general theory and therefore warrant special attention. First, the population model in IRT-based scaling is known a priori (e.g., Eq. (2)). Knowing the population model means that we can aggressively pursue model-based outlier detection, without worrying about whether the model is correct. Second, the population model is deterministic (e.g., unlike regression models, the linear relationship in Eq. (2) does not contain a residual term). This implies that the only source of variation in the sample-based scaling problem is the (co-)variances of the item parameter estimates, which are available, for example, via known results on maximum likelihood estimation in IRT (e.g., Bock and Gibbons, 2021). Third, asymptotic results for the proposed M-estimator can be obtained via the IRT parameter estimates, and this provides a different route to inference than is usually considered in M-estimation theory. These details are elaborated in the following section.

2. The R-DIF Procedure

The overall logic of the R-DIF procedure is to obtain a robust estimator of IRT scaling parameters and then use it to construct a robust test of DIF. It turns out that tuning the estimator to be robust to DIF is tantamount to flagging (down-weighting) items with DIF during estimation. Additionally, both the estimator and test can be parameterized such that the only quantity directly affected by DIF is the IRT scaling parameter itself. This is importantly different from similar problems in M-estimation that require an ancillary estimate of the scale (variance) of the data, and is the key to the robustness (i.e., high breakdown point) of the R-DIF procedure. The main steps involved in developing the R-DIF procedure are summarized below, and the remainder of this section describes each step in more detail.

The R-DIF procedure is based on an M-estimator of a location parameter, $\theta$. The estimator, $\tilde{\theta}$, can be defined in terms of the estimating equation

$$\Psi(\theta) = \sum_{i = 1}^{m} \psi \left( \frac{Y_{i} - \theta}{s_{i}} \right) = 0, \tag{4}$$

which is developed in the following three steps:

1. $Y_{i}$ is defined as a scalar-valued function of the MLEs of the parameters of item $i = 1, \dots, m$, such that $\sqrt{n}\,(Y_{i} - \theta) \overset{d}{\to} N(0, \tau_{i})$ under the null hypothesis that item $i$ does not exhibit DIF, with $n$ denoting the number of respondents. The $Y_{i}$ play the role of the sample data points whose location parameter we wish to estimate.
2. For a relatively general choice of $\psi$, choosing $s_{i} = \tau_{i}$ to be the variance of the asymptotic null distribution of $Y_{i}$ is shown to lead to an efficient estimator of $\theta$ as well as a convenient asymptotic test of DIF.
3. The loss function $\psi$ is chosen to be a so-called redescending function (Huber and Ronchetti, 2009, §4.8) that is tuned so that values of $|Y_{i} - \tilde{\theta}| / \tau_{i}$ beyond the $1 - \alpha/2$ quantile of its asymptotic null distribution are automatically set to zero during estimation.
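The three steps can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the implementation in the robustDIF package. Tukey's bisquare is used as one example of a redescending $\psi$. The $s_{i}$ are assumed to be supplied as standard errors of the $Y_{i}$, so that the standardized residuals are compared with a standard normal quantile. The starting value is the median, and Eq. (4) is solved by iteratively reweighted least squares. All function names and data values are hypothetical.

```python
from statistics import NormalDist, median


def bisquare_weight(u, k):
    """IRLS weight psi(u)/u for Tukey's bisquare; exactly zero once |u| >= k."""
    return (1.0 - (u / k) ** 2) ** 2 if abs(u) < k else 0.0


def robust_location(y, s, alpha=0.05, tol=1e-10, max_iter=100):
    """Solve sum_i psi((y_i - theta) / s_i) = 0 by reweighting.

    y : item-level values Y_i
    s : assumed standard errors of Y_i under the null (an assumption here)
    """
    k = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # cutoff, about 1.96 for alpha = .05
    theta = median(y)                            # robust starting value
    for _ in range(max_iter):
        # psi(u) = w(u) * u with u = (y - theta) / s, so Eq. (4) gives
        # theta = sum(w/s * y) / sum(w/s).
        w = [bisquare_weight((yi - theta) / si, k) / si for yi, si in zip(y, s)]
        total = sum(w)
        if total == 0.0:
            raise ValueError("every item received zero weight")
        new_theta = sum(wi * yi for wi, yi in zip(w, y)) / total
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    # Items with zero weight at the solution are the ones flagged for DIF.
    flagged = [abs(yi - theta) / si >= k for yi, si in zip(y, s)]
    return theta, flagged


# Four items agree on a common location; the fifth is far away.
y = [0.50, 0.52, 0.48, 0.50, 3.00]
s = [0.10, 0.10, 0.10, 0.10, 0.10]
theta_tilde, flags = robust_location(y, s)
print(round(theta_tilde, 4), flags)
```

Because the bisquare weight is exactly zero beyond the cutoff, the outlying item contributes nothing to the estimate, which is how down-weighting and flagging coincide.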

2.1. Step 1: Setting Up the Scaling Problem

Specify the IRT models as above, although now in slope-intercept form, in the reference and comparison groups, respectively:

$$\begin{aligned} \text{logit}(p_{0i}) &= a_{0i} \eta_{0} + d_{0i} \quad \text{with} \quad \eta_{0} \sim N(0, 1) \\ \text{logit}(p_{1i}) &= a_{1i} \eta_{1} + d_{1i} \quad \text{with} \quad \eta_{1} = (\eta_{1}^{*} - \mu) / \sigma \quad \text{and} \quad \eta_{1}^{*} \sim N(\mu, \sigma^{2}). \end{aligned} \tag{5}$$

The scaling problem requires solving for $\mu$ and $\sigma$ using the relation (cf. Kolen and Brennan, 2014, §6.2.1)

$$a_{1i} \eta_{1} + d_{1i} = a_{1i}^{*} \eta_{1}^{*} + d_{1i}^{*},$$

from which follows the scaling equations:

$$\sigma = a_{1i} / a_{1i}^{*} \tag{6}$$

$$\mu / \sigma = (d_{1i} - d_{1i}^{*}) / a_{1i}. \tag{7}$$

In the CINEG design, we let the item parameters in the reference group stand in for the “un-scaled” item parameters in the comparison group:

$$a_{1i}^{*} = a_{0i} \tag{8}$$

$$d_{1i}^{*} = d_{0i}. \tag{9}$$

These two equalities assert that item i does not exhibit DIF with respect to group membership, which makes explicit the formal connection between IRT-based scaling using the CINEG design and DIF. I will refer to Eqs. (8) and (9) as null hypotheses about DIF on the slope and intercept of item i, respectively.
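As a sketch of how Eqs. (6) to (9) behave at the population level, the code below substitutes the reference-group parameters into the scaling equations and evaluates them item by item. The function name and parameter values are hypothetical. Items without DIF all return the same $\sigma$ and $\mu / \sigma$, while an item with intercept DIF returns a different value of $\mu / \sigma$ and therefore appears as an outlier.

```python
def item_scaling_values(a0, d0, a1, d1):
    """Evaluate Eqs. (6) and (7) per item after substituting Eqs. (8) and (9)."""
    sigma_i = [a1i / a0i for a0i, a1i in zip(a0, a1)]
    theta_i = [(d1i - d0i) / a1i for d0i, a1i, d1i in zip(d0, a1, d1)]
    return sigma_i, theta_i


# Reference-group parameters (illustrative values).
a0 = [0.8, 1.0, 1.5]
d0 = [-0.5, 0.0, 0.7]

# Comparison-group parameters implied by Eqs. (6)-(9) when there is no DIF.
mu, sigma = 0.5, 1.2
a1 = [sigma * a for a in a0]                # a_1i = sigma * a_0i
d1 = [d + a * mu for a, d in zip(a0, d0)]   # d_1i = d_0i + a_1i * mu / sigma

# Add intercept DIF to the third item only.
d1[2] += 0.6

sigma_i, theta_i = item_scaling_values(a0, d0, a1, d1)
print(sigma_i)
print(theta_i)
```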

Although it is usual to isolate $\mu$ by substituting for $\sigma$ in Eq. (7), it is preferable to treat $\theta = \mu / \sigma$ as the target parameter when addressing DIF. Taking this approach, it can be seen that the null hypothesis about item slopes in Eq. (8) applies only to the scaling equation for $\sigma$ in Eq. (6). Similarly, the null hypothesis about item intercepts in Eq. (9) applies only to the scaling equation for $\theta$ in Eq. (7). This situation can be contrasted with the more usual distinction between uniform and non-uniform DIF (Mellenbergh, 1982). In particular, uniform DIF evaluates the item difficulties (rescaled intercepts) under the assumption that there is no DIF on the item slopes, whereas Eq. (7) does not require this assumption. This is convenient because it allows for DIF in each type of item parameter to be addressed separately. In what follows I focus on the item intercepts only (i.e., Eqs. (7) and (9)). The same developments apply to the item slopes with only minor modifications, and it will also be shown how to test both item parameters simultaneously; these topics are deferred to the section of this paper entitled “Extensions”.

In practice, the item parameters will be estimated in independent samples of sizes $n_{0}$ and $n_{1}$. To set up the sample-based problem, collect the parameters of item i in the vector $ν_{i} = {[a_{0 i}, d_{0 i}, a_{1 i}, d_{1 i}]}^{T}$, let $ν = {[ν_{1}^{T}, ν_{2}^{T}, \dots, ν_{m}^{T}]}^{T}$, and write

(10)

\begin{matrix} Y_{i} (ν) = (d_{1 i} - d_{0 i}) / a_{1 i} . \end{matrix}

In what follows, it is assumed that MLEs $\hat{ν}$ and their asymptotic covariance matrix $cov (\hat{ν})$ are available (see e.g., Bock and Gibbons, 2021). The shorthand notation $Y_{i} = Y_{i} (\hat{ν})$ is reserved for the sample-based quantities only.

For later reference, it is noted that application of the Delta method yields (see e.g., van der Vaart, 1998):

(11)

\begin{matrix} \sqrt{n} (Y_{i} - Y_{i} (ν)) \overset{d}{\to} N (0, var (Y_{i})) \end{matrix}

where $n = n_{0} + n_{1}$, $n_{0} / n_{1} = c$ for $c \in (0, \infty)$, and

(12)

\begin{matrix} var (Y_{i}) = \nabla Y_{i} {(ν)}^{T} cov (\hat{ν}) \nabla Y_{i} (ν) . \end{matrix}

Under the null hypothesis in Eq. (9), we have $Y_{i} (ν) = θ$ (via Eq. (7)) so that $E (Y_{i}) = θ$ and the gradient elements corresponding to item i are:

(13)

\begin{matrix} \nabla Y_{i} (ν_{i}) & = a_{1 i}^{- 1} {[0, - 1, - Y_{i} (ν), 1]}^{T} \\ & = a_{1 i}^{- 1} {[0, - 1, - θ, 1]}^{T} . \end{matrix}

Note that the other gradient elements are equal to zero. Thus the “null variance” of $Y_{i}$ can be written as a function of $θ$, say $τ_{i} = τ_{i} (θ)$, with Eqs. (12) and (13) leading to the explicit expression

(14)

\begin{matrix} τ_{i} (θ) \equiv var (Y_{i}) = a_{1 i}^{- 2} (θ^{2} var ({\hat{a}}_{1 i}) - 2 θ cov ({\hat{a}}_{1 i}, {\hat{d}}_{1 i}) + var ({\hat{d}}_{1 i}) + var ({\hat{d}}_{0 i})) . \end{matrix}
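As a minimal numerical sketch of Eqs. (10) through (14), the Python snippet below computes $Y_{i}$, its gradient under the null, and the null variance $τ_{i} (θ)$, first as the quadratic form in Eq. (12) and then via the explicit expression in Eq. (14). The function names, the parameter values, and the covariance matrix are hypothetical and chosen only for illustration. The covariance matrix is taken to be block diagonal across groups because the two samples are independent.

```python
import numpy as np

def y_stat(a0, d0, a1, d1):
    """Eq. (10): Y_i = (d_1i - d_0i) / a_1i. a0 is unused but kept to mirror nu_i."""
    return (d1 - d0) / a1

def grad_y_null(theta, a1):
    """Second line of Eq. (13): gradient w.r.t. (a0, d0, a1, d1), with theta replacing Y_i(nu)."""
    return np.array([0.0, -1.0, -theta, 1.0]) / a1

def tau_quadratic(theta, a1, cov_item):
    """Eq. (12) restricted to item i: grad^T cov grad."""
    g = grad_y_null(theta, a1)
    return float(g @ cov_item @ g)

def tau_explicit(theta, a1, cov_item):
    """Eq. (14), reading the needed entries from the 4 x 4 item covariance matrix."""
    var_d0 = cov_item[1, 1]
    var_a1 = cov_item[2, 2]
    var_d1 = cov_item[3, 3]
    cov_a1_d1 = cov_item[2, 3]
    return (theta**2 * var_a1 - 2.0 * theta * cov_a1_d1 + var_d1 + var_d0) / a1**2

# Hypothetical item parameters, ordered (a0, d0, a1, d1).
a0, d0, a1, d1 = 1.2, -0.3, 1.1, 0.25
# Hypothetical covariance matrix of the estimates: zero blocks between groups 0 and 1.
cov_item = np.array([
    [0.010, 0.002, 0.000, 0.000],
    [0.002, 0.008, 0.000, 0.000],
    [0.000, 0.000, 0.012, 0.003],
    [0.000, 0.000, 0.003, 0.009],
])

theta = y_stat(a0, d0, a1, d1)  # under the null, Y_i(nu) equals theta
tau_q = tau_quadratic(theta, a1, cov_item)
tau_e = tau_explicit(theta, a1, cov_item)
print(theta, tau_q, tau_e)  # the two variance computations should agree
```

The agreement of the two computations reflects the fact that the gradient entry for $a_{0 i}$ is zero and that the cross-group covariances vanish, so only the four terms in Eq. (14) survive.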

The foregoing results provide a key idea behind this paper: The asymptotic null distribution of $Y_{i}$ can be obtained by using $θ$ in place of $Y_{i}$. Indeed, when comparing the R-DIF procedure to previous work on robust scaling (e.g., Stocking and Lord, 1983; He, 2013; Wang et al., 2022), the substitution in the second line of Eq. (13) is perhaps the crucial difference. To anticipate Theorem 2 below, treating $τ_{i}$ as a function of $θ$ allows for the R-DIF procedure to achieve a FSBP of 1/2. In practice, this means that we can obtain a reasonable estimate of the null distribution of $Y_{i}$, so long as fewer than one-half of the items exhibit DIF.

2.2. Step 2: Choosing the Weights $s_{i}$

One rationale for choosing the weights $s_{i}$ in Eq. (4) is to ensure that the resulting M-estimator has acceptable efficiency in the absence of outliers (Maronna et al., 2019, §2.3.2). In the present context, the absence of outliers corresponds to the joint null hypothesis that none of the items exhibit DIF. The first part of Theorem 1 below shows that, for a relatively general choice of $ψ$, setting $s_{i} = τ_{i}$ results in an unbiased and asymptotically (in n) efficient estimator of $\tilde{θ}$, under the joint null hypothesis that none of the item intercepts exhibit DIF. The second part of the theorem obtains the distribution of $Y_{i} - \tilde{θ}$ under these same conditions. Subsequent remarks address the utility of these results in the context of DIF analysis.

The following notation is required. Let the function $θ = θ (ν)$ be implicitly defined by

(15)

\begin{matrix} Ψ (ν, θ) = \sum_{i = 1}^{m} ψ (U_{i} (ν)) = 0 \end{matrix}

with $U_{i} (ν) = (Y_{i} (ν) - θ) / s_{i}$ and $s_{i} > 0$. The M-estimator computed using the IRT MLEs $\hat{ν}$ will be written $\tilde{θ} = θ (\hat{ν})$. The following assumptions about $ψ$ are also required:

A1 $ψ$ is differentiable with $ψ^{'} = d ψ (u) / d u$.
A2 For some positive constant k and $u \in (- k, k)$, $ψ (u) = 0$ if and only if $u = 0$.
A3 $ψ^{'} (0) = c > 0$.
A4 $Ψ^{'} = \partial Ψ / \partial θ \neq 0$ at $θ_{0} = μ / σ$.

Assumptions (A1) through (A3) are not restrictive for many common choices of $ψ$ (e.g., see Maronna et al., 2019, chap. 2), although they do exclude some more robust choices, notably the median (by A1). Assumption (A4) is required to obtain the general (i.e., non-null) asymptotic distribution of $\tilde{θ}$ and of $Y_{i} - \tilde{θ}$ for non-monotone $ψ$, but is not restrictive for the null distribution (see Eq. (34) in the Appendix). M-estimators of location are typically characterized by the following additional assumption, which is noted here for later reference but is not required by the theorem:

A5 $ψ (u)$ is odd.

Theorem 1

For the two-group IRT model in Eq. (5), let $θ_{0} = μ / σ$ denote the target scaling parameter, with item parameters collected in the vector $ν$, and MLEs $\sqrt{n} (\hat{ν} - ν) \overset{d}{\to} N (0, cov (\hat{ν}))$ obtained in a sample of size $n = n_{0} + n_{1}$, with $n_{0} / n_{1} = c$ for $c \in (0, \infty)$. Let $θ (ν)$ and $\tilde{θ} = θ (\hat{ν})$ be defined as in Eq. (15) and Assumptions A1–A4. Assume the joint null hypothesis that $Y_{i} (ν) = θ_{0}$ for $i = 1, \dots, m$ (i.e., no DIF). Then choosing $s_{i} = var (Y_{i})$ implies

Part (a): $\sqrt{n} (\tilde{θ} - θ_{0}) \overset{d}{\to} N (0, var (\tilde{θ}))$ where

\begin{matrix} var (\tilde{θ}) = \frac{1}{\sum_{i} var {(Y_{i})}^{- 1}} \end{matrix}

is a lower bound on the variance of $\tilde{θ}$.

Part (b): $\sqrt{n} T_{i} \overset{d}{\to} N (0, 1)$ with

\begin{matrix} T_{i} = \frac{Y_{i} - \tilde{θ}}{\sqrt{var (Y_{i}) - var (\tilde{θ})}} . \end{matrix}

The proof is given in the Appendix. Part (a) describes the asymptotic distribution of the estimated scaling parameter $\tilde{θ}$, under the joint null hypothesis that no items exhibit DIF. In particular, it shows that setting $s_{i} = var (Y_{i})$ ensures that the resulting estimator $\tilde{θ}$ is asymptotically efficient in the absence of DIF. This may seem counterintuitive in light of well-known results about the relative inefficiency of robust estimators (e.g., Huber and Ronchetti, 2009, chap. 6). However, past results use asymptotics in m, which do not feature in the current approach. In the present case, the intuition is as follows. As $n \to \infty$, the null distribution of $Y_{i}$ becomes increasingly concentrated around $θ_{0}$ for each $i = 1, \dots, m$. Thus, in the limit, the loss function $ψ$ influences the null distribution of $Y_{i}$ only via the constant $ψ^{'} (0)$ (see A3), and this constant cancels out when computing the variance of $\tilde{θ}$ (see Eq. (32)).
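The form of these results can be motivated by a simple special case, offered here as a heuristic rather than as a summary of the Appendix proof. Suppose $ψ (u) = u$ and, as a simplifying assumption not stated in this section, that the $Y_{i}$ are uncorrelated across items. Writing $v_{i} = var (Y_{i})$ and setting $s_{i} = v_{i}$ in Eq. (15) gives:

```latex
\begin{aligned}
\sum_{i=1}^{m} \frac{Y_i - \tilde{\theta}}{v_i} = 0
  \;&\Longrightarrow\;
  \tilde{\theta} = \frac{\sum_{i} Y_i / v_i}{\sum_{i} 1 / v_i}, \\
\text{var}(\tilde{\theta})
  &= \frac{\sum_{i} v_i / v_i^{2}}{\left(\sum_{i} 1 / v_i\right)^{2}}
   = \frac{1}{\sum_{i} v_i^{-1}}, \\
\text{cov}(Y_i, \tilde{\theta})
  &= \frac{v_i / v_i}{\sum_{j} 1 / v_j}
   = \text{var}(\tilde{\theta}), \\
\text{var}(Y_i - \tilde{\theta})
  &= v_i + \text{var}(\tilde{\theta}) - 2\,\text{cov}(Y_i, \tilde{\theta})
   = v_i - \text{var}(\tilde{\theta}).
\end{aligned}
```

The first two lines reproduce the precision-weighted mean and the variance in Part (a), and the last two account for the denominator of $T_{i}$ in Part (b). For a general $ψ$ satisfying A1 through A3, the argument in the text is that $ψ$ enters the null distribution only through $ψ^{'} (0)$, which cancels, so the linear case is representative under the null.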

Part (b) of the theorem provides a Wald test of DIF for a relatively wide class of M-estimators of $θ$. The result shows that a robust test will require robust estimates of both $θ$ and $var (Y_{i})$, $i = 1, \dots, m$. However, as noted in connection with Eq. (14), these are one and the same problem. In particular, under the null hypothesis of no DIF on item i, $var (Y_{i}) = τ_{i} (θ)$ depends only on $θ$ and the covariance matrix of the item’s parameter estimates. Thus, a robust test of DIF requires only a robust estimate of $θ$. The following section addresses how to obtain such an estimate.
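A small simulation can illustrate Theorem 1 in its simplest case. The sketch below assumes $ψ (u) = u$, so that Eq. (15) is solved by a precision-weighted mean, and treats the $Y_{i}$ as independent normal draws with known variances. It works with finite-sample variances (the asymptotic variance divided by n), so the standardized statistic computed here corresponds to $\sqrt{n} T_{i}$. These are simplifying assumptions made for illustration only; `scaling_and_wald` is a hypothetical helper name and the variances are made-up values.

```python
import numpy as np

def scaling_and_wald(y, v):
    """Precision-weighted mean of y (last axis), its variance, and z-statistics.

    y: array of Y_i values, shape (m,) or (reps, m).
    v: finite-sample variances of the Y_i, shape (m,).
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v
    theta = np.sum(w * y, axis=-1) / np.sum(w)   # solves Eq. (15) when psi(u) = u
    var_theta = 1.0 / np.sum(w)                  # Part (a) of Theorem 1
    z = (y - np.expand_dims(theta, -1)) / np.sqrt(v - var_theta)  # Part (b)
    return theta, var_theta, z

rng = np.random.default_rng(0)
theta0 = 0.5                                        # hypothetical true scaling parameter
v = np.array([0.010, 0.020, 0.015, 0.030, 0.012])   # hypothetical variances, m = 5
reps = 20000
# Simulate the joint null: every item has mean theta0 (no DIF).
y = theta0 + rng.standard_normal((reps, v.size)) * np.sqrt(v)

theta_hat, var_theta, z = scaling_and_wald(y, v)
print("empirical variance of the estimate:", theta_hat.var())
print("variance from Part (a):            ", var_theta)
print("empirical sd of each z-statistic:  ", z.std(axis=0))
```

Under these assumptions the empirical variance of the estimate should sit close to the Part (a) value, and each z-statistic should have a standard deviation near one. The simulation says nothing about behavior when some items exhibit DIF, which is the case the robust choice of $ψ$ is meant to handle.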

A final remark concerns the non-null distributions of $\tilde{θ}$ and $Y_{i} - \tilde{θ}$. The Appendix shows that both the mean and variance of these distributions depend directly on $ψ$. The study of these distributions presents interesting possibilities for future research, but will not be addressed in this paper. The simulation studies presented below provide some empirical examples of the statistical power of the R-DIF procedure.

2.3. Step 3: Choosing the Loss Function $ψ$

This section introduces additional assumptions about $ψ$ in order to obtain a robust estimator of $θ$. A general strategy is to choose $ψ$ so that the influence of any individual data point is bounded (Huber and Ronchetti, 2009, §1.5):

A6 $|\psi(u)| < c$ for some positive constant c.

This strategy is taken one step further by so-called redescending M-estimators, which, in addition to being bounded, assign outliers a weight of zero (Huber and Ronchetti, 2009, §4.8):

A7 $\psi(u) = 0$ for $|u| > k$.

The constant k is a tuning parameter that ensures the estimator is resistant to outliers $|u| > k$, and, as a by-product, also automatically “flags” any such outliers during estimation.

The usual application of redescending M-estimators is to guard against gross outliers, while also ensuring acceptable efficiency in the absence of outliers. This leads to choices of k that are intended to flag only a small proportion of data points (e.g., Maronna et al., 2019; von Davier and Bezirhan, 2022). In the research context described at the outset of this paper, the goal can be better characterized in terms of guarding against potentially many modest outliers, and part (a) of Theorem 1 shows that the choice of $\psi$ does not affect asymptotic (in n) efficiency in the absence of outliers. Thus it is proposed to pursue a more aggressive choice of k.

In particular, part (b) of Theorem 1 shows how to choose item-specific tuning parameters $k_i$ such that items with DIF are flagged at a chosen asymptotic type I error rate, $\alpha$. Letting $u = U_i = (Y_i - \tilde{\theta}) / \tau_i$, the theorem implies that $\sqrt{n}\, U_i \overset{d}{\rightarrow} N(0, \omega_i)$, where

$$\omega_i = \frac{\tau_i - \bar{\tau} / m}{\tau_i^{2}} \tag{16}$$

with $\tau_i$ defined in Eq. (14) and $\bar{\tau}$ denoting the harmonic mean of the $\tau_i$ over $i = 1, \dots, m$. By definition, choosing $k_i$ to be the $1 - \alpha/2$ quantile of $N(0, \omega_i)$ implies that $\text{Prob}(|U_i| > k_i) = \alpha$ under the joint null hypothesis that no items exhibit DIF. Using A7, the function $\psi$ is set to zero for values of $|U_i| > k_i$. Thus, the proposed choice of $k_i$ is seen to be equivalent to an asymptotic test of DIF with size $\alpha$; that is, items that exhibit DIF are flagged by setting $\psi(u_i) = 0$ during estimation of $\tilde{\theta}$.
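As a rough numerical illustration of this rule, the following Python sketch computes $\omega_i$ from Eq. (16) and the corresponding $k_i$. The function name `rdif_tuning_parameters` and the input values are hypothetical. The $\tau_i$ are treated as known positive numbers rather than computed from Eq. (14), and the $1 - \alpha/2$ quantile of $N(0, \omega_i)$ is taken at face value as $z_{1-\alpha/2}\sqrt{\omega_i}$, without any further sample-size scaling.

```python
from statistics import NormalDist, harmonic_mean


def rdif_tuning_parameters(tau, alpha=0.05):
    """Return (omega, k) using Eq. (16) and the quantile rule from the text.

    tau   : positive numbers standing in for the tau_i of Eq. (14) (assumed known)
    alpha : desired asymptotic type I error rate for flagging an item
    """
    m = len(tau)
    tau_bar = harmonic_mean(tau)              # harmonic mean over i = 1, ..., m
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal 1 - alpha/2 quantile
    # Eq. (16): omega_i = (tau_i - tau_bar / m) / tau_i^2
    omega = [(t - tau_bar / m) / t ** 2 for t in tau]
    # The 1 - alpha/2 quantile of N(0, omega_i) equals z * sqrt(omega_i).
    k = [z * w ** 0.5 for w in omega]
    return omega, k


tau = [0.8, 1.0, 1.2, 1.5, 2.0]   # made-up values, for illustration only
omega, k = rdif_tuning_parameters(tau, alpha=0.05)
for t, w, ki in zip(tau, omega, k):
    print(f"tau={t:.2f}  omega={w:.4f}  k={ki:.4f}")
```

A convenient check is the equal-scale case: with $\tau_i = 1$ for all of $m = 4$ items, Eq. (16) gives $\omega_i = 0.75$ for every item.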

Motivating a per-item tuning parameter in terms of the desired type I error rate for DIF detection is a primary advantage of the proposed approach compared to that of Wang et al. (2022). In particular, the simulation studies reported below show that the R-DIF procedure yields a test of DIF that maintains the nominal value of $\alpha$ quite well even when a relatively large proportion of items exhibit DIF.

While there are various redescending M-estimators available, Tukey’s bisquare is well suited to the present context. It can be defined as:

$$\psi(u) = \begin{cases} u \left( 1 - \left( \frac{u}{k} \right)^{2} \right)^{2} & \text{for } |u| \le k \\ 0 & \text{for } |u| > k. \end{cases} \tag{17}$$

Note that, in general, choosing a per-item tuning parameter $k_i$ implies that each item has a different loss function, $\psi_i$. This situation was not directly addressed by Theorem 1. However, Assumptions A1–A4 continue to hold when using the bisquare function with per-item tuning parameters. In particular, $\psi'(0) = 1$ is a constant that does not depend on the choice of k, so that A3 is satisfied. The bisquare function in Eq. (17) is used in the simulation studies and empirical example reported below.
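The bisquare function of Eq. (17), used with per-item tuning parameters, might be sketched in Python as follows. The function names `bisquare_psi` and `flag_items` and the example values are hypothetical. The loop at the end checks numerically, by a central difference, that the slope of $\psi$ at zero is close to 1 for several values of k.

```python
def bisquare_psi(u, k):
    """Tukey's bisquare psi from Eq. (17) with tuning parameter k > 0."""
    if abs(u) > k:
        return 0.0                      # A7: psi vanishes beyond k
    return u * (1.0 - (u / k) ** 2) ** 2


def flag_items(U, K):
    """Flag item i when |U_i| > k_i, i.e. when psi_i(U_i) is set to zero by A7."""
    return [abs(u) > k for u, k in zip(U, K)]


# Central-difference approximation to psi'(0) for several tuning parameters.
h = 1e-6
for k in (0.5, 1.96, 6.0):
    slope = (bisquare_psi(h, k) - bisquare_psi(-h, k)) / (2 * h)
    print(f"k={k}: approximate psi'(0) = {slope:.6f}")

# Made-up standardized residuals U_i and per-item tuning parameters k_i.
print(flag_items([0.3, -2.5, 1.0], [1.9, 2.0, 0.8]))
```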

2.4. Summary

In the case of item intercepts, the R-DIF procedure is defined by Eq. (4) with $Y_i$ given in Eq. (10) and $s_i = \tau_i$ given in Eq. (14). The loss function $\psi$ is defined through assumptions A1–A7, with per-item tuning parameter $k_i$ chosen to be the $1 - \alpha/2$ quantile of $N(0, \omega_i)$, where $\alpha$ is the desired (asymptotic) type I error rate for DIF detection and $\omega_i$ is given in Eq. (16). The resulting estimate of $\theta$ will be denoted $\tilde{\theta}_{RD}$, and setting $\tilde{\theta} = \tilde{\theta}_{RD}$ in part (b) of Theorem 1 will be referred to as the R-DIF test. For computational purposes, Tukey’s bisquare in Eq. (17) will be used.

3. Robustness

The purpose of this section is to characterize the robustness of the R-DIF procedure in terms of its breakdown point. Theorem 2 shows that the finite sample breakdown point (FSBP) of $\tilde{\theta}_{RD}$ is determined by the choice of a preliminary estimate of $\theta$, say $\theta^{(0)}$. In practice, this translates into choosing the starting value for iterative estimation procedures such as Newton–Raphson or iteratively re-weighted least squares (IRLS). In particular, choosing $\theta^{(0)}$ to be the median of the $Y_i$ ensures that $\tilde{\theta}_{RD}$ has the maximum attainable FSBP of any non-trivial location estimator, which is 1/2 (Huber, 1984). The section also discusses the implications of breakdown for DIF analysis more generally.

It is important to emphasize that analytical results on breakdown are quite weak: they merely describe the minimum proportion of corrupted data points (i.e., items with DIF) that can lead an estimator to take on “arbitrarily large aberrant values” (Huber and Ronchetti, 2009, p. 279). In practice, the usefulness of any statistic will come into question before it becomes unbounded. Thus, while the concept of breakdown provides a widely used analytical tool for describing robustness, it is helpful for theoretical results to be complemented by numerical examples that characterize how a procedure performs on the way to breakdown. The first simulation study reported in this paper plays this role.

3.1. FSBP of the R-DIF Procedure

The definition of FSBP is briefly reviewed before presenting Theorem 2. Let $\boldsymbol{Y} = (Y_1, \dots, Y_m)$ denote a sample of size m and $S(\boldsymbol{Y}) \in \mathbb{R}$ denote a statistic of interest. In the present context the index i is over items (not respondents), which is why consideration of the FSBP, rather than its asymptotic analogs, is especially relevant. Consider the situation where $\boldsymbol{Y}$ is corrupted by replacing $n \le m$ observations with arbitrary values. The corrupted data can be written as $Y'_i = Y_i + \Delta_i$ with $\Delta_i \in \mathbb{R}$, and $\Delta_i = 0$ for $m - n$ values of i. Then $\epsilon = n / m$ is the fraction of corrupted values in $\boldsymbol{Y}' = (Y'_1, \dots, Y'_m)$.

The maximal finite sample “bias” in S that can be caused by replacing $\boldsymbol{Y}$ with an $\epsilon$-corrupted dataset $\boldsymbol{Y}'$ is defined as (see Huber and Ronchetti, 2009, chap. 11):

$$b(\epsilon, S, \boldsymbol{Y}) = \sup_{\boldsymbol{Y}'} \left| S(\boldsymbol{Y}) - S(\boldsymbol{Y}') \right| \tag{18}$$

and the FSBP of S is

$$\epsilon^{*} = \inf \{ \epsilon \mid b(\epsilon, S, \boldsymbol{Y}) = \infty \}. \tag{19}$$

This translates roughly to the smallest proportion of outliers that can cause a statistic of interest to take on arbitrarily large aberrant values.
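The contrast between a statistic with low breakdown and one with high breakdown can be illustrated with a small Python sketch comparing the mean and the median. The helper names `corrupt` and `shift` and the sample values are hypothetical. Each call evaluates $|S(\boldsymbol{Y}) - S(\boldsymbol{Y}')|$ for one particular corruption pattern only, so it gives a lower bound on the supremum in Eq. (18) rather than the supremum itself.

```python
from statistics import mean, median


def corrupt(y, n_bad, delta):
    """Y'_i = Y_i + Delta_i, with Delta_i = delta for the first n_bad items and 0 otherwise."""
    return [v + delta if i < n_bad else v for i, v in enumerate(y)]


def shift(stat, y, n_bad, delta):
    """|S(Y) - S(Y')| for a single corruption pattern (a lower bound on b in Eq. (18))."""
    return abs(stat(y) - stat(corrupt(y, n_bad, delta)))


y = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.9, 1.1]   # made-up sample, m = 9
for n_bad in (1, 4, 5):
    for delta in (1e2, 1e4, 1e6):
        print(
            f"n={n_bad} delta={delta:.0e} "
            f"mean shift={shift(mean, y, n_bad, delta):.3g} "
            f"median shift={shift(median, y, n_bad, delta):.3g}"
        )
```

In this sample the shift in the mean grows with the size of the corruption even when a single value is replaced, whereas the shift in the median stops growing as long as 4 or fewer of the 9 values are corrupted and grows without bound once 5 are corrupted.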

In order to derive the FSBP of $\tilde{\theta}_{RD}$, it is helpful to consider its one-step formulation. The intuition behind one-step M-estimation is to start with an initial estimator $\theta^{(0)}$ and update it by applying Newton’s rule to Eq. (4) just once:

$$\theta^{(1)} = \theta^{(0)} - \frac{\Psi(\theta^{(0)})}{\Psi'(\theta^{(0)})}. \tag{20}$$
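A minimal Python sketch of the update in Eq. (20) is given below. It rests on assumptions that go beyond the surrounding text: Eq. (4) is taken to have the form $\Psi(\theta) = \sum_i \psi_i((Y_i - \theta)/\tau_i)$, the $\tau_i$ are held fixed rather than treated as functions of $\theta$, and $\psi_i$ is the bisquare of Eq. (17) with tuning parameter $k_i$. All function names and data values are hypothetical.

```python
from statistics import median


def psi(u, k):
    """Bisquare psi, Eq. (17)."""
    return u * (1 - (u / k) ** 2) ** 2 if abs(u) <= k else 0.0


def dpsi(u, k):
    """Derivative of the bisquare psi with respect to u."""
    if abs(u) > k:
        return 0.0
    r = (u / k) ** 2
    return (1 - r) * (1 - 5 * r)


def one_step(y, tau, k, theta0):
    """One Newton update, Eq. (20), with tau_i held fixed (a simplifying assumption)."""
    u = [(yi - theta0) / ti for yi, ti in zip(y, tau)]
    Psi = sum(psi(ui, ki) for ui, ki in zip(u, k))
    # d/dtheta of psi((y - theta)/tau) is -psi'(u)/tau
    dPsi = -sum(dpsi(ui, ki) / ti for ui, ki, ti in zip(u, k, tau))
    if dPsi == 0:
        raise ZeroDivisionError("theta0 is a stationary point of Psi")
    return theta0 - Psi / dPsi


y = [0.1, 0.2, 0.25, 0.3, 3.0]    # made-up scaling values; the last mimics an item with DIF
tau = [1.0] * 5
k = [2.0] * 5
theta0 = median(y)                # robust starting value
theta1 = one_step(y, tau, k, theta0)
print(theta0, theta1)
```

Because the last item falls outside its cutoff at the starting value, it contributes nothing to either $\Psi$ or $\Psi'$, so in this example the update does not depend on how extreme that value is.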

The one-step estimator appears in the asymptotic theory of M-estimation (see van der Vaart, 1998, §5.7). In the present context, its utility is to prove the following theorem.

Theorem 2

Let $\epsilon^*_r$ denote the FSBP of $\theta^{(r)}$, $r = 0, 1$, as defined by Eqs. (4) and (20). Let $\psi(u)$ be defined by assumptions A1–A7, with $u = U_i = (Y_i - \theta) / \tau_i$, $Y_i$ given in Eq. (10), $\tau_i$ given in Eq. (14), and tuning parameters $k_i > 0$. Finally, assume that $\theta^{(0)}$ is not a stationary point of $\Psi(\theta)$. Then $\epsilon^*_1 = \epsilon^*_0$.

The proof is given in the Appendix. As mentioned, it depends mainly on treating $\tau_i$ as a known function of $\theta$ (see Eq. (14)). It is seen to trivially extend to further iterations, leading to the corollary that $\theta^{(r+1)}$ has the same FSBP as $\theta^{(0)}$ for finite values of $r = 0, 1, 2, \dots$. The assumption that the initial value of $\theta$ is not a stationary point of $\Psi(\theta)$ is restrictive for redescending $\psi$. However, in practice, there are alternative estimation procedures that do not require this assumption (e.g., IRLS).
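One way IRLS might look for this problem is sketched below in Python. This is an illustration under stated assumptions, not the implementation used in the paper: the estimating equation is taken to be $\sum_i \psi_i((Y_i - \theta)/\tau_i) = 0$, the $\tau_i$ are held fixed, the bisquare weight $\psi(u)/u$ is used, and the median of the $Y_i$ serves as the starting value. The names `bisquare_weight` and `irls_location` and the data are hypothetical.

```python
from statistics import median


def bisquare_weight(u, k):
    """psi(u)/u for Tukey's bisquare; a weight of zero means the item is flagged."""
    return (1 - (u / k) ** 2) ** 2 if abs(u) <= k else 0.0


def irls_location(y, tau, k, tol=1e-10, max_iter=100):
    """Iteratively re-weighted mean of y, started at the median (tau_i held fixed)."""
    theta = median(y)
    for _ in range(max_iter):
        # Weight for item i is psi(u_i)/u_i divided by tau_i.
        w = [bisquare_weight((yi - theta) / ti, ki) / ti
             for yi, ti, ki in zip(y, tau, k)]
        total = sum(w)
        if total == 0:            # every item received zero weight; stop
            break
        new_theta = sum(wi * yi for wi, yi in zip(w, y)) / total
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    flags = [abs((yi - theta) / ti) > ki for yi, ti, ki in zip(y, tau, k)]
    return theta, flags


y = [0.1, 0.2, 0.25, 0.3, 3.0, 2.8]   # made-up values; the last two mimic items with DIF
tau = [1.0] * 6
k = [2.0] * 6
theta_hat, flags = irls_location(y, tau, k)
print(theta_hat, flags)
```

Unlike the Newton update, this iteration does not divide by $\Psi'(\theta)$, so it remains defined when the current value is a stationary point of $\Psi$.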

Theorem 2 is not directly addressed by past research, although other authors have mentioned the importance of using robust starting values for redescending M-estimators (e.g., Maronna et al., 2019, §2.8.1). Huber (1984) considered redescending M-estimators of location in which the median absolute deviation of the $Y_i$ is used in place of $\tau_i$, showing that $\epsilon^*$ depends not only on the choice of k but also on $\boldsymbol{Y}$. In particular, he recommended using $k = 6$ in order to ensure $\epsilon^* \approx 1/2$. Li and Zhang (1998) showed that the large sample breakdown of redescending M-estimators can be substantially lower than 1/2. For example, using their results with the value of $k = 1.96$ (i.e., the 0.975 quantile of the standard normal), the breakdown point of the bisquare is expected to be less than 1/3. These considerations suggest that tuning redescending M-estimators to aggressively flag outliers will, paradoxically, come at the cost of robustness. The intuition here is that redescending functions with small values of k can omit substantial portions of the data and therefore lead to local “bad” solutions. Yohai (1987) showed that the FSBP of redescending M-estimators can be as high as 1/2 when $\tau = \tau_i$ is instead chosen as the solution to a preliminary M-estimation problem. Theorem 2 is in a similar vein, although in this case the result depends on treating $\tau_i$ as a known function of $\theta$ (i.e., Eq. (14)), which is a peculiar aspect of the IRT scaling problem.

3.2. Breakdown, IRT-Based Scaling, and DIF

Before moving on, let us briefly consider some more general implications of the concept of breakdown in IRT-based scaling and DIF analysis. In particular, the concept of “worst-case” DIF is introduced and used to motivate an informal argument against the existence of a (non-trivial) DIF detection procedure with FSBP $> 1/2$. This overall rationale is also used to design the first simulation study reported below.

Let $\bar{Y} = \sum Y_i / m$ (i.e., $\psi(u) = u$ and $s_i = 1$) be the unweighted average of the scaling functions $Y_i$. As defined in Eq. (18), the finite sample bias of $\bar{Y}$ is $|\bar{Y} - \bar{Y}'| = \sum_{i=1}^m \Delta_i / m$. Thus, for any fixed values of $\epsilon$ and $\Delta^* = \max\{\Delta_i\}$, the maximum bias results when all $\Delta_i > 0$ are set equal to $\Delta^*$. Otherwise stated, the worst-case bias in the estimated scaling parameter will result when all items with DIF are biased in the same direction by the same (maximal) amount. The overall logic of this argument can be extended to other functions used for IRT-based scaling, which all involve unweighted sums over items (Kolen and Brennan, 2014, §6.3). The term “worst-case” DIF will be used to mean that all items with DIF are biased not only in the same direction but also by the same magnitude. Worst-case DIF is a special case of unbalanced DIF, which occurs when all items with DIF are biased in the same direction, but not necessarily by the same magnitude (e.g., Sireci and Rios, 2013).
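The bound on the bias of the unweighted mean can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming illustrative values for the number of items, the proportion of items with DIF, and $\Delta^*$ (none of these values are taken from the paper); the function name `mean_bias` is hypothetical.

```python
def mean_bias(deltas):
    """Bias of the unweighted mean when item i is shifted by deltas[i]."""
    return sum(deltas) / len(deltas)

# Illustrative values only (assumptions, not taken from the paper).
m = 10            # number of items
delta_star = 0.5  # maximum DIF effect, Delta*
n_dif = 3         # items with DIF, so epsilon = 0.3

# Worst-case DIF: same direction and same (maximal) magnitude.
worst_case = [delta_star] * n_dif + [0.0] * (m - n_dif)
# Unbalanced DIF: same direction, unequal magnitudes.
unbalanced = [0.5, 0.3, 0.1] + [0.0] * (m - n_dif)
# Mixed directions: positive and negative effects partly cancel.
mixed = [0.5, -0.5, 0.5] + [0.0] * (m - n_dif)

bias_worst = mean_bias(worst_case)       # equals epsilon * Delta*
bias_unbalanced = mean_bias(unbalanced)
bias_mixed = mean_bias(mixed)
```

With these inputs the worst-case configuration gives a bias of $\epsilon \Delta^*$, and the other two configurations give strictly smaller bias in absolute value.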

Consideration of worst-case DIF leads to the following informal argument against the existence of a (non-trivial) DIF detection procedure with FSBP $> 1/2$. If exactly $\epsilon = 1/2$ of the items on a test exhibit worst-case DIF, then there are two equivalent ways to identify the IRT model in Eq. (5). To see this, let $\mathcal{I} = \{i \mid \Delta_i = 0\}$ denote the items without DIF and let $\mathcal{J} = \{j \mid \Delta_j = \Delta^*\}$ denote the items with DIF. For notational convenience, let us also assume that the item slopes are equal to one for all items in both groups. Then the “correct” parameterization of the IRT model in Eq. (5) is obtained by setting $d_{1i} = d_{0i}$ for $i \in \mathcal{I}$ and $d_{1j} = d_{0j} + \Delta^*$ for $j \in \mathcal{J}$. However, transforming the latent trait as $\eta^* = \eta - \Delta^*$ and the item parameters as $d^*_{1i} = d_{1i} - \Delta^*$ results in the same IRT model equations, but now $d^*_{1i} = d_{0i} - \Delta^*$ for $i \in \mathcal{I}$ and $d^*_{1j} = d_{0j}$ for $j \in \mathcal{J}$; i.e., the items with and without DIF have “flipped.”

This argument is valid for $0 < \epsilon < 1$. However, when $\epsilon = 1/2$, the problem is especially vexing because we cannot use the number of items with DIF to judge the better parameterization. In the absence of an external criterion that can be used to judge which items have DIF, this informal argument suggests FSBP $= 1/2$ is the best we can hope for in DIF analysis.

4. Estimation

This section addresses computational aspects of R-DIF. These procedures are implemented in the robustDIF package https://github.com/peterhalpin/robustDIF, written in the R language (R Core Team, 2022). The package defaults discussed in this section were established through unreported simulation studies, although the defaults can be overridden by the user.

Estimation of $\tilde{\theta}_{RD}$ can proceed using known results. In particular, Newton–Raphson and IRLS can be easily implemented for M-estimators of location (Huber and Ronchetti, 2009, §6.7). Location problems typically proceed by treating $\tau_i = \tau$ as fixed to some initial value (e.g., the median absolute deviation) that is not updated during estimation. For R-DIF, we can instead compute $\tau_i^{(0)} = \tau_i(\theta^{(0)})$ for some initial estimate $\theta^{(0)}$. Alternatively, after each iteration $r = 0, 1, \dots$, the updated values $\tau_i^{(r)}$ can be used while solving for $\theta^{(r+1)}$. In robustDIF, the default estimator is IRLS with $\tau_i^{(r)}$ updated during estimation.
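To make the IRLS scheme concrete, the following is a minimal sketch in Python rather than the robustDIF implementation. It assumes the bisquare $\psi$ with a single tuning parameter $k$ (rather than item-specific $k_i$), takes the estimating equation to be the stationarity condition of $\sum_i \rho(U_i)$ with $U_i = (Y_i - \theta)/\tau_i$, and represents $\tau_i(\theta)$ by a user-supplied function `tau_fn`, which is a hypothetical placeholder because Eq. (14) is not reproduced here. The data are made up for illustration.

```python
import statistics

def bisquare_weight(u, k):
    """Bisquare IRLS weight psi(u)/u; exactly zero once |u| >= k."""
    if abs(u) >= k:
        return 0.0
    return (1.0 - (u / k) ** 2) ** 2

def irls_location(y, tau_fn, theta0, k=1.96, tol=1e-5, max_iter=100):
    """Redescending M-estimate of location by IRLS, refreshing tau_i each pass."""
    theta = theta0
    for _ in range(max_iter):
        tau = tau_fn(theta)  # tau_i^{(r)}, recomputed at the current theta
        # Weight on Y_i implied by sum_i psi(U_i) / tau_i = 0.
        w = [bisquare_weight((yi - theta) / ti, k) / ti ** 2
             for yi, ti in zip(y, tau)]
        total = sum(w)
        if total == 0.0:
            break  # every item has zero weight; keep the current value
        theta_new = sum(wi * yi for wi, yi in zip(w, y)) / total
        done = abs(theta_new - theta) < tol
        theta = theta_new
        if done:
            break
    tau = tau_fn(theta)
    # Items whose standardized residual falls outside [-k, k] receive zero weight.
    flagged = [abs((yi - theta) / ti) >= k for yi, ti in zip(y, tau)]
    return theta, flagged

# Made-up scaling function values: six items near 0, two shifted items.
y = [0.02, -0.05, 0.01, 0.04, -0.03, 0.0, 0.9, 1.1]
tau_fn = lambda theta: [0.2] * len(y)  # constant scale, for illustration only
theta_hat, flagged = irls_location(y, tau_fn, statistics.median(y))
```

In this toy example the two shifted items receive zero weight from the first iteration onward, so the estimate settles close to the mean of the remaining six values.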

As indicated by Theorem 2, the choice of starting value is important for ensuring the robustness of the R-DIF procedure. The median of the $Y_i$, denoted $\text{med}(Y)$, is a good choice (Huber, 1964). However, there are other good choices as well. The least trimmed squares estimator with 50% trimming rate, denoted $\text{LTS}_{.5}(Y)$, also has FSBP of 1/2 and is straightforward to compute for location problems (Rousseeuw and Leroy, 1987). Additionally, for any choice of $\psi$ such that $\rho = \int \psi(u)\,du$ exists, one may consider the related problem of minimizing $R(\theta) = \sum_i \rho(U_i)$ with $U_i = (Y_i - \theta)/\tau_i$. In practice, taking the minimum over the grid $\theta \in \Theta_r = \{\min(Y), \min(Y) + r, \dots, \max(Y) - r, \max(Y)\}$ appears to work quite well for $r \le .05$. In robustDIF, the default starting value is the median of these three choices:

$$\theta^{(0)} = \text{med}\left\{ \text{med}(Y),\ \text{LTS}_{.5}(Y),\ \underset{\theta \in \Theta_{.05}}{\arg\min}\, R(\theta) \right\}.$$

The user may alternatively choose any one of these, or input their own numerical starting value.
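The three candidate starting values can be sketched as follows. This is an illustrative Python sketch, not the robustDIF code: it assumes the bisquare $\rho$ with a common $k$, treats the $\tau_i$ as given constants, and computes the location LTS estimate by scanning contiguous half-samples of the sorted data with $h = \lfloor m/2 \rfloor + 1$. All function names and the example data are hypothetical.

```python
import statistics

def lts_location(y):
    """LTS location estimate: mean of the contiguous sorted half-sample
    (h = m // 2 + 1 points) with the smallest sum of squared deviations."""
    ys = sorted(y)
    m = len(ys)
    h = m // 2 + 1
    best_mean, best_ss = None, float("inf")
    for start in range(m - h + 1):
        window = ys[start:start + h]
        mu = sum(window) / h
        ss = sum((v - mu) ** 2 for v in window)
        if ss < best_ss:
            best_mean, best_ss = mu, ss
    return best_mean

def bisquare_rho(u, k):
    """Bisquare rho, i.e. the integral of psi(u) = u * (1 - (u/k)^2)^2."""
    if abs(u) >= k:
        return k ** 2 / 6.0
    return (k ** 2 / 6.0) * (1.0 - (1.0 - (u / k) ** 2) ** 3)

def grid_argmin(y, tau, k=1.96, step=0.05):
    """Minimizer of R(theta) over an equally spaced grid from min(y) to max(y)."""
    lo, hi = min(y), max(y)
    n = int((hi - lo) / step)
    grid = [min(lo + j * step, hi) for j in range(n + 1)] + [hi]

    def objective(theta):
        return sum(bisquare_rho((yi - theta) / ti, k) for yi, ti in zip(y, tau))

    return min(grid, key=objective)

def starting_value(y, tau, k=1.96, step=0.05):
    """Median of the three candidate starting values."""
    candidates = [statistics.median(y), lts_location(y), grid_argmin(y, tau, k, step)]
    return statistics.median(candidates)

# Made-up data: six items near 0 and three shifted items.
y = [0.02, -0.05, 0.01, 0.04, -0.03, 0.0, 0.9, 1.1, 1.0]
tau = [0.2] * len(y)
theta0 = starting_value(y, tau)
```

Taking the median of the three candidates means that a single poorly behaved candidate cannot determine the starting value on its own.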

As previously described, the R-DIF test can be implemented during the estimation of $\tilde{\theta}_{RD}$ by an appropriate choice of item-specific tuning parameters $k_i$. One could alternatively follow up the estimation procedure with a “stand-alone” test based on part (b) of Theorem 1. These two approaches will be numerically equivalent when the tolerance $\delta$ in the convergence criterion $|\theta_i^{(r)} - \theta_i^{(r+1)}| < \delta$ is sufficiently small. The choice of $\delta = 10^{-5}$ is used as the convergence criterion in robustDIF. The R-DIF test can be implemented by flagging items with DIF during estimation or by computing the Wald test in a follow-up step.

A final note concerns local solutions, which can arise with redescending M-estimators due to the non-monotonicity of $\psi$. The problem can be diagnosed by plotting the function $R(\theta)$ against $\theta$. When there is a clear global minimum, convergence to that minimum can be ensured by the choice of an appropriate starting value, as addressed above. However, it is less clear how to proceed when there are multiple local minima with roughly the same value of $R(\theta)$. One option is to “down-tune” the R-DIF estimator by choosing $k_i$ based on a lower-than-desired type I error rate, which has the effect of smoothing out local solutions (Huber, 1984). This can be done when choosing starting values based on $R(\theta)$ and also during computation of $\tilde{\theta}_{RD}$. In the latter case, one may follow up estimation with a stand-alone R-DIF test conducted at the desired type I error rate. Another possibility is to report the multiple solutions and weigh their substantive interpretations.
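As a numerical complement to plotting $R(\theta)$, local solutions can be located by scanning the objective over a grid. The sketch below is illustrative only: it assumes the bisquare $\rho$ with a common $k$ and constant $\tau_i$, uses made-up data with two well-separated clusters, and uses a deliberately large $k = 10$ solely to make the smoothing effect of down-tuning visible. The function names are hypothetical.

```python
def bisquare_rho(u, k):
    """Bisquare rho, constant at k^2 / 6 once |u| >= k."""
    if abs(u) >= k:
        return k ** 2 / 6.0
    return (k ** 2 / 6.0) * (1.0 - (1.0 - (u / k) ** 2) ** 3)

def objective(theta, y, tau, k):
    """R(theta) = sum_i rho((Y_i - theta) / tau_i)."""
    return sum(bisquare_rho((yi - theta) / ti, k) for yi, ti in zip(y, tau))

def strict_local_minima(y, tau, k, lo, hi, n_points=401):
    """Grid points where R(theta) is strictly below both neighbours."""
    grid = [lo + j * (hi - lo) / (n_points - 1) for j in range(n_points)]
    vals = [objective(t, y, tau, k) for t in grid]
    return [(grid[j], vals[j]) for j in range(1, n_points - 1)
            if vals[j] < vals[j - 1] and vals[j] < vals[j + 1]]

# Made-up data: six items at 0 (no DIF) and four items shifted to 2.
y = [0.0] * 6 + [2.0] * 4
tau = [0.3] * len(y)

minima_tight = strict_local_minima(y, tau, k=1.96, lo=-1.0, hi=3.0)
minima_loose = strict_local_minima(y, tau, k=10.0, lo=-1.0, hi=3.0)
```

With $k = 1.96$ the scan finds two local minima, one at each cluster, with the lower value of $R(\theta)$ at the larger cluster. With the much larger $k$ only one minimum remains, but it lies between the clusters, which illustrates why a follow-up test at the desired type I error rate is still needed after down-tuning.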

5. Extensions

Up to this point, the focus has been on the item intercepts. Extension to the item slopes can be made by using Eqs. (6) and (8) to write

(21)

$$Z_i(\boldsymbol{\nu}) = a_{1i} / a_{0i},$$

with $Z_i(\boldsymbol{\nu}) = \sigma$ representing the null hypothesis of no DIF on the slope of item $i$. Under this null hypothesis, Eq. (13) is replaced by

(22)

$$\begin{aligned} \nabla Z_i(\boldsymbol{\nu}_i) &= a_{0i}^{-1} \left[ -Z_i(\boldsymbol{\nu}), 0, 1, 0 \right]^T \\ &= a_{0i}^{-1} \left[ -\sigma, 0, 1, 0 \right]^T, \end{aligned}$$

which leads to the following expression for the variance of the asymptotic null distribution of $Z_i = Z_i(\hat{\boldsymbol{\nu}})$:

(23)

$$\text{var}(Z_i) = a_{0i}^{-2} \left( \sigma^2\, \text{var}(\hat{a}_{0i}) + \text{var}(\hat{a}_{1i}) \right).$$

Note that, similarly to the case of item intercepts, the null hypothesis allows us to write the variance of the null distribution in terms of the target parameter $\sigma$. From here, setting $Y_i(\boldsymbol{\nu}) = Z_i(\boldsymbol{\nu})$, $\theta = \sigma$, and $\tau_i(\theta) = \text{var}(Z_i)$ in the foregoing sections shows that the same developments carry through to the case of item slopes.
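The step from the gradient in Eq. (22) to the variance in Eq. (23) can be verified numerically. The sketch below assumes the parameter order $(a_{0i}, d_{0i}, a_{1i}, d_{1i})$ implied by the gradient, and a block-diagonal covariance matrix for the item parameter estimates, which reflects the two independent groups. The numerical values are made up, and the function names are hypothetical.

```python
def slope_null_variance(a0, sigma, var_a0, var_a1):
    """Eq. (23): null variance of Z_i = a_1i / a_0i."""
    return (sigma ** 2 * var_a0 + var_a1) / a0 ** 2

def delta_method_variance(grad, cov):
    """Quadratic form grad' * cov * grad."""
    n = len(grad)
    return sum(grad[r] * cov[r][c] * grad[c] for r in range(n) for c in range(n))

# Made-up values, for illustration only.
a0, sigma = 1.2, 0.9
# Covariance of (a0, d0, a1, d1); zero blocks reflect independent groups.
cov = [
    [0.010, 0.002, 0.000, 0.000],
    [0.002, 0.015, 0.000, 0.000],
    [0.000, 0.000, 0.012, 0.003],
    [0.000, 0.000, 0.003, 0.020],
]
grad = [-sigma / a0, 0.0, 1.0 / a0, 0.0]  # Eq. (22) under the null hypothesis

var_from_gradient = delta_method_variance(grad, cov)
var_from_eq23 = slope_null_variance(a0, sigma, cov[0][0], cov[2][2])
```

The two computations agree because the gradient has zero entries for the intercepts and the between-group covariances are zero.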

If the distribution of $Z_i$ is strongly skewed, this can lead to problems estimating its location (Huber and Ronchetti, 2009, §5.1). In such cases, it can be preferable to instead work with $\log Z_i(\boldsymbol{\nu})$. Then the target scaling parameter becomes $\theta = \log \sigma$ and, under the null hypothesis, $\text{var}(\log Z_i) = \text{var}(Z_i) / \sigma^2$. In general, it is recommended to examine the empirical distribution of $Y_i$ and $Z_i$ to determine whether it may be more suitable to work with their log (or another) transformation.
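The variance expression for the log transformation follows from a first-order (delta method) expansion. A brief sketch, assuming $Z_i$ is approximately centered at $\sigma$ under the null hypothesis:

```latex
\begin{aligned}
\log Z_i &\approx \log \sigma + \frac{1}{\sigma}\,(Z_i - \sigma), \\
\text{var}(\log Z_i) &\approx \left( \left. \frac{\partial \log z}{\partial z} \right|_{z = \sigma} \right)^{2} \text{var}(Z_i)
 = \frac{\text{var}(Z_i)}{\sigma^{2}} .
\end{aligned}
```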

In addition to using part (b) of Theorem 1 to test the slopes and intercepts of item i separately, one may test them simultaneously using the quadratic form

(24)

$$Q_i = \left[ \begin{array}{cc} Y_i - \tilde{\theta}_{RD} & Z_i - \tilde{\sigma}_{RD} \end{array} \right] \Sigma_i^{-1} \left[ \begin{array}{c} Y_i - \tilde{\theta}_{RD} \\ Z_i - \tilde{\sigma}_{RD} \end{array} \right],$$

where the covariance matrix $\Sigma_i$ is given by

$$\Sigma_i = \left[ \begin{array}{cc} \text{var}(Y_i - \tilde{\theta}_{RD}) & \text{cov}(Y_i - \tilde{\theta}_{RD}, Z_i - \tilde{\sigma}_{RD}) \\ \text{cov}(Y_i - \tilde{\theta}_{RD}, Z_i - \tilde{\sigma}_{RD}) & \text{var}(Z_i - \tilde{\sigma}_{RD}) \end{array} \right].$$

Under the joint null hypothesis that none of the item slopes or intercepts exhibit DIF, the variances are obtained from part (b) of Theorem 1. Following the same steps taken in the Appendix, the “null covariance” is shown to be

(25)

$$\text{cov}(Y_i - \tilde{\theta}_{RD}, Z_i - \tilde{\sigma}_{RD}) = \sum_{j=1}^{m} \tilde{w}(Y_j)\, \tilde{w}(Z_j)\, \text{cov}(Y_j, Z_j)$$

with

(26)

$$\text{cov}(Y_j, Z_j) = \frac{1}{a_{0j} a_{1j}} \left( \sigma\, \text{cov}(\hat{a}_{0j}, \hat{b}_{0j}) + \text{cov}(\hat{a}_{1j}, \hat{b}_{1j}) - \theta\, \text{var}(\hat{a}_{1j}) \right),$$

(27)

$$\tilde{w}(U_j) = \begin{cases} 1 - w(U_j) & \text{for } i = j \\ w(U_j) & \text{for } i \ne j \end{cases},$$

and

(28)

$$w(U_j) = \frac{1 / \text{var}(U_j)}{\sum_{k=1}^{m} 1 / \text{var}(U_k)}.$$

Then simultaneously testing for DIF on both the slope and intercept of item $i$ can proceed via a Wald test of $Q_i$. The availability of Eqs. (24) through (28) implies that this test does not require simultaneous estimation of $\theta_{RD}$ and $\sigma_{RD}$. Thus, the Wald test of $Q_i$ can be conveniently implemented as a follow-up to estimation of the individual IRT scaling parameters, as described in the previous section.
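Once the entries of $\Sigma_i$ are available, computing $Q_i$ is a small calculation. The following sketch is not taken from robustDIF: the function name and inputs are hypothetical, and the $p$-value assumes that $Q_i$ is referred to a chi-square distribution with 2 degrees of freedom, which is the usual reference distribution for a two-component Wald statistic but is not stated explicitly in the text. For 2 degrees of freedom the survival function has the closed form $\exp(-q/2)$.

```python
import math

def wald_q(y_resid, z_resid, var_y, var_z, cov_yz):
    """Quadratic form of Eq. (24) with an explicit 2 x 2 inverse.

    y_resid = Y_i - theta_RD, z_resid = Z_i - sigma_RD; the remaining
    arguments are the entries of the covariance matrix Sigma_i.
    """
    det = var_y * var_z - cov_yz ** 2
    if det <= 0.0:
        raise ValueError("Sigma_i must be positive definite")
    q = (var_z * y_resid ** 2
         - 2.0 * cov_yz * y_resid * z_resid
         + var_y * z_resid ** 2) / det
    # Assumed reference distribution: chi-square with 2 degrees of freedom.
    p_value = math.exp(-q / 2.0)
    return q, p_value

# Made-up inputs, for illustration only.
q_i, p_i = wald_q(0.3, 0.2, var_y=0.04, var_z=0.05, cov_yz=0.01)
```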

6. Numerical Examples

This section presents two simulation studies illustrating the R-DIF procedure. The first addresses its breakdown in the presence of worst-case DIF. The second addresses statistical power when only a single item exhibits DIF. The simulation studies are followed by a real data example from cross-cultural human development. The example data are publicly available from UNICEF (Footnote 1). The samples used in the illustration along with R code for running the simulations and conducting the analyses are available at github.com/peterhalpin/robustDIF/tree/Halpin2022. The R package mirt (Chalmers, 2012) was used for estimation of IRT models, difR was used for the Mantel–Haenszel (MH) test (Magis et al., 2010), and GPCMlasso was used to illustrate a regularization-based approach (Schauberger and Mair, 2020). A nominal type I error rate of .05 was used for all procedures, except the regularization-based approach, which selects the tuning parameter by minimizing BIC.

6.1. Simulation 1: Breakdown

Data were generated using the 2PL IRT model in Eq. (5). The focal factor of the study was the proportion of items with DIF, which ranged from 0 to 1/2. Items with DIF were simulated by applying a bias of $\Delta = .5$ to the item difficulty parameters (not intercepts), and the items with DIF were randomly selected in each replication. The other design factors are summarized in Table 1. Note that the simulation study is intended to reflect the worst-case bias that can be induced for a given maximum effect size $\Delta$. The rationale for this design was discussed in the section of this paper entitled “Robustness”.

Table 1 Summary of simulation 1 design.

DIF on item intercepts was simulated using $b_{1i} + .5$ for randomly selected values of i in each replication.
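The data-generating scheme described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the simulation code from the repository: it assumes a slope-intercept 2PL of the form logistic(a·η + b) with difficulty d = −b/a, and the number of items, parameter ranges, group sizes, and group mean are all hypothetical placeholders rather than the values in Table 1.

```python
import math
import random


def prob_correct(eta, slope, intercept):
    """2PL response probability in an assumed slope-intercept form."""
    return 1.0 / (1.0 + math.exp(-(slope * eta + intercept)))


def shift_difficulty(slope, intercept, delta):
    """Add delta to the difficulty d = -b / a and return the implied intercept."""
    difficulty = -intercept / slope
    return -slope * (difficulty + delta)


def simulate_group(n, slopes, intercepts, mean=0.0, sd=1.0, rng=random):
    """Draw n latent traits and the corresponding binary item responses."""
    data = []
    for _ in range(n):
        eta = rng.gauss(mean, sd)
        data.append([int(rng.random() < prob_correct(eta, a, b))
                     for a, b in zip(slopes, intercepts)])
    return data


rng = random.Random(1)
m, n_dif, delta = 16, 4, 0.5                    # hypothetical design values
slopes = [rng.uniform(0.9, 2.5) for _ in range(m)]
intercepts0 = [rng.uniform(-1.5, 1.5) for _ in range(m)]
dif_items = set(rng.sample(range(m), n_dif))    # re-drawn in every replication
intercepts1 = [shift_difficulty(a, b, delta) if i in dif_items else b
               for i, (a, b) in enumerate(zip(slopes, intercepts0))]
group0 = simulate_group(500, slopes, intercepts0, rng=rng)
group1 = simulate_group(500, slopes, intercepts1, mean=0.5, rng=rng)
```

Applying the same shift in the same direction to every selected item is what makes the configuration "worst case" in the sense used in the text.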

In each simulation condition, the performance of R-DIF was compared to two traditional methods of DIF analysis, the MH procedure (MH; Dorans and Holland, 1993) and the likelihood ratio test (LRT; Thissen et al., 1993), as well as a more recent method that uses regularization (GPCM-lasso; Schauberger and Mair, 2020). The MH and GPCM-lasso methods assume uniform DIF, but R-DIF and LRT do not require uniform DIF and were implemented without this assumption. Both MH and LRT were estimated using two-stage purification and refinement. Simulation conditions often resulted in all or no items in the anchor set, in which case purification was not performed. It can be noted that many other choices of anchor items are available (see Kopf et al., 2015a), and the results of this simulation study do not seek to address those other choices.

The main results are summarized in Fig. 1. The light blue line reports the R-DIF flagging procedure computed using the true scaling parameter. It can be viewed as a check on the correctness of the R-DIF procedure. The dark blue line shows the R-DIF flagging procedure implemented during the estimation of the IRT scaling parameter. It is seen to provide acceptable type I error control until 7/16 of the items exhibit worst-case DIF, which is just shy of its theoretical breakdown point of 1/2 biased items. It also maintains its level of statistical power quite well, up to its breakdown point. The stand-alone Wald test for the item intercept (part (b) of Theorem 1) led to identical results and is not reported. The stand-alone Wald test for simultaneously testing the item slope and intercept is also not reported; it was slightly more powerful than the R-DIF flagging procedure, but also led to slightly more type I errors.

The MH procedure had better type I error control and power than LRT, and the regularization-based approach had false-positive rates similar to LRT and power similar to MH. The main observation to be made is that the R-DIF procedure had comparable performance to these alternatives when relatively few items exhibited DIF, and maintained its level of performance much better when larger proportions of items exhibited DIF, up until its theoretical breakdown point of 1/2. When comparing methods, it is also relevant to emphasize that, unlike MH and GPCM-lasso, R-DIF does not require the assumption of uniform DIF—in this regard, LRT is the only direct comparator.

Figure 1 Type I error rates and statistical power for each of four methods: “Lasso” = GPCM lasso; “MH” = Mantel–Haenszel; “LRT” = likelihood ratio test; “RDIF.flag” = the proposed method. “RDIF.true” denotes the proposed method computed using the data generating value of $\theta$.

Figure 2 provides another perspective on the breakdown of the R-DIF procedure. The figure shows that the breakdown of the R-DIF test can be explained in terms of the breakdown of the R-DIF estimator of the IRT scaling parameter.

Figure 2 Distribution of ${\tilde{\theta}}_{RD}$ in each simulation condition. “N.DIF” denotes the number of items with DIF. The data-generating value was 0.5 in each condition.
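The link between breakdown of the test and breakdown of the estimator can be illustrated with a toy redescending M-estimator of location. The Python sketch below is not the R-DIF implementation: it assumes Tukey's bisquare as the $\psi$ function with the conventional constant k = 4.685, a median starting value, a common hypothetical scale for all items, and deterministic toy item values. The paper's own choice of $\psi$, tuning, and scale is given in earlier sections.

```python
import statistics


def bisquare_weight(u, k=4.685):
    """Weight omega(u) = psi(u) / u for Tukey's bisquare; zero once |u| >= k."""
    if abs(u) >= k:
        return 0.0
    return (1.0 - (u / k) ** 2) ** 2


def robust_location(y, s, k=4.685, tol=1e-10, max_iter=500):
    """Solve sum_i psi((y_i - theta) / s_i) = 0 by iteratively reweighted means."""
    theta = statistics.median(y)  # starting value (an assumption of this sketch)
    for _ in range(max_iter):
        omega = [bisquare_weight((yi - theta) / si, k) / si for yi, si in zip(y, s)]
        total = sum(omega)
        if total == 0.0:  # every item rejected; keep the current value
            break
        new_theta = sum(w * yi for w, yi in zip(omega, y)) / total
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


def toy_item_values(n_dif, m=10, theta0=0.5, delta=0.5):
    """Clean items spread evenly around theta0; DIF items sit at theta0 + delta."""
    n_clean = m - n_dif
    clean = [theta0 - 0.05 + 0.1 * j / (n_clean - 1) for j in range(n_clean)]
    return clean + [theta0 + delta] * n_dif


scales = [0.1] * 10                                    # hypothetical common scale
below = robust_location(toy_item_values(3), scales)    # 3 of 10 items shifted
above = robust_location(toy_item_values(6), scales)    # 6 of 10 items shifted
```

In this toy setting, `below` stays at the clean value of 0.5 because the shifted items receive zero weight, whereas `above` follows the shifted majority and lands close to 1.0. Once the estimate moves, the unshifted items are the ones that look like outliers, which is the mechanism the figure describes.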

6.2. Simulation 2: Statistical Power

Data were again generated using the 2PL IRT model in Eq. (5). This time only a single item exhibited DIF, and the degree of DIF was varied on both the item intercept and item slope. The rationale for limiting consideration to DIF in only a single item is twofold. First, Fig. 1 shows that R-DIF maintains its size and power quite well when additional items with the same direction and magnitude of DIF are added. Therefore, consideration of DIF in only a single item provides a reasonable summary of the performance of R-DIF under these more general conditions. Second, focusing on DIF in a single item allows for the statistical power of R-DIF to be fairly benchmarked against traditional methods. In particular, LRT allows for consideration of DIF in both item parameters separately or together, so it is a suitable comparator for R-DIF. But, as shown in Fig. 1, LRT does not perform well when additional items exhibit DIF. Thus, limiting DIF to a single item provides a fair way to compare the statistical power of the two methods.

The simulation design is summarized in Table 2. The focal factors of the study were the sample size per group ($n_0 = n_1 \in \{200, 350, 500\}$) and the type of DIF (intercept only, slope only, or both), which were crossed to create nine simulation conditions. In each condition, type I error rates and statistical power for R-DIF and LRT were compared, for tests of the intercept only, slope only, and both parameters. For the intercept and slope, R-DIF was implemented by flagging items during estimation. For the two-parameter test, R-DIF was implemented using a follow-up test after estimating each scale parameter separately.

Table 2 Summary of simulation 2 design.

$\Delta$ denotes additive DIF applied to the item difficulty, and $\Gamma$ denotes multiplicative DIF applied to the item slope.

Some other aspects of the simulation design warrant mention. The simulation used 10 items, which is the same number as in the real data example reported below. Impact was allowed to vary randomly in each replication, which was intended to make the simulation more realistic. For each type of DIF, the magnitude of DIF was determined so that the non-compensatory DIF index (NCDIFI; e.g., Raju et al., 1995, Eq. 11) was approximately equal to 0.1. To compute the NCDIFI, the reference distribution for the latent trait was standard normal and the reference item had a slope of 1 and intercept of 0. DIF on the item intercept was additive and governed by the parameter $\Delta$ applied to the item difficulty, whereas DIF on the item slope was multiplicative and governed by the parameter $\Gamma$. The values of these parameters are given in Table 2.
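Calibrating a DIF magnitude against an index of this kind can be sketched as follows. The Python code assumes the index is the expected squared difference between the two item response functions over a standard normal trait, approximated by a Riemann sum, and it assumes a slope-intercept 2PL. Whether the value reported in the text is on this squared scale or on a square-root scale is not determined here, so the target in the example is a hypothetical placeholder and the output is not claimed to match Table 2.

```python
import math


def irf(eta, slope, intercept):
    """2PL item response function in an assumed slope-intercept form."""
    return 1.0 / (1.0 + math.exp(-(slope * eta + intercept)))


def ncdif(slope_ref, intercept_ref, slope_foc, intercept_foc,
          n_points=2001, bound=8.0):
    """Expected squared IRF difference under a standard normal trait."""
    step = 2.0 * bound / (n_points - 1)
    total = 0.0
    for j in range(n_points):
        eta = -bound + j * step
        density = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
        diff = irf(eta, slope_foc, intercept_foc) - irf(eta, slope_ref, intercept_ref)
        total += diff * diff * density * step
    return total


def solve_difficulty_shift(target, lo=0.0, hi=10.0, n_steps=60):
    """Bisection for the difficulty shift whose index equals `target`.

    The reference item has slope 1 and intercept 0, so shifting the
    difficulty by delta gives an intercept of -delta.
    """
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if ncdif(1.0, 0.0, 1.0, -mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


hypothetical_target = 0.01
delta_star = solve_difficulty_shift(hypothetical_target)
```

Bisection suffices for the difficulty shift because the index increases monotonically in the size of the shift. A slope multiplier $\Gamma$ could be calibrated in the same way by searching over the focal slope instead.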

The results are summarized in Fig. 3. Focusing first on the top row of the figure, it can be seen that all testing procedures maintained the nominal type I error rate of .05 reasonably well in all conditions. In particular, note that the R-DIF test for item intercepts was not sensitive to DIF in the item slopes, and vice versa. With a few minor exceptions, the LRT procedure for both parameters had the highest type I error rate in all conditions, and was as high as .094 for the largest sample size ($n = 500$ per group) in the Slope Only condition. The overall conclusion is that the type I error rate control of R-DIF was comparable to that of LRT.

Figure 3 Panels denote the type of DIF (columns) and decision rates (rows). “Test” indicates the type of test conducted, with “both” denoting the test of both parameters, “intercept” denoting a test of the intercepts only, and “slope” denoting a test of the slopes only. LRT denotes likelihood ratio tests and RDIF denotes the R-DIF procedure. The nominal type I error (false positive) rate of all tests was .05.

Turning next to the bottom row of Fig. 3, it can be seen that the R-DIF tests were less powerful than the corresponding LRT, with only a few minor exceptions. The power differential between R-DIF and LRT was most pronounced when there was DIF on the item slope only (middle panel). In particular, the R-DIF procedure cannot be recommended to test DIF of item slopes with sample sizes less than $n = 350$ per group. In each condition, the power differential between R-DIF and LRT decreased with sample size, suggesting that the differential will become negligible with larger samples.

6.3. Empirical Example: Assessing Human Development Across Countries

This section illustrates the use of R-DIF with data from UNICEF’s Early Childhood Development Index (ECDI; Footnote 2). The ECDI is a caregiver-reported household survey intended to provide internationally comparable data about the percentage of children aged 24–59 months who are developmentally on track in health, learning, and psychosocial well-being, by sex. Data were collected via household surveys in Fiji and Vietnam. In both surveys, sample frames consisting of a list of households with children between 24 and 59 months were used to design representative samples using probabilistic sampling. The illustration focuses on $m = 10$ ECDI items on the learning domain, which are summarized in the second column of Table 3, and children aged 48–59 months ($n_0 = 412$ and $n_1 = 978$ in Fiji and Vietnam, respectively). IRT models were estimated using probability-based sampling weights.

Figure 4 plots the function $R(\theta)$. As noted in the section of this paper entitled “Estimation”, $R(\theta)$ is minimized by the R-DIF estimator of $\theta$. The presence of multiple local minima with approximately the same value would indicate potential problems when estimating the IRT scaling parameters and interpreting which items exhibit DIF. However, the figure shows that $R(\theta)$ had clear global minima for both scaling parameters, and the R-DIF procedure converged to these global values.

Figure 4 Plots of the $R(\theta)$ minimized by the R-DIF estimator.

Table 3 reports the three types of test statistics available from the R-DIF procedure. Using a type I error rate of .05, it was found that six items exhibited DIF on the item intercepts and two items exhibited DIF on the item slopes. The R-DIF test of both parameters led to the conclusion that a total of three items did not exhibit DIF on either parameter. Although the proportion of items with DIF on the intercepts exceeded the theoretical breakdown point of 1/2, DIF was not consistently in the same direction. Combined with the clear global minimum in Fig. 4, this suggests that breakdown of the R-DIF procedure was not a concern in the present analysis.

For comparison, LRT using two-step purification and refinement with a type I error rate of .05 led to the conclusion that all items except 4 and 7 exhibited DIF on their intercepts, all items except 4, 6, and 8 exhibited DIF on their slopes, and the test of both parameters identified all items as exhibiting DIF. The data and code for these analyses are provided at https://github.com/peterhalpin/robustDIF/tree/Halpin2022.

Table 3 R-DIF tests of the ECDI learning items.

“z” denotes the z-test of individual parameters. “$\chi^2$” denotes the chi-square test of both parameters together and has 2 degrees of freedom. The p values are rounded to 2 decimal places and values less than .05 are bolded. The p values were not adjusted for multiple comparisons.
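For readers who want to reproduce the flagging decisions from reported test statistics, the p values in a table of this kind can be computed with the Python sketch below. It assumes the z-tests are two-sided, which the table note does not state explicitly, and the function names are hypothetical. The chi-square p value uses the closed-form upper tail that holds for exactly 2 degrees of freedom.

```python
import math


def z_test_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_2df_p(x):
    """Upper-tail p value for a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def flag(p, alpha=0.05):
    """Flag an item when its unadjusted p value falls below alpha."""
    return p < alpha
```

For example, `flag(z_test_p(2.5))` evaluates to `True` and `flag(chi2_2df_p(4.0))` evaluates to `False` at the .05 level.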

7. Discussion

This paper has introduced a method for DIF analysis that is intended for use when a priori knowledge about anchor items is not available and when many items on an assessment may exhibit DIF. The overall idea is to approach DIF as a problem of outlier detection in IRT-based scaling with the CINEG design. This approach is congenial to M-estimation of a location parameter, with new results providing the asymptotic distribution of the IRT scaling parameters, as well as an asymptotic test of DIF, under the joint null hypothesis that none of the items exhibit DIF. These results were used to develop a highly robust redescending M-estimator that simultaneously provides an estimate of IRT scale parameters and an asymptotic test of DIF, which was referred to as the R-DIF procedure.

Using the joint null hypothesis that none of the items exhibit DIF to derive asymptotic results about R-DIF may, at first glance, seem to invite the same criticism of logical circularity that has been raised against traditional methods. However, to make use of the joint null hypothesis, the R-DIF procedure requires only that a suitable estimate of the IRT scaling parameters is available. Theoretical results showed that the bias of the R-DIF estimate of the IRT scaling parameters remains bounded so long as fewer than 1/2 of the items on an assessment exhibit “worst-case” DIF (i.e., biased in the same direction and by the same magnitude).

The robustness of R-DIF was also illustrated by data simulations, which showed that R-DIF maintains acceptable type I error control and statistical power so long as fewer than 1/2 of the items on an assessment exhibit worst-case DIF. While the performance of the comparison methods deteriorated incrementally as more items with DIF were added, R-DIF maintained its initial size and power until approaching its theoretical breakdown point of 1/2. A second simulation study showed that the robustness of R-DIF comes at a cost of reduced statistical power compared to the likelihood ratio test when only a single item exhibits DIF. Thus, R-DIF is most suitable with larger sample sizes ($n \geq 350$ per group). An empirical example from cross-cultural human development illustrated the use of R-DIF in a context where many assessment items exhibited DIF, and led to substantively different conclusions about DIF compared to the likelihood ratio test.

This paper focused on the 2PL model in two independent groups. However, some features of the R-DIF procedure make it suitable for extension. First, the main results presented in this paper trivially extend to other unidimensional psychometric models that (a) can be parameterized in slope-intercept form and (b) have item parameter estimates whose asymptotic distribution is known. This includes, for example, the unidimensional linear factor model and the graded response model. Extensions to wider classes of models (e.g., multidimensional) are less obvious. Second, R-DIF can be implemented using separate calibrations of the focal model in the target populations. Thus it is scalable to situations with many groups, which is especially relevant in cross-cultural settings. The results presented in this paper can be directly applied to pairwise comparisons among multiple groups, although it would be preferable to consider alternative approaches (e.g., sum-to-zero contrasts among groups). A third line of future research is longitudinal settings (i.e., dependent groups). While the asymptotic variances of the IRT scaling parameters given in Eqs. (14) and (23) used the assumption that the groups were independent, the main results are agnostic to the specific structure of the asymptotic covariance matrix of the item parameter estimates.

There are some deeper limitations of the R-DIF procedure that could also be addressed in future research. Most obviously, the methodology relies on asymptotics, which may not always be appropriate. In principle, this limitation can be overcome via bootstrapping, although this complicates the path to analytic results. Second, the concepts of an “effect size” (e.g., Sireci and Rios, 2013; Wainer, 1993), a residual (e.g., Karabatsos, 2000; Haberman, 2009), and related notions of item misfit (e.g., Rost and von Davier, 1994; Yamamoto et al., 2013) were not addressed in this paper. The focus of the R-DIF procedure is to flag items with DIF, but this leaves open the question of how to quantify the degree of DIF and its consequences for decisions to be made based on test data (e.g., Chalmers et al., 2016; Gonzalez and Pelham, 2021). Developing effect sizes for R-DIF remains an important avenue of future research. Another limitation concerns the notion of a breakdown point, which provides only a crude characterization of the robustness of an estimator under unspecified types of data contamination. Moving forward, it may be useful to develop more specialized concepts of breakdown that reflect theoretically motivated configurations of DIF and which quantify the consequences of DIF in terms of (finite) degrees of item misfit.

In conclusion, this paper has shown that reframing DIF as a problem in robust scaling can provide a satisfactory resolution to long-standing methodological issues concerning the circular nature of DIF. Consequently, the proposed methodology is especially suited to research settings in which many items may exhibit DIF and anchor items cannot be reliably identified ahead of time.

Appendix

Proof of Theorem 1

The proof is obtained via the Delta method (e.g., van der Vaart, 1998, Chap. 3), which requires only assumptions A1 and A4. Assumptions A2 and A3 are used to obtain the distributions of $\tilde{\theta}$ and $Y_i - \tilde{\theta}$ under the joint null hypothesis that none of the item intercepts exhibit DIF. In Eq. (32) it is seen that Assumption A4 is implied by A2 and A3, so that it is not required to obtain the null distributions (but is required for the non-null distributions).

The results are organized as follows. First the general (i.e., non-null) asymptotic distribution of $\tilde{\theta}$ is derived. Then its null distribution is obtained for any choice of $s_i > 0$ in Eq. (15). Finally, the null distribution for $s_i = \text{var}(Y_i)$ is provided. This is followed by an abbreviated version of these same steps for $Y_i - \tilde{\theta}$.

For any transformation of the MLEs of the item parameters $g = g(\hat{\boldsymbol{\nu}})$ satisfying assumptions A1 and A4, the general form of the result is

(29)

$$\begin{aligned} \sqrt{n} \, (g - g(\boldsymbol{\nu})) \overset{d}{\rightarrow} N(0, \text{var}(g)) \end{aligned}$$

where $n = n_0 + n_1$, $n_1 / n_0 = c$ for $c \in (0, \infty)$, and

(30)

$$\begin{aligned} \text{var}(g) = \nabla g(\boldsymbol{\nu})^T \, \text{cov}(\hat{\boldsymbol{\nu}}) \, \nabla g(\boldsymbol{\nu}). \end{aligned}$$

For $g = \tilde{\theta} = \theta(\hat{\boldsymbol{\nu}})$, the gradient can be obtained by applying the implicit function theorem to Eq. (15) (which also requires A4):

$$\begin{aligned} \nabla \theta(\boldsymbol{\nu}) = - \frac{\partial \Psi}{\partial \boldsymbol{\nu}} \left[ \frac{\partial \Psi}{\partial \theta} \right]^{-1}. \end{aligned}$$

The required partial derivatives are:

$$\begin{aligned} \frac{\partial \Psi}{\partial \boldsymbol{\nu}} &= \sum_{i=1}^{m} \psi'\left(U_i(\boldsymbol{\nu})\right) \times \nabla Y_i(\boldsymbol{\nu}) / s_i \quad \text{and} \quad \frac{\partial \Psi}{\partial \theta} = - \sum_{i=1}^{m} \psi'\left(U_i(\boldsymbol{\nu})\right) / s_i. \end{aligned}$$

Substituting these results into Eq. (30) gives

(31)

$$\begin{aligned} \text{var}(\tilde{\theta}) = \sum_{i=1}^{m} w_i^2 \, \nabla Y_i(\boldsymbol{\nu})^T \, \text{cov}(\hat{\boldsymbol{\nu}}) \, \nabla Y_i(\boldsymbol{\nu}) = \sum_{i=1}^{m} w_i^2 \, \text{var}(Y_i) \end{aligned}$$

with weights

(32)

$$\begin{aligned} w_i = \frac{\psi'(U_i(\boldsymbol{\nu})) / s_i}{\sum_{j=1}^{m} \psi'(U_j(\boldsymbol{\nu})) / s_j}. \end{aligned}$$

Equations (29), (31) and (32) provide the asymptotic distribution of $\tilde{\theta}$ for a relatively general specification of $\psi$ (only Assumptions A1 and A4 have been used so far).
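Equations (31) and (32) can be evaluated numerically once $\psi'$, the $U_i$, the $s_i$, and the $\text{var}(Y_i)$ are available. The Python sketch below uses the derivative of Tukey's bisquare as a stand-in for $\psi'$, which is an assumption of the example rather than a statement about the paper's choice. The inputs are hypothetical values, and the function names are illustrative.

```python
def bisquare_psi_prime(u, k=4.685):
    """Derivative of psi(u) = u * (1 - (u / k)^2)^2, which is zero for |u| >= k."""
    if abs(u) >= k:
        return 0.0
    r = (u / k) ** 2
    return (1.0 - r) * (1.0 - 5.0 * r)


def m_weights(u, s, psi_prime=bisquare_psi_prime):
    """Eq. (32): w_i proportional to psi'(U_i) / s_i, normalized to sum to one."""
    raw = [psi_prime(ui) / si for ui, si in zip(u, s)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("sum of psi'(U_j) / s_j is zero; weights are undefined")
    return [r / total for r in raw]


def var_theta(u, s, var_y, psi_prime=bisquare_psi_prime):
    """Eq. (31): var(theta) = sum_i w_i^2 * var(Y_i)."""
    w = m_weights(u, s, psi_prime)
    return sum(wi * wi * vi for wi, vi in zip(w, var_y))


# Hypothetical inputs: four items, with s_i set equal to var(Y_i).
var_y = [0.02, 0.04, 0.04, 0.08]
general = m_weights([0.5, -1.0, 2.0, 0.0], var_y)
at_zero = var_theta([0.0, 0.0, 0.0, 0.0], var_y, var_y)
```

When every $U_i$ is zero and $s_i = \text{var}(Y_i)$, the weights reduce to inverse-variance weights and `at_zero` equals the reciprocal of the summed precisions. Items with large $|U_i|$ can receive zero or even negative weight under a redescending $\psi$, which is why the normalizing sum must be checked.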

Next, we obtain the distribution of $\tilde{\theta}$ under the joint null hypothesis that $Y_i(\boldsymbol{\nu}) = \theta_0$ for $i = 1, \dots, m$. First it is shown that $\theta(\boldsymbol{\nu}) = \theta_0$. Substituting into Eq. (15) yields

(33)

\begin{matrix} Ψ (θ_{0}, θ) = \sum_{i = 1}^{m} ψ (\frac{θ_{0} - θ}{s_{i}}) = 0 . \end{matrix}

By Assumption A2, $θ = θ_{0}$ is seen to be the unique solution of $Ψ (θ_{0}, θ) = 0$ in a non-empty neighborhood of $θ_{0}$. Thus, under the joint null hypothesis, $θ (ν) = θ_{0}$ and $U_{i} (ν) = 0$.
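
As an illustration of why the uniqueness claim is local, the following sketch evaluates $Ψ (θ_{0}, θ)$ from Eq. (33) for one redescending choice of $ψ$. It assumes Tukey's bisquare function with a tuning constant k = 1.96 and hypothetical scale values $s_{i}$; none of these values are taken from the paper, and the function names are illustrative only.

```python
def bisquare_psi(u, k=1.96):
    """Tukey bisquare psi (assumed here): u * (1 - (u/k)^2)^2 on [-k, k], else 0."""
    if abs(u) > k:
        return 0.0
    return u * (1.0 - (u / k) ** 2) ** 2


def big_psi(theta0, theta, s, k=1.96):
    """Eq. (33): sum over items of psi((theta0 - theta) / s_i)."""
    return sum(bisquare_psi((theta0 - theta) / s_i, k) for s_i in s)


theta0 = 0.5
s = [0.2, 0.3, 0.5]  # hypothetical scale values s_i

root_value = big_psi(theta0, theta0, s)        # zero at theta = theta0
near_value = big_psi(theta0, theta0 - 0.1, s)  # nonzero close to theta0
far_value = big_psi(theta0, theta0 - 5.0, s)   # zero again: every term has redescended
```

With these values, $Ψ$ vanishes at $θ_{0}$, is positive slightly below it, and vanishes again once every standardized difference exceeds k, which is consistent with the uniqueness claim being restricted to a neighborhood of $θ_{0}$.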

To obtain $var (\tilde{θ})$ under the joint null hypothesis, note that $ψ^{'} (U_{i} (ν)) = ψ^{'} (0) = c$ and, by Assumption A3, $c \neq 0$. So, Assumption A4 is no longer required, and in place of Eq. (32) we have the “null weights”:

(34)

\begin{matrix} w_{i} = \frac{1 / s_{i}}{\sum_{j = 1}^{m} 1 / s_{j}} . \end{matrix}

Thus, for any choice of $s_{i}$, the joint null hypothesis implies

\begin{matrix} \sqrt{n} (\tilde{θ} - θ_{0}) \overset{d}{\to} N (0, {var}_{0} (\tilde{θ})) \end{matrix}

with

(35)

\begin{matrix} {var}_{0} (\tilde{θ}) = \sum_{i = 1}^{m} {(\frac{1 / s_{i}}{\sum_{j = 1}^{m} 1 / s_{j}})}^{2} var (Y_{i}) . \end{matrix}

The next step is to choose $s_{i}$. As mentioned in the preamble to Theorem 1, the goal is to choose the weights to minimize ${var}_{0} (\tilde{θ})$. Re-writing Eq. (35) using $v_{i} = w_{i} var (Y_{i})$ and applying the weighted power means inequality (e.g., Cvetkovski, 2012, Chap. 3) gives the following lower bound for ${var}_{0} (\tilde{θ})$:

(36)

\begin{matrix} {var}_{0} (\tilde{θ}) = \sum_{i = 1}^{m} w_{i} v_{i} \geq {(\sum_{i = 1}^{m} (w_{i} / v_{i}))}^{- 1} = {(\sum_{i = 1}^{m} 1 / var (Y_{i}))}^{- 1} \end{matrix}

It can be verified that equality is obtained by setting $s_{i} = var (Y_{i})$ in Eq. (35), which proves part (a) of the theorem. (Incidentally, this is also the variance of the maximum likelihood estimate of $θ$, which can also be readily verified.)
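
The bound in Eq. (36) can be checked numerically. The sketch below computes ${var}_{0} (\tilde{θ})$ from Eqs. (34) and (35) for three choices of $s_{i}$, using hypothetical values of $var (Y_{i})$ that are not taken from the paper; the function names are illustrative.

```python
def null_weights(s):
    """Eq. (34): w_i = (1 / s_i) / sum_j (1 / s_j)."""
    inverse = [1.0 / s_i for s_i in s]
    total = sum(inverse)
    return [x / total for x in inverse]


def var0_theta(s, var_y):
    """Eq. (35): sum_i w_i^2 * var(Y_i), with w_i the null weights."""
    w = null_weights(s)
    return sum(w_i ** 2 * v_i for w_i, v_i in zip(w, var_y))


var_y = [0.04, 0.09, 0.25, 0.16]  # hypothetical var(Y_i)

# Right-hand side of Eq. (36).
lower_bound = 1.0 / sum(1.0 / v for v in var_y)

optimal = var0_theta(var_y, var_y)                        # s_i = var(Y_i)
equal = var0_theta([1.0] * len(var_y), var_y)             # s_i constant
sd_scaled = var0_theta([v ** 0.5 for v in var_y], var_y)  # s_i = sd(Y_i)
```

Under these assumed values, only the choice $s_{i} = var (Y_{i})$ attains the lower bound; the constant and standard-deviation scalings both give a larger null variance.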

Turning now to part (b), consider the case where $g = Y_{i} - \tilde{θ}$ and $\tilde{θ}$ is estimated as just described. Following the same steps outlined above shows that the asymptotic distribution has variance

(37)

\begin{matrix} var (Y_{i} - \tilde{θ}) = \sum_{j = 1}^{m} {\tilde{w}}_{j}^{2} var (Y_{j}) \end{matrix}

with

(38)

\begin{matrix} {\tilde{w}}_{j} = \begin{cases} 1 - w_{j} & \text{for } i = j \\ w_{j} & \text{for } i \neq j \end{cases} \end{matrix}

and $w_{j}$ given by the “general” weights in Eq. (32). The null distribution is

(39)

\begin{matrix} \sqrt{n} (Y_{i} - \tilde{θ}) \overset{d}{\to} N (0, {var}_{0} (Y_{i} - \tilde{θ})) \end{matrix}

with ${var}_{0} (Y_{i} - \tilde{θ})$ obtained from Eq. (37) by using the null weights in Eq. (34). Finally, setting $s_{i} = var (Y_{i})$ in Eq. (34) and substituting into Eq. (37) yields

(40)

\begin{matrix} {var}_{0} (Y_{i} - \tilde{θ}) & = var (Y_{i}) - 2 w_{i} var (Y_{i}) + \sum_{j = 1}^{m} w_{j}^{2} var (Y_{j}) \\ & = var (Y_{i}) - 2 {var}_{0} (\tilde{θ}) + {var}_{0} (\tilde{θ}) . \end{matrix}

Equations (39) and (40) provide part (b) of the theorem.
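
The simplification in Eq. (40) can be verified numerically. The sketch below evaluates Eq. (37) with the weights in Eq. (38) under $s_{i} = var (Y_{i})$ and compares the result with $var (Y_{i}) - {var}_{0} (\tilde{θ})$, which is what the last line of Eq. (40) reduces to. The variances are hypothetical and the function names are illustrative.

```python
def inverse_variance_weights(var_y):
    """Eq. (34) with s_i = var(Y_i)."""
    inverse = [1.0 / v for v in var_y]
    total = sum(inverse)
    return [x / total for x in inverse]


def var0_theta(var_y):
    """Eq. (35) with s_i = var(Y_i)."""
    w = inverse_variance_weights(var_y)
    return sum(w_j ** 2 * v_j for w_j, v_j in zip(w, var_y))


def var0_difference(i, var_y):
    """Eq. (37) using the weights in Eq. (38) for item i."""
    w = inverse_variance_weights(var_y)
    w_tilde = [1.0 - w_j if j == i else w_j for j, w_j in enumerate(w)]
    return sum(wt ** 2 * v_j for wt, v_j in zip(w_tilde, var_y))


var_y = [0.04, 0.09, 0.25, 0.16]  # hypothetical var(Y_i)

direct = [var0_difference(i, var_y) for i in range(len(var_y))]
closed_form = [v_i - var0_theta(var_y) for v_i in var_y]
```

The two lists agree up to floating-point error for every item, so the null variance of $Y_{i} - \tilde{θ}$ is smaller than $var (Y_{i})$ by exactly ${var}_{0} (\tilde{θ})$ in this example.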

Proof of Theorem 2

Under the assumptions in the theorem, Eq. (20) becomes

(41)

\begin{matrix} θ^{(1)} = θ^{(0)} + \frac{\sum_{i} ψ (\frac{Y_{i} - θ^{(0)}}{τ_{i}^{(0)}})}{\sum_{i} ψ^{'} (\frac{Y_{i} - θ^{(0)}}{τ_{i}^{(0)}}) / τ_{i}^{(0)}} \end{matrix}


with $τ_{i}^{(0)} = τ_{i} (θ^{(0)})$ as given in Eq. (14). Letting the ratio in Eq. (41) be denoted by R, the theorem requires showing that |R| is bounded away from infinity whenever $| θ^{(0)} |$ is. The numerator of R is bounded away from infinity by Assumption A6. The denominator is bounded away from zero for finite $θ^{(0)}$ by the assumption that $θ^{(0)}$ is not a stationary point of $Ψ$. These conditions apply to many M-estimators, but leave open the possibility that R will diverge due to $τ_{i}^{(0)} \to \infty$. In the present case, this consideration is eliminated by Eq. (14), which shows that $τ_{i} (θ) = O (θ^{2})$.
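
An update of the form in Eq. (41) can be sketched as follows. The example assumes Tukey's bisquare function for $ψ$ with tuning constant k = 1.96, and it holds the scales $τ_{i}$ fixed at a hypothetical constant rather than recomputing them from Eq. (14), which is not reproduced here. The data values, including one item with a large shift, are invented for illustration.

```python
def bisquare_psi(u, k=1.96):
    """Tukey bisquare psi (assumed): u * (1 - (u/k)^2)^2 on [-k, k], else 0."""
    if abs(u) > k:
        return 0.0
    return u * (1.0 - (u / k) ** 2) ** 2


def bisquare_dpsi(u, k=1.96):
    """Derivative of the bisquare psi: (1 - r)(1 - 5r) with r = (u/k)^2."""
    if abs(u) > k:
        return 0.0
    r = (u / k) ** 2
    return (1.0 - r) * (1.0 - 5.0 * r)


def update(theta, y, tau, k=1.96):
    """One step of Eq. (41), with tau held fixed (a simplifying assumption)."""
    u = [(y_i - theta) / t_i for y_i, t_i in zip(y, tau)]
    numerator = sum(bisquare_psi(u_i, k) for u_i in u)
    denominator = sum(bisquare_dpsi(u_i, k) / t_i for u_i, t_i in zip(u, tau))
    if denominator == 0.0:
        raise ZeroDivisionError("theta is a stationary point of Psi")
    return theta + numerator / denominator


y = [0.48, 0.52, 0.55, 0.45, 1.50]  # hypothetical Y_i; the last item is shifted
tau = [0.1] * len(y)                # hypothetical fixed scales

theta = sorted(y)[len(y) // 2]      # start from the median of Y_i
for _ in range(10):
    theta = update(theta, y, tau)

mean_y = sum(y) / len(y)            # non-robust comparison
```

In this example the shifted item contributes nothing to either sum because its standardized residual lies outside [-k, k], so the iterations settle near 0.5, whereas the unweighted mean is pulled to 0.7.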

Footnotes

The author would like to thank Dr. Matthias von Davier for helpful comments that improved the proof of Theorem 1.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

¹ https://mics.unicef.org/surveys.

² https://data.unicef.org/resources/early-childhood-development-index-2030-ecdi2030/.

References

Angoff, W. (1982). Use of difficulty and discrimination indices for detecting item bias. In Berk, R. (Ed.), Handbook of methods for detecting test bias (pp. 96–116). Baltimore, MD: The Johns Hopkins Press.

Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In Holland, P. W., Wainer, H. (Eds.), Differential item functioning (pp. 3–23). Hillsdale, NJ: Lawrence Erlbaum Associates.

Asparouhov, T., Muthén, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 495–508.

Bechger, T. M., Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80, 317–340.

Belzak, W. C. M., Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25, 673–690.

Bock, R. D., Gibbons, R. D. (2021). Item response theory. Hoboken, NJ: Wiley.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.

Chalmers, R. P., Counsell, A., Flora, D. B. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76(1), 114–140.

Cvetkovski, Z. (2012). Inequalities: Theorems, techniques and selected problems. Cham: Springer.

Doebler, A. (2019). Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Applied Psychological Measurement, 43(4), 303–321.

Dorans, N. J., Holland, P. W. (1993). DIF detection and description: Mantel–Haenszel and standardization. In Holland, P. W., Wainer, H. (Eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Lawrence Erlbaum Associates.

Gonzalez, O., Pelham, W. E. (2021). When does differential item functioning matter for screening? A method for empirical evaluation. Assessment, 28(2), 446–456.

Haberman, S. J. (2009). Use of generalized residuals to examine goodness of fit of item response models. ETS research report RR-09-15.

He, Y. (2013). Robust scale transformation methods in IRT true score equating under common-item nonequivalent groups design. Ann Arbor: ProQuest LLC.

He, Y., Cui, Z. (2020). Evaluating robust scale transformation methods with multiple outlying common items under IRT true score equating. Applied Psychological Measurement, 44(4), 296–310.

He, Y., Cui, Z., Osterlind, S. J. (2015). New robust scale transformation methods in the presence of outlying common items. Applied Psychological Measurement, 39(8), 613–626.

Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1), 73–101.

Huber, P. J. (1984). Finite sample breakdown of M- and P-estimators. Annals of Statistics, 12, 119–126.

Huber, P. J., Ronchetti, E. (2009). Robust statistics (2nd ed.). Hoboken, NJ: Wiley.

Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1(2), 152–176.

Kolen, M. J., Brennan, R. L. (2014). Test equating, scaling, and linking. New York, NY: Springer.

Kopf, J., Zeileis, A., Strobl, C. (2015a). Anchor selection strategies for DIF analysis: Review, assessment, and new approaches. Educational and Psychological Measurement, 75(1), 22–56.

Kopf, J., Zeileis, A., Strobl, C. (2015b). A framework for anchor methods and an iterative forward approach for DIF detection. Applied Psychological Measurement, 39(2), 83–103.

Li, G., Zhang, J. (1998). Breakdown properties of location M-estimators. The Annals of Statistics, 26(3), 1170–1189.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. New York: Routledge.

Magis, D., Béland, S., Tuerlinckx, F., De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847–862.

Magis, D., Tuerlinckx, F., De Boeck, P. (2015). Detection of differential item functioning using the lasso approach. Journal of Educational and Behavioral Statistics, 40(2), 111–135.

Maronna, R. A., Martin, R. D., Yohai, V. J., Salibián-Barrera, M. (2019). Robust statistics: Theory and methods (with R) (2nd ed.). Hoboken, NJ: Wiley.

Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7(2), 105–118.

R Core Team. (2022). R: A language and environment for statistical computing.

Raju, N. S., van der Linden, W. J., Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19(4), 353–368.

Robitzsch, A., Lüdtke, O. (2023). Why full, partial, or approximate measurement invariance are not a prerequisite for meaningful and valid group comparisons. Structural Equation Modeling: A Multidisciplinary Journal, 30(6), 859–870.

Rost, J., von Davier, M. (1994). A conditional item-fit index for Rasch models. Applied Psychological Measurement, 18(2), 171–182.

Rousseeuw, P. J., Leroy, A. M. (1987). Robust regression and outlier detection. New York: Wiley.

Schauberger, G., Mair, P. (2020). A regularization approach for the detection of differential item functioning in generalized partial credit models. Behavior Research Methods, 52(1), 279–294.

Sireci, S. G., Rios, J. A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2–3), 170–187.

Stenhaug, B., Frank, M. C., Domingue, B. (2021). Treading carefully: Agnostic identification as the first step of detecting differential item functioning. Preprint, PsyArXiv.

Stocking, M. L., Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.

Strobl, C., Kopf, J., Kohler, L., von Oertzen, T., Zeileis, A. (2021). Anchor point selection: Scale alignment based on an inequality criterion. Applied Psychological Measurement, 45(3), 214–230.

Thissen, D., Steinberg, L., Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In Holland, P. W., Wainer, H. (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum Associates.

van der Linden, W. J. (2016). Handbook of item response theory. Boca Raton, FL: CRC Press.

van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.

von Davier, M., Bezirhan, U. (2022). A robust method for detecting item misfit in large-scale assessments. Educational and Psychological Measurement, 00131644221105819.

Wainer, H. (1993). Model-based standardized measurement of an item’s differential impact. In Holland, P. W., Wainer, H. (Eds.), Differential item functioning (pp. 123–135). Hillsdale, NJ: Lawrence Erlbaum Associates.

Wang, W., Liu, Y., Liu, H. (2022). Testing differential item functioning without predefined anchor items using robust regression. Journal of Educational and Behavioral Statistics, 47(6), 666–692.

Yamamoto, K., Khorramdel, L., von Davier, M. (2013). Scaling PIAAC cognitive data. In OECD (Ed.), Technical report of the survey of adult skills (PIAAC) (pp. 17.1–17.34). Paris: OECD Publishing.

Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Annals of Statistics, 15(2), 642–656.

Yuan, K.-H., Liu, H., Han, Y. (2021). Differential item functioning analysis without a priori information on anchor items: QQ plots and graphical test. Psychometrika, 86, 345–377.

Table 1 Summary of simulation 1 design.

Figure 1 Type I error rates and statistical power for each of four methods: “Lasso” = GPCM lasso; “MH” = Mantel–Haenszel; “LRT” = LRT; “RDIF.flag” = the proposed method. “RDIF.true” denotes the proposed method computed using the data generating value of $θ$.

Figure 2 Distribution of ${\tilde{θ}}_{RD}$ in each simulation condition. “N.DIF” denotes the number of items with DIF. The data-generating value was 0.5 in each condition.

Table 2 Summary of simulation 2 design.

Figure 4 Plots of the $R (θ)$ minimized by the R-DIF estimator.

Table 3 R-DIF tests of the ECDI learning items.
