The Hidden Measurement Crisis in Criminology

Procedural Justice as a Case Study
Amanda Graham, Francis T. Cullen, Bruce G. Link
Published online:

03 March 2025

Print publication:

27 March 2025
- Element
The field of criminology is limited by a 'hidden' measurement crisis. It is hidden because scholars either are not aware of the shortcomings of their measures or have implicitly agreed that scales with certain properties merit publication. It is a crisis because the approaches used to construct measures do not employ modern systematic psychometric methods. As a result, the degree to which existing measures have methodological limitations is unknown. The purpose of this Element is to unmask this hidden crisis and provide a case study demonstrating how to build a measure of a prominent criminological construct through modern systematic psychometric methods. Using multiple surveys and item response theory, it develops a ten-item scale of procedural justice in policing. This can be used in primary research and to adjudicate existing measures. The goal is to reveal the nature of the field's measurement crisis and show a strategy for solving it.

Multidimensional Latent Space Item Response Models: A Note on the Relativity of Conditional Dependence
Inhan Kang, Minjeong Jeon
Journal:

Psychometrika

Published online by Cambridge University Press:

26 February 2025, pp. 1-28
- Article
Conditional dependence (CD) reflects potential interactions between persons and items in measurement, offering valuable information for deriving personalized diagnoses, evaluations, and feedback. The recent integration of psychometric models with latent space provides an effective way to visualize and quantify person–item interactions unexplained by latent variables and item parameters. In such applications, it is important to recognize the relative nature of CD, as models with different structures and complexities (e.g., due to factor dimensionality and item parameters) produce varying systematic explanations of person and item effects, leading to residual variations that differ in both a quantitative and a qualitative sense. To demonstrate this relativity, we extend the previously developed unidimensional Rasch-based latent space item response model by incorporating between-item multidimensionality and item discrimination parameters. The resulting model can be reduced to simpler models with appropriate constraints, allowing us to explore the relativity in CD by comparing them. Simulation studies demonstrate that (1) the most complex proposed model properly recovers its parameters, (2) it outperforms the traditional IRT models by accounting for CD, and (3) the models in comparison exhibit distinctive extents of CD. The study continues with empirical examples that further illustrate relative changes in the extent and configurations of CD with practical implications.

MODGIRT: Multidimensional Dynamic Scaling of Aggregate Survey Data
Elissa Berwick, Devin Caughey
Journal:

Political Analysis / Volume 33 / Issue 2 / April 2025

Published online by Cambridge University Press:

17 January 2025, pp. 91-106
- Article
Dynamic models of aggregate public opinion are increasingly popular, but to date they have been restricted to unidimensional latent traits. This is problematic because in many domains the structure of mass preferences is multidimensional. We address this limitation by deriving a multidimensional ordinal dynamic group-level item response theory (MODGIRT) model. We describe the Bayesian estimation of the model and present a novel workflow for dealing with the difficult problem of identification. With simulations, we show that MODGIRT recovers aggregate parameters without estimating subject-level ideal points and is robust to moderate violations of assumptions. We further validate the model by reproducing at the group level an existing individual-level analysis of British attitudes towards redistribution. We then reanalyze a recent cross-national application of a group-level item response theory model, replacing its domain-specific confirmatory approach with an exploratory MODGIRT model. We describe extensions to allow for overdispersion, differential item functioning, and group-level predictors. A publicly available R package implements these methods.

Adding Regularized Horseshoes to the Dynamics of Latent Variable Models
Garret Binding, Piotr Koc
Journal:

Political Analysis / Volume 33 / Issue 2 / April 2025

Published online by Cambridge University Press:

17 January 2025, pp. 171-177
- Article
Dynamic latent variable models generally link units’ positions on a latent dimension over time via random walks. Theoretically, these trajectories are often expected to resemble a mixture of periods of stability interrupted by moments of change. In these cases, a prior distribution such as the regularized horseshoe—that allows for both stasis and change—can prove a better theoretical and empirical fit for the underlying construct than other priors. Replicating Reuning, Kenwick, and Fariss (2019), we find that the regularized horseshoe performs better than the standard normal and the Student’s t-distribution when modeling dynamic latent variable models. Overall, the use of the regularized horseshoe results in more accurate and precise estimates. More broadly, the regularized horseshoe is a promising prior for many similar applications.

Every Trait Counts: Marginal Maximum Likelihood Estimation for Novel Multidimensional Count Data Item Response Models with Rotation or $\ell_1$-Regularization for Simple Structure
Marie Beisemann, Heinz Holling, Philipp Doebler
Journal:

Psychometrika / Volume 90 / Issue 1 / March 2025

Published online by Cambridge University Press:

03 January 2025, pp. 304-330
- Article
Multidimensional item response theory (MIRT) offers psychometric models for various data settings, most popularly for dichotomous and polytomous data. Less attention has been devoted to count responses. A recent growth in interest in count item response models (CIRM)—perhaps sparked by increased occurrence of psychometric count data, e.g., in the form of process data, clinical symptom frequency, number of ideas or errors in cognitive ability assessment—has focused on unidimensional models. Some recent unidimensional CIRMs rely on the Conway–Maxwell–Poisson distribution as the conditional response distribution which allows conditionally over-, under-, and equidispersed responses. In this article, we generalize to the multidimensional case, introducing the Multidimensional Two-Parameter Conway–Maxwell–Poisson Model (M2PCMPM). Using the expectation-maximization (EM) algorithm, we develop marginal maximum likelihood estimation methods, primarily for exploratory M2PCMPMs. The resulting discrimination matrices are rotationally indeterminate. Recently, regularization of the discrimination matrix has been used to obtain a simple structure (i.e., a sparse solution) for dichotomous and polytomous data. For count data, we also (1) rotate or (2) regularize the discrimination matrix. We develop an EM algorithm with lasso ($\ell _1$) regularization for the M2PCMPM and compare (1) and (2) in a simulation study. We illustrate the proposed model with an empirical example using intelligence test data.

Adjusting for Information Inflation Due to Local Dependency in Moderately Large Item Clusters
Edward Hak-sing IP
Journal:

Psychometrika / Volume 65 / Issue 1 / March 2000

Published online by Cambridge University Press:

02 January 2025, pp. 73-91
- Article
When multiple items are clustered around a reading passage, the local independence assumption in item response theory is often violated. The amount of information contained in an item cluster is usually overestimated if violation of local independence is ignored and items are treated as locally independent when in fact they are not. In this article we provide a general method that adjusts for the inflation of information associated with a test containing item clusters. A computational scheme is presented for evaluating the adjustment factor for clusters, both in the restrictive case of two items per cluster and in the general case of more than two items per cluster. The methodology was motivated by a study of the NAEP Reading Assessment. We present a simulation study along with an analysis of a NAEP data set.

A Note on the Identifiability of Fixed-Effect 3PL Models
Hao Wu
Journal:

Psychometrika / Volume 81 / Issue 4 / December 2016

Published online by Cambridge University Press:

01 January 2025, pp. 1093-1097
- Article
In this note, we prove that the 3 parameter logistic model with fixed-effect abilities is identified only up to a linear transformation of the ability scale under mild regularity conditions, contrary to the claims in Theorem 2 of San Martín et al. (Psychometrika, 80(2):450–467, 2015a).
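
As a brief illustration of the kind of indeterminacy at issue, the following uses the standard textbook form of the 3PL model. The notation ($a_j$, $b_j$, $c_j$, $\theta_i$, $k$, $m$) is assumed here and is not taken from the note itself.

$$
P(X_{ij} = 1 \mid \theta_i) = c_j + \frac{1 - c_j}{1 + \exp\{-a_j(\theta_i - b_j)\}}
$$

For any $k > 0$ and any real $m$, the transformed parameters $\theta_i^{*} = k\theta_i + m$, $b_j^{*} = k b_j + m$, and $a_j^{*} = a_j / k$ satisfy

$$
a_j^{*}(\theta_i^{*} - b_j^{*}) = \frac{a_j}{k}\,\bigl(k\theta_i + m - k b_j - m\bigr) = a_j(\theta_i - b_j),
$$

so every response probability, and hence the likelihood, is unchanged. This shows only that a linear rescaling of the ability scale cannot be detected from the data; the note's result concerns whether any further indeterminacy remains.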

A New Concurrent Calibration Method for Nonequivalent Group Design under Nonrandom Assignment
Kei Miyazaki, Takahiro Hoshino, Shin-ichi Mayekawa, Kazuo Shigemasu
Journal:

Psychometrika / Volume 74 / Issue 1 / March 2009

Published online by Cambridge University Press:

01 January 2025, pp. 1-19
- Article
This study proposes a new item parameter linking method for the common-item nonequivalent groups design in item response theory (IRT). Previous studies assumed that examinees are randomly assigned to either test form. However, examinees can frequently select their own test forms, and tests often differ according to examinees’ abilities. In such cases, concurrent calibration or multiple group IRT modeling without modeling test form selection behavior can yield severely biased results. We propose a model wherein test form selection behavior depends on test scores and estimate it with a Monte Carlo expectation maximization (MCEM) algorithm. This method provides adequate estimates of testing parameters.

On the Bock-Aitkin Procedure—from an EM Algorithm Perspective
Yaowen Hsu
Journal:

Psychometrika / Volume 65 / Issue 4 / December 2000

Published online by Cambridge University Press:

01 January 2025, pp. 547-549
- Article
The relationship between the EM algorithm and the Bock-Aitkin procedure is described with a continuous distribution of ability (latent trait) from an EM-algorithm perspective. Previous work has been restricted to the discrete case from a probit-analysis perspective.

Correction for Item Response Theory Latent Trait Measurement Error in Linear Mixed Effects Models
Chun Wang, Gongjun Xu, Xue Zhang
Journal:

Psychometrika / Volume 84 / Issue 3 / September 2019

Published online by Cambridge University Press:

01 January 2025, pp. 673-700
- Article
When latent variables are used as outcomes in regression analysis, a common approach to addressing the otherwise ignored measurement error is to take a multilevel perspective on item response theory (IRT) modeling. Although recent computational advancement allows efficient and accurate estimation of multilevel IRT models, we argue that a two-stage divide-and-conquer strategy still has its unique advantages. Within the two-stage framework, three methods that take into account heteroscedastic measurement errors of the dependent variable in stage II analysis are introduced: the closed-form marginal MLE, the expectation maximization algorithm, and the moment estimation method. They are compared to the naïve two-stage estimation and the one-stage MCMC estimation. A simulation study is conducted to compare the five methods in terms of model parameter recovery and their standard error estimation. The pros and cons of each method are also discussed to provide guidelines for practitioners. Finally, a real data example is given to illustrate the applications of various methods using the National Educational Longitudinal Survey data (NELS 88).

Multidimensional Item Response Theory in the Style of Collaborative Filtering
Yoav Bergner, Peter Halpin, Jill-Jênn Vie
Journal:

Psychometrika / Volume 87 / Issue 1 / March 2022

Published online by Cambridge University Press:

01 January 2025, pp. 266-288
- Article
This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course. The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative “validation” of the factor model, using auxiliary information about the popularity of items consulted during an open-book examination in the course.

A Multicomponent Latent Trait Model for Diagnosis
Susan E. Embretson, Xiangdong Yang
Journal:

Psychometrika / Volume 78 / Issue 1 / January 2013

Published online by Cambridge University Press:

01 January 2025, pp. 14-36
- Article
This paper presents a noncompensatory latent trait model, the multicomponent latent trait model for diagnosis (MLTM-D), for cognitive diagnosis. In MLTM-D, a hierarchical relationship between components and attributes is specified to permit diagnosis at two levels. MLTM-D is a generalization of the multicomponent latent trait model (MLTM; Whitely in Psychometrika, 45:479–494, 1980; Embretson in Psychometrika, 49:175–186, 1984) that is applicable to measures of broad traits, such as achievement tests, in which component structure varies between items. Conditions for model identification are described and marginal maximum likelihood estimators are presented, along with simulation data to demonstrate parameter recovery. To illustrate how MLTM-D can be used for diagnosis, an application to a large-scale test of mathematics achievement is presented. An advantage of MLTM-D for diagnosis is that it may be more applicable to large-scale assessments with more heterogeneous items than are latent class models.

Commentary: Matching IRT Models to PRO Constructs—Modeling Alternatives, and Some Thoughts on What Makes a Model Different
Matthias von Davier
Journal:

Psychometrika / Volume 86 / Issue 3 / September 2021

Published online by Cambridge University Press:

01 January 2025, pp. 825-832
- Article
This commentary is an attempt to present some additional alternatives to the suggestions made by Reise et al. (2021). IRT models as they are used for patient-reported outcome (PRO) scales may not be fully satisfactory when used with commonly made assumptions. The suggested change to an alternative parameterization is critically reflected upon, with the intent to initiate discussion around more comprehensive alternatives that allow for more complex latent structures and that have the potential to be more appropriate for PRO scales as they are applied to diverse populations.

A Response Model for Multiple Choice Items
David Thissen, Lynne Steinberg
Journal:

Psychometrika / Volume 49 / Issue 4 / December 1984

Published online by Cambridge University Press:

01 January 2025, pp. 501-519
- Article
We introduce an extended multivariate logistic response model for multiple choice items; this model includes several earlier proposals as special cases. The discussion includes a theoretical development of the model, a description of the relationship between the model and data, and a marginal maximum likelihood estimation scheme for the item parameters. Comparisons of the performance of different versions of the full model with more constrained forms corresponding to previous proposals are included, using likelihood ratio statistics and empirical data.

Modeling Rule-Based Item Generation
Hanneke Geerlings, Cees A. W. Glas, Wim J. van der Linden
Journal:

Psychometrika / Volume 76 / Issue 2 / April 2011

Published online by Cambridge University Press:

01 January 2025, pp. 337-359
- Article
An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.

Simulation-Extrapolation with Latent Heteroskedastic Error Variance
J. R. Lockwood, Daniel F. McCaffrey
Journal:

Psychometrika / Volume 82 / Issue 3 / September 2017

Published online by Cambridge University Press:

01 January 2025, pp. 717-736
- Article
This article considers the application of the simulation-extrapolation (SIMEX) method for measurement error correction when the error variance is a function of the latent variable being measured. Heteroskedasticity of this form arises in educational and psychological applications with ability estimates from item response theory models. We conclude that there is no simple solution for applying SIMEX that generally will yield consistent estimators in this setting. However, we demonstrate that several approximate SIMEX methods can provide useful estimators, leading to recommendations for analysts dealing with this form of error in settings where SIMEX may be the most practical option.

Comparing Item Characteristic Curves
Paul R. Rosenbaum
Journal:

Psychometrika / Volume 52 / Issue 2 / March 1987

Published online by Cambridge University Press:

01 January 2025, pp. 217-233
- Article
Test items are often evaluated and compared by contrasting the shapes of their item characteristic curves (ICCs) or surfaces. The current paper develops and applies three general (i.e., nonparametric) comparisons of the shapes of two item characteristic surfaces: (i) proportional latent odds, (ii) uniform relative difficulty, and (iii) item sensitivity. Two items may be compared in these ways while making no assumption about the shapes of item characteristic surfaces for other items, and no assumption about the dimensionality of the latent variable. Also studied is a method for comparing the relative shapes of two item characteristic curves in two examinee populations.

Measuring Growth in a Longitudinal Large-Scale Assessment with a General Latent Variable Model
Matthias von Davier, Xueli Xu, Claus H. Carstensen
Journal:

Psychometrika / Volume 76 / Issue 2 / April 2011

Published online by Cambridge University Press:

01 January 2025, pp. 318-336
- Article
The aim of the research presented here is the use of extensions of longitudinal item response theory (IRT) models in the analysis and comparison of group-specific growth in large-scale assessments of educational outcomes. A general discrete latent variable model was used to specify and compare two types of multidimensional item-response-theory (MIRT) models for longitudinal data: (a) a model that handles repeated measurements as multiple, correlated variables over time and (b) a model that assumes one common variable over time and additional variables that quantify the change. Using extensions of these MIRT models, we approach the issue of modeling and comparing group-specific growth in observed and unobserved subpopulations. The analyses presented in this paper aim at answering the question of whether academic growth is homogeneous across types of schools defined by academic demands and curricular differences. In order to facilitate answering this research question, (a) a model with a single two-dimensional ability distribution was compared to (b) a model assuming multiple populations with potentially different two-dimensional ability distributions based on type of school and to (c) a model that assumes that the observations are sampled from a discrete mixture of (unobserved) populations, allowing for differences across schools with respect to mixing proportions. For this purpose, we specified a hierarchical-mixture distribution variant of the two MIRT models. The latter model, (c), is a growth-mixture MIRT model that allows for variation of the mixing proportions across clusters in a hierarchically organized sample. We applied the proposed models to the PISA-I-Plus data for assessing learning and change across multiple subpopulations. The results of this study support the hypothesis of differential growth.

A Nonparametric Approach for Assessing Latent Trait Unidimensionality
William Stout
Journal:

Psychometrika / Volume 52 / Issue 4 / December 1987

Published online by Cambridge University Press:

01 January 2025, pp. 589-617
- Article
Assuming a nonparametric family of item response theory models, a theory-based procedure for testing the hypothesis of unidimensionality of the latent space is proposed. The asymptotic distribution of the test statistic is derived assuming unidimensionality, thereby establishing an asymptotically valid statistical test of the unidimensionality of the latent trait. Based upon a new notion of dimensionality, the test is shown to have asymptotic power 1. A 6300-trial Monte Carlo study using published item parameter estimates of widely used standardized tests indicates conservative adherence to the nominal level of significance and statistical power averaging 81 out of 100 rejections for examinee sample sizes and psychological test lengths often encountered in practice.

The Crosswise Model for Surveys on Sensitive Topics: A General Framework for Item Selection and Statistical Analysis
Marco Gregori, Martijn G. De Jong, Rik Pieters
Journal:

Psychometrika / Volume 89 / Issue 3 / September 2024

Published online by Cambridge University Press:

01 January 2025, pp. 1007-1033
- Article
When surveys contain direct questions about sensitive topics, participants may not provide their true answers. Indirect question techniques incentivize truthful answers by concealing participants’ responses in various ways. The Crosswise Model aims to do this by pairing a sensitive target item with a non-sensitive baseline item, and only asking participants to indicate whether their responses to the two items are the same or different. Selection of the baseline item is crucial to guarantee participants’ perceived and actual privacy and to enable reliable estimates of the sensitive trait. This research makes the following contributions. First, it describes an integrated methodology to select the baseline item, based on conceptual and statistical considerations. The resulting methodology distinguishes four statistical models. Second, it proposes novel Bayesian estimation methods to implement these models. Third, it shows that the new models introduced here improve efficiency over common applications of the Crosswise Model and may relax the required statistical assumptions. These three contributions facilitate applying the methodology in a variety of settings. An empirical application on attitudes toward LGBT issues shows the potential of the Crosswise Model. An interactive app, together with Python and MATLAB code, supports broader adoption of the model.
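
The pairing mechanism described in this abstract can be sketched with the simple moment estimator commonly associated with the basic Crosswise Model. This is not the article's Bayesian method. The sketch assumes that the baseline item has a known prevalence, that it is independent of the sensitive item, and that respondents answer truthfully; the function name `crosswise_prevalence` and its parameters are hypothetical.

```python
def crosswise_prevalence(share_same: float, baseline_prev: float) -> float:
    """Moment estimate of the sensitive-trait prevalence (hypothetical helper).

    Under the assumptions stated above, with prevalence pi of the sensitive
    trait and known baseline prevalence p:
        P(same) = pi * p + (1 - pi) * (1 - p)
    Solving for pi gives (P(same) + p - 1) / (2p - 1).
    """
    denom = 2.0 * baseline_prev - 1.0
    if abs(denom) < 1e-12:
        # With p = 0.5 the "same" share is 0.5 whatever pi is.
        raise ValueError("baseline prevalence of 0.5 leaves pi unidentified")
    return (share_same + baseline_prev - 1.0) / denom


# Example: a baseline item with known prevalence 0.25 and an observed
# 65% of respondents reporting "same".
estimate = crosswise_prevalence(share_same=0.65, baseline_prev=0.25)
print(round(estimate, 3))
```

The guard clause reflects one reason the choice of baseline item matters: as the baseline prevalence approaches 0.5 the denominator shrinks and the estimate becomes very noisy, while a prevalence far from 0.5 gives respondents less privacy.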
