Evidence from response time studies and time pressure experiments has led several authors to conclude that “fairness is intuitive”. In light of conflicting findings, we provide theoretical arguments showing under which conditions an increase in “fairness” due to time pressure indeed provides unambiguous evidence in favor of the “fairness is intuitive” hypothesis. Drawing on recent applications of the Drift Diffusion Model (Krajbich et al. in Nat Commun 6:7455, 2015a), we demonstrate how the subjective difficulty of making a choice affects decisions under time pressure and time delay, thereby making an unambiguous interpretation of time pressure effects contingent on the choice situation. To explore our theoretical considerations and to retest the “fairness is intuitive” hypothesis, we analyze choices in two-person binary dictator and prisoner’s dilemma games under time pressure or time delay. In addition, we manipulate the subjective difficulty of choosing the fair relative to the selfish option. Our main finding is that time pressure does not consistently promote fairness in situations where this would be predicted after accounting for choice difficulty. Hence, our results cast doubt on the hypothesis that “fairness is intuitive”.
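To make the mechanism concrete, here is a minimal simulation sketch (our own illustration with arbitrary parameter values, not the authors' implementation): in a symmetric drift diffusion model, the drift rate captures how subjectively easy the fair option is, and a deadline forces an unabsorbed process to respond from the evidence accumulated so far.

```python
import numpy as np

rng = np.random.default_rng(0)

def ddm_trial(drift, bound=1.0, noise=1.0, dt=0.002, deadline=None):
    """One drift-diffusion trial. Returns 1 if the upper boundary
    (read: the fair option) is reached, else 0. Under a deadline, an
    unabsorbed process is forced to respond from the evidence so far."""
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if deadline is not None and t >= deadline:
            break
    return 1 if x > 0 else 0

for drift in (1.5, 0.2):  # high drift = subjectively easy, low = hard
    free = np.mean([ddm_trial(drift) for _ in range(1000)])
    fast = np.mean([ddm_trial(drift, deadline=0.4) for _ in range(1000)])
    print(f"drift {drift}: P(fair) without pressure {free:.2f}, with {fast:.2f}")
```

The deadline pulls both choice shares toward one half, so whichever option is chosen less often without pressure mechanically gains under pressure; an observed increase in fair choices is therefore evidence for intuitive fairness only after this difficulty effect is taken into account.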
I show how using response times as a proxy for effort can address a long-standing issue of how to separate the effect of cognitive ability on performance from the effect of motivation. My method is based on a dynamic stochastic model of optimal effort choice in which ability and motivation are the structural parameters. I show how to estimate these parameters from the data on outcomes and response times in a cognitive task. In a laboratory experiment, I find that performance on a digit-symbol test is a noisy and biased measure of cognitive ability. Ranking subjects by their performance leads to an incorrect ranking by their ability in a substantial number of cases. These results suggest that interpreting performance on a cognitive task as ability may be misleading.
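A static sketch of the identification idea (our notation; the paper's actual model is dynamic and stochastic): an agent with ability $a$ and motivation $m$ chooses effort $e$ by solving

```latex
e^{*}(a, m) \;=\; \arg\max_{e \ge 0} \; m\,p(a, e) - c(e),
```

where $p(a,e)$ is expected performance, increasing in both arguments, and $c$ is a convex effort cost. Observed performance $p(a, e^{*})$ alone confounds $a$ and $m$; response time, as a proxy for $e^{*}$, supplies the second observable needed to separate the two parameters.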
Models of stochastic choice are studied in decision theory, discrete choice econometrics, behavioral economics and psychology. Numerous experiments show that perception of stimuli is not deterministic, but stochastic (randomly determined). A growing body of evidence indicates that the same is true of economic choices. Whether trials are separated by days or minutes, the fraction of choice reversals is substantial. Stochastic Choice Theory offers a systematic introduction to these models, unifying insights from these fields. It explores mathematical models of stochastic choice, which have a variety of applications in game theory, industrial organization, labor economics, marketing, and experimental economics. The book builds up from scratch, requires no prior knowledge, and surveys recent developments, bringing readers to the frontier of research.
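For concreteness, the most familiar member of this class is the logit (Luce) rule, under which the probability of choosing option $a$ from a menu $A$ is (our notation)

```latex
P(a \mid A) \;=\; \frac{e^{u(a)/\lambda}}{\sum_{b \in A} e^{u(b)/\lambda}},
\qquad \lambda > 0,
```

where $u$ is a utility function and $\lambda$ indexes noise: as $\lambda \to 0$ choice becomes deterministic, while any $\lambda > 0$ generates the choice reversals described above.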
The recent “Every Student Succeeds Act” encourages schools to use innovative assessments to provide feedback on students’ mastery of grade-level content standards. Mastery of a skill requires the ability to complete the task not only with accuracy but also with fluency. This paper offers a new perspective on using both response times and response accuracy to measure fluency within the cognitive diagnosis model (CDM) framework. Defining fluency as the highest level of a categorical latent attribute, a polytomous response accuracy model and two forms of response time models are proposed to infer fluency jointly. A Bayesian estimation approach is developed to calibrate the newly proposed models. These models were applied to analyze data collected from a spatial rotation test. Results demonstrate that, compared with a traditional CDM using response accuracy only, the proposed joint models were able to reveal more information about test takers’ spatial skills. A set of simulation studies was conducted to evaluate the accuracy of the estimation algorithm and to illustrate varying degrees of model complexity.
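As a rough illustration of how response times can carry fluency information (our notation, not necessarily the authors' exact specification), one of the response time models might shift the log-RT location downward for examinees at the fluent level of the attribute:

```latex
\ln T_{ij} \;\sim\; N\!\big(\beta_i - \delta_i\,\mathbb{1}[\alpha_j = \text{fluent}],\; \sigma_i^2\big),
\qquad \delta_i > 0,
```

so that mastery with fluency predicts systematically faster responses than mastery without it.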
Current modeling of response times on test items has been strongly influenced by the paradigm of experimental reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with response time. Also, several response-time models seem to be unclear as to the level of parametrization they represent. A hierarchical framework for modeling speed and accuracy on test items is presented as an alternative to these models. The framework allows a “plug-and-play approach” with alternative choices of models for the response and response-time distributions as well as the distributions of their parameters. Bayesian treatment of the framework with Markov chain Monte Carlo (MCMC) computation facilitates the approach. Use of the framework is illustrated for the choice of a normal-ogive response model, a lognormal model for the response times, and multivariate normal models for their parameters with Gibbs sampling from the joint posterior distribution.
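The illustrated configuration can be written compactly (omitting the guessing parameter of the full three-parameter normal-ogive model):

```latex
P(U_{ij} = 1 \mid \theta_j) \;=\; \Phi\big(a_i(\theta_j - b_i)\big),
\qquad
\ln T_{ij} \mid \tau_j \;\sim\; N\big(\beta_i - \tau_j,\; \alpha_i^{-2}\big),
```

with multivariate normal second-level models for the person parameters $(\theta_j, \tau_j)$ and the item parameters $(a_i, b_i, \alpha_i, \beta_i)$.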
Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate RT data, for example, in aid of ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for corresponding paper-and-pencil scores. Then, we reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate the prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on the decomposition of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we have conducted several simulation studies to examine the empirical performance of the proposed model assessment criteria and have illustrated the application of these criteria using a real dataset from a computerized educational assessment program.
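The decomposition idea rests on the factorization of the joint likelihood: writing $\boldsymbol\omega$ for all model parameters and assuming the responses $\mathbf{u}$, response times $\mathbf{t}$, and paper-and-pencil scores $\mathbf{x}$ are conditionally independent given $\boldsymbol\omega$ (a sketch of the general principle, not the authors' exact criteria), the deviance is additive:

```latex
D(\boldsymbol\omega) \;=\; -2\log f(\mathbf{u} \mid \boldsymbol\omega)
\;-\; 2\log f(\mathbf{t} \mid \boldsymbol\omega)
\;-\; 2\log f(\mathbf{x} \mid \boldsymbol\omega),
```

so DIC-type criteria can be split into component-wise terms that quantify how much the fit of one part of the data improves when the other parts are modeled jointly with it.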
Missing values at the end of a test typically are the result of test takers running out of time and can as such be understood by studying test takers’ working speed. As testing moves to computer-based assessment, response times become available, allowing speed and ability to be modeled simultaneously. Integrating research on response time modeling with research on modeling missing responses, we propose using response times to model missing values due to time limits. We identify similarities between approaches used to account for not-reached items (Rose et al. in ETS Res Rep Ser 2010:i–53, 2010) and the speed-accuracy (SA) model for joint modeling of effective speed and effective ability as proposed by van der Linden (Psychometrika 72(3):287–308, 2007). In a simulation, we show (a) that the SA model can recover parameters in the presence of missing values due to time limits and (b) that the response time model, using item-level timing information rather than a count of not-reached items, results in person parameter estimates that differ from missing data IRT models applied to not-reached items. We propose using the SA model to model the missing data process and to use both ability and speed to describe the performance of test takers. We illustrate the application of the model in an empirical analysis.
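A small simulation sketch of the missingness mechanism (hypothetical parameter values; response times follow $\ln T_{ij} = \beta_i - \tau_j + \varepsilon_{ij}$ as in the lognormal part of the SA model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items, limit = 500, 30, 1800.0   # 30 items, 30-minute limit (s)

tau = rng.normal(0.0, 0.3, n_persons)             # effective person speed
beta = rng.normal(np.log(45.0), 0.4, n_items)     # item time intensities

# Lognormal response times: ln T_ij = beta_i - tau_j + eps_ij
log_t = beta[None, :] - tau[:, None] + rng.normal(0.0, 0.5, (n_persons, n_items))
rt = np.exp(log_t)

# An item counts as not reached once cumulative time passes the limit.
not_reached = np.cumsum(rt, axis=1) > limit
print("proportion not reached:", round(not_reached.mean(), 3))
print("persons with missing responses:", int(not_reached.any(axis=1).sum()))
```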
Multinomial processing tree models assume that discrete cognitive states determine observed response frequencies. Generalized processing tree (GPT) models extend this conceptual framework to continuous variables such as response times, process-tracing measures, or neurophysiological variables. GPT models assume finite-mixture distributions, with weights determined by a processing tree structure, and continuous components modeled by parameterized distributions such as Gaussians with separate or shared parameters across states. We discuss identifiability, parameter estimation, model testing, a modeling syntax, and the improved precision of GPT estimates. Finally, a GPT version of the feature comparison model of semantic categorization is applied to computer-mouse trajectories.
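In our notation, a GPT model assigns an observed continuous variable $y$ (e.g., a response time) the finite-mixture density

```latex
f(y) \;=\; \sum_{s \in S} p_s(\boldsymbol\theta)\, g_s(y;\, \boldsymbol\eta_s),
```

where each weight $p_s(\boldsymbol\theta)$ is the product of processing-tree parameters along the branches leading to latent state $s$, and $g_s$ is a parameterized component density, e.g., a Gaussian whose mean and variance may be separate for or shared across states.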
A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered.
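A minimal sketch of the classical variant (our illustration, treating the item and person parameters as known rather than estimated):

```python
import numpy as np
from scipy.stats import norm

def flag_aberrant_rts(log_rt, beta, tau, sigma, alpha=0.01):
    """Classical residual check on log response times under the model
    ln T_ij ~ N(beta_i - tau_j, sigma_i^2). Returns a boolean
    (persons x items) array of flags for two-sided outliers.
    log_rt: (n_persons, n_items); beta, sigma: (n_items,); tau: (n_persons,)."""
    z = (log_rt - (beta[None, :] - tau[:, None])) / sigma[None, :]
    return np.abs(z) > norm.ppf(1.0 - alpha / 2.0)
```

A Bayesian posterior predictive check would replace the plug-in parameters with draws from their posterior distribution.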
This paper discusses two forms of separability of item and person parameters in the context of response time (RT) models. The first is “separate sufficiency”: the existence of sufficient statistics for the item (person) parameters that do not depend on the person (item) parameters. The second is “ranking independence”: the likelihood of the item (person) ranking with respect to RTs does not depend on the person (item) parameters. For each form, a theorem stating sufficient conditions is proved. The two forms of separability are shown to include several (special cases of) models from the psychometric and biometric literature. Ranking independence imposes restrictions not on the general form of the distribution, but on its parametrization. An estimation procedure based upon ranks and pseudolikelihood theory is discussed, as well as the relation of ranking independence to the concept of double monotonicity.
Careless and insufficient effort responding (C/IER) can pose a major threat to data quality and, as such, to the validity of inferences drawn from questionnaire data. A rich body of methods aiming at its detection has been developed. Most of these methods can detect only specific types of C/IER patterns. However, typically different types of C/IER patterns occur within one data set and need to be accounted for. We present a model-based approach for detecting manifold manifestations of C/IER at once. This is achieved by leveraging response time (RT) information available from computer-administered questionnaires and integrating theoretical considerations on C/IER with recent psychometric modeling approaches. The approach (a) takes the specifics of attentive response behavior on questionnaires into account by incorporating the distance–difficulty hypothesis, (b) allows attentiveness to vary on the screen-by-respondent level, (c) allows respondents with different trait and speed levels to differ in their attentiveness, and (d) deals with the various response patterns arising from C/IER at once. The approach makes use of item-level RTs. An adapted version for aggregated RTs is presented that supports screening for C/IER behavior on the respondent level. Parameter recovery is investigated in a simulation study. The approach is illustrated in an empirical example, comparing different RT measures and contrasting the proposed model-based procedure against indicator-based multiple-hurdle approaches.
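The core of such an approach is a screen-level mixture; suppressing the concrete component models, it has the structure (our notation)

```latex
f(y_{js}, t_{js}) \;=\; \pi_{js}\, f_{\mathrm{att}}\big(y_{js}, t_{js} \mid \theta_j, \zeta_j\big)
\;+\; \big(1 - \pi_{js}\big)\, f_{\mathrm{cier}}\big(y_{js}, t_{js}\big),
```

where $\pi_{js}$ is the probability that respondent $j$ answers screen $s$ attentively and may depend on the respondent's trait level $\theta_j$ and speed $\zeta_j$.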
In order to identify aberrant response-time patterns on educational and psychological tests, it is important to be able to separate the speed at which the test taker operates from the time the items require. A lognormal model for response times with this feature was used to derive a Bayesian procedure for detecting aberrant response times. Furthermore, a combination of the response-time model with a regular response model in a hierarchical framework was used in an alternative procedure for the detection of aberrant response times, in which collateral information on the test takers’ speed is derived from their response vectors. The procedures are illustrated using a data set for the Graduate Management Admission Test® (GMAT®). In addition, a power study was conducted using simulated cheating behavior on an adaptive test.
The analysis of variance, and mixed models in general, are popular tools for analyzing experimental data in psychology. Bayesian inference for these models is gaining popularity, as it makes it easy to handle complex experimental designs and data dependence structures. When working on the log of the response variable, the use of standard priors for the variance parameters can create inferential problems, namely the non-existence of posterior moments of parameters and predictive distributions on the original scale of the data. The generalized inverse Gaussian distribution, with a careful choice of hyper-parameters, is proposed as a general-purpose option for priors on variance parameters. Theoretical and simulation results motivate the proposal. A software package that implements the analysis is also discussed. As the log-transformation of the response variable is often applied when modelling response times, an empirical data analysis in this field is reported.
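The source of the problem, and the fix, can be sketched as follows: if the model is fitted to $\ln Y$, predictions on the original scale involve expectations of the form $E[\exp(c\,\sigma^2)]$ for some $c > 0$, which are infinite under the polynomially tailed inverse-gamma prior (and the resulting posterior). The generalized inverse Gaussian density

```latex
p(\sigma^2) \;\propto\; (\sigma^2)^{\lambda - 1}
\exp\!\Big\{ -\tfrac{1}{2}\big( \chi / \sigma^2 + \psi\, \sigma^2 \big) \Big\},
\qquad \chi, \psi > 0,
```

has an exponential right tail, so $E[\exp(c\,\sigma^2)]$ is finite whenever $c < \psi/2$; the inverse gamma is the boundary case $\psi = 0$, which is exactly where these moments break down.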
Complex interactive test items are becoming more widely used in assessments. Being computer-administered, assessments using interactive items allow logging time-stamped action sequences. These sequences are a rich source of information that may facilitate investigating how examinees approach an item and arrive at their given response. There is a rich body of research leveraging action sequence data for investigating examinees’ behavior. However, the associated timing data have been considered mainly on the item level, if at all. Considering timing data on the action level in addition to action sequences, however, has vast potential to support a more fine-grained assessment of examinees’ behavior. We provide an approach that jointly considers action sequences and action-level times for identifying common response processes. In doing so, we integrate tools from clickstream analyses and graph-modeled data clustering with psychometrics. In our approach, we (a) provide similarity measures that are based on both actions and the associated action-level timing data and (b) subsequently employ cluster edge deletion for identifying homogeneous, interpretable, well-separated groups of action patterns, each describing a common response process. Guidelines on how to apply the approach are provided. The approach and its utility are illustrated on a complex problem-solving item from PIAAC 2012.
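The flavor of an action-and-time similarity measure can be conveyed with a toy example (entirely hypothetical; the paper builds its measures on clickstream-analysis tools and then clusters with cluster edge deletion):

```python
def process_similarity(seq_a, seq_b, w_time=0.5):
    """Toy similarity between two response processes, each a list of
    (action, duration) pairs. Mixes overlap of the action sets with
    agreement of per-action time shares; illustrative only."""
    acts_a = {a for a, _ in seq_a}
    acts_b = {a for a, _ in seq_b}
    jaccard = len(acts_a & acts_b) / len(acts_a | acts_b)

    def shares(seq):
        total = sum(d for _, d in seq)
        out = {}
        for a, d in seq:
            out[a] = out.get(a, 0.0) + d / total
        return out

    sa, sb = shares(seq_a), shares(seq_b)
    time_overlap = sum(min(sa.get(a, 0.0), sb.get(a, 0.0))
                       for a in acts_a | acts_b)
    return (1.0 - w_time) * jaccard + w_time * time_overlap

# Same actions, very different time allocation -> reduced similarity.
p1 = [("open_tab", 2.0), ("sort", 10.0), ("respond", 3.0)]
p2 = [("open_tab", 9.0), ("sort", 1.0), ("respond", 5.0)]
print(round(process_similarity(p1, p2), 3))
```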
Statistical methods for identifying aberrances on psychological and educational tests are pivotal for detecting flaws in the design of a test or irregular behavior of test takers. Two approaches have been taken in the past to address the challenge of aberrant behavior detection: (1) modeling aberrant behavior via mixture modeling methods, and (2) flagging aberrant behavior via residual-based outlier detection methods. In this paper, we propose a two-stage method that is conceived of as a combination of both approaches. In the first stage, a mixture hierarchical model is fitted to the response and response time data to distinguish normal and aberrant behaviors using a Markov chain Monte Carlo (MCMC) algorithm. In the second stage, a further distinction between rapid guessing and cheating behavior is made at the person level using a Bayesian residual index. Simulation results show that the two-stage method yields accurate item and person parameter estimates, as well as a high true detection rate and a low false detection rate, under different manipulated conditions mimicking NAEP parameters. A real-data example is given at the end to illustrate the potential application of the proposed method.
Takane and Sergent developed a model (MAXRT) for scaling same/different judgments and response times (RTs) simultaneously. The model assumes that RTs are distributed lognormally. Our experiment showed that the RT distribution of the judgments might be task dependent. It is shown that lognormal RTs provide a far better fit than exponential, normal, and Pareto distributed RTs (with the same means and variances), but that the final parameter estimates from the data set with lognormal RTs hardly differ from the alternatively distributed RTs. Finally, despite the robustness of the distributional assumption of the RTs with respect to the parameter estimates, it is shown that RTs have an informational value that is not contained in the same/different judgments alone.
In this paper we study the statistical relations between three latent trait models for accuracies and response times: the hierarchical model (HM) of van der Linden (Psychometrika 72(3):287–308, 2007), the signed residual time model (SM) proposed by Maris and van der Maas (Psychometrika 77(4):615–633, 2012), and the drift diffusion model (DM) as proposed by Tuerlinckx and De Boeck (Psychometrika 70(4):629–650, 2005). One important distinction between these models is that the HM and the DM either assume or imply that accuracies and response times are independent given the latent trait variables, while the SM does not. In this paper we investigate the impact of this conditional independence property—or a lack thereof—on the manifest probability distribution for accuracies and response times. We find that the manifest distributions of the latent trait models share several important features, such as the dependency between accuracy and response time, but they also show important differences, such as which function of response time is modeled. Our method for characterizing the manifest probability distributions is related to the Dutch identity (Holland in Psychometrika 55(6):5–18, 1990).
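In symbols, the property at stake is (our notation, with $\boldsymbol\omega_j$ collecting a test taker's latent variables)

```latex
f(u_{ij}, t_{ij} \mid \boldsymbol\omega_j) \;=\; f(u_{ij} \mid \boldsymbol\omega_j)\, f(t_{ij} \mid \boldsymbol\omega_j),
```

which holds under the HM and the DM but fails under the SM, where accuracy and time are tied together through the signed residual time $(2u_{ij}-1)(d_i - t_{ij})$. Marginally, after integrating out $\boldsymbol\omega_j$, all three models imply dependence between accuracies and response times.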
Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the hierarchical model for speed and accuracy is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers. A Bayesian approach with Markov chain Monte Carlo (MCMC) computation enables straightforward estimation of all model parameters. Model-specific implementations of a Bayes factor (BF) and deviance information criterion (DIC) for model selection are proposed, which are easily calculated as byproducts of the MCMC computation. Both results from simulation studies and real-data examples are given to illustrate several novel analyses possible with this modeling framework.
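The person-level part of the regression structure can be sketched as follows (our notation; a group level is added analogously):

```latex
\theta_j = \mathbf{x}_j^{\top}\boldsymbol\gamma_{\theta} + e_{\theta j},
\qquad
\zeta_j = \mathbf{x}_j^{\top}\boldsymbol\gamma_{\zeta} + e_{\zeta j},
\qquad
(e_{\theta j}, e_{\zeta j})^{\top} \sim N(\mathbf{0}, \boldsymbol\Sigma),
```

so that covariates $\mathbf{x}_j$ explain part of the between-person variance in accuracy $\theta_j$ and speed $\zeta_j$, while $\boldsymbol\Sigma$ captures their residual correlation.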
Starting from an explicit scoring rule for time limit tasks incorporating both response time and accuracy, and a definite trade-off between speed and accuracy, a response model is derived. Since the scoring rule is interpreted as a sufficient statistic, the model belongs to the exponential family. The various marginal and conditional distributions for response accuracy and response time are derived, and it is shown how the model parameters can be estimated. The model for response accuracy is found to be the two-parameter logistic model. It is found that the time limit determines the item discrimination, and this effect is illustrated with the Amsterdam Chess Test II.
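The scoring rule in question is the signed residual time rule: with accuracy $u_{ij} \in \{0, 1\}$, response time $t_{ij}$, and time limit $d$,

```latex
S_{ij} \;=\; (2u_{ij} - 1)\,(d - t_{ij}),
```

so fast correct responses earn the highest scores and fast errors the largest penalties. Treating $S_{ij}$ as a sufficient statistic yields the exponential-family model, and because $S_{ij}$ scales with $d$, the time limit enters the discrimination of the marginal accuracy model.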
In this paper we propose two interpretations for the discrimination parameter in the two-parameter logistic model (2PLM). The interpretations are based on the relation between the 2PLM and two stochastic models. In the first interpretation, the 2PLM is linked to a diffusion model so that the probability of absorption equals the 2PLM. The discrimination parameter is the distance between the two absorbing boundaries and therefore the amount of information that has to be collected before a response to an item can be given. For the second interpretation, the 2PLM is connected to a specific type of race model. In the race model, the discrimination parameter is inversely related to the dependency of the information used in the decision process. Extended versions of both models with person-to-person variability in the difficulty parameter are considered. When fitted to a data set, it is shown that a generalization of the race model that allows for dependency between choices and response times (RTs) is the best-fitting model.
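The first interpretation rests on a standard result for the Wiener process: with drift $\mu$, variance $\sigma^2$, absorbing boundaries at $0$ and $\alpha$, and unbiased starting point $\alpha/2$, the probability of absorption at the upper boundary is

```latex
P(\text{upper}) \;=\; \frac{1 - e^{-\mu\alpha/\sigma^2}}{1 - e^{-2\mu\alpha/\sigma^2}}
\;=\; \frac{1}{1 + e^{-\alpha\mu/\sigma^2}}.
```

Setting $\mu = \theta_j - b_i$ and $\sigma = 1$ recovers the 2PLM with discrimination $a_i = \alpha$, the boundary separation.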