
Bayesian Semiparametric Longitudinal Inverse-Probit Mixed Models for Category Learning

Published online by Cambridge University Press:  27 December 2024

Minerva Mukhopadhyay
Affiliation:
Indian Institute of Technology
Jacie R. McHaney
Affiliation:
Northwestern University
Bharath Chandrasekaran
Affiliation:
Northwestern University
Abhra Sarkar*
Affiliation:
University of Texas at Austin
*Correspondence should be made to Abhra Sarkar, Department of Statistics and Data Sciences, University of Texas at Austin, 105 East 24th Street D9800, Austin, TX 78712, USA. Email: [email protected]

Abstract

Understanding how the adult human brain learns novel categories is an important problem in neuroscience. Drift-diffusion models are popular in such contexts for their ability to mimic the underlying neural mechanisms. One such model for gradual longitudinal learning was recently developed in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). Category response accuracies are, however, often the only reliable measure recorded by behavioral scientists to describe human learning. To our knowledge, drift-diffusion models for such scenarios have never been considered in the literature before. To address this gap, in this article, we build carefully on Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021), but now with latent response times integrated out, to derive a novel biologically interpretable class of ‘inverse-probit’ categorical probability models for observed categories alone. However, this new marginal model presents significant identifiability and inferential challenges not encountered originally for the joint model in Paulon et al. (J Am Stat Assoc 116:1114–1127, 2021). We address these new challenges using a novel projection-based approach with a symmetry-preserving identifiability constraint that allows us to work with conjugate priors in an unconstrained space. We adapt the model for group and individual-level inference in longitudinal settings. Building again on the model’s latent variable representation, we design an efficient Markov chain Monte Carlo algorithm for posterior computation. We evaluate the empirical performance of the method through simulation experiments. The practical efficacy of the method is illustrated in applications to longitudinal tone learning studies.

Type
Theory & Methods
Copyright
Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society

Scientific Background. Categorization decisions are important in almost all aspects of our lives: whether it is a friend or a foe, edible or non-edible, the word /bat/ or /hat/, etc. The underlying cognitive dynamics are being actively studied through extensive ongoing research (Glimcher & Fehr, 2013; Gold & Shadlen, 2007; Heekeren et al., 2004; Purcell, 2013; Schall, 2001; Smith & Ratcliff, 2004).

In typical multi-category decision tasks, the brain accumulates sensory evidence in order to make a categorical decision. This accumulation process is reflected in the increasing firing rates at local neural populations associated with different decisions. A decision is taken when neural activity in one of these populations reaches a particular threshold level. The decision category that is finally chosen is the one whose decision threshold is crossed first (Brody & Hanks, 2016; Gold & Shadlen, 2007). Changes in evidence accumulation rates and decision thresholds can be induced by differences in task difficulty and/or cognitive function (Cavanagh et al., 2011; Ding & Gold, 2013). Decision-making is also regulated by demands on both the speed and accuracy of the task (Bogacz et al., 2010; Milosavljevic et al., 2010).

Understanding the brain activity patterns for different decision alternatives is a key scientific interest in modeling brain mechanisms underlying decision-making. Statistical approaches with biologically interpretable parameters that further allow probabilistic clustering of the parameters (Lau & Green, 2007; Wade, 2023) associated with different competing choices can facilitate such inference, the parameters clustering together indicating similar behavior and difficulty levels.

Drift-Diffusion Models. A biologically interpretable joint model for decision response accuracies and associated response times is obtained by imitating the underlying evidence accumulation mechanisms using latent drift-diffusion processes racing toward their respective boundaries. The process that reaches its boundary first produces the final observed decision, and the time taken to reach this boundary gives the associated response time (Fig. 1a) (Usher & McClelland, 2001).

Figure 1 Drift-diffusion model for tone learning. The tones {T1, T2, T3, T4} represent the different categories; $s$ denotes an input category, $d'$ the different possible response categories, and $d$ the final response category. Here we are illustrating a single trial with input tone T1 ($s = 1$) that was eventually correctly identified ($d = 1$). (a) shows a process whose parameters can be inferred from data on both response categories and response times. Here, after an initial $\delta_{s}$ amount of time required to encode an input category $s$ (here T1), the evidence in favor of different possible response categories $d'$ accumulates according to latent Wiener diffusion processes $W_{d',s}(\tau)$ (red, blue, green, and purple) with drifts $\mu_{d',s}$. The decision $d$ (here T1) is eventually taken if the underlying process (here the red one) is the first to reach its decision boundary $b_{d,s}$. (b) shows a process with additional identifiability restrictions (for all $d'$ and $s$, $\delta_{s} = 0$, $b_{d',s} = b$ fixed, and $\sum_{d'=1}^{d_{0}} \mu_{d',s} = d_{0}$) considered in this article which can be inferred from data on response categories alone.

The literature on drift-diffusion processes for decision-making is rather vast but is mostly focused on simple binary decision scenarios with a single latent diffusion process with two boundaries, one for each of the two decision alternatives (Ratcliff, 1978; Ratcliff et al., 2016; Ratcliff & Rouder, 1998; Ratcliff & McKoon, 2008; Smith & Vickers, 1988). Multi-category drift-diffusion models with multiple latent processes are mathematically more easily tractable (Brown & Heathcote, 2008; Dufau et al., 2012; Kim et al., 2017; Leite & Ratcliff, 2010; Usher & McClelland, 2001) but the literature is sparse and focused only on simple static designs.

Learning to make categorization decisions is, however, a dynamic process, driven by perceptual adjustments in our brain and behavior over time. Category learning is thus often studied in longitudinal experiments. To address the need for sophisticated statistical methods for such settings, Paulon et al. (2021) developed an inverse Gaussian distribution-based multi-category longitudinal drift-diffusion mixed model.

Data Requirements and Related Challenges. Crucially, measurements on both the final decision categories and the associated response times are needed to estimate the drift and the boundary parameters from conventional drift-diffusion models, including the work by Paulon et al. (2021). Unfortunately, however, researchers often only record the participants’ decision responses as their go-to measure of categorization performance, ignoring the response times (Chandrasekaran et al., 2014; Filoteo et al., 2010). Additionally, eliciting accurate response times can be methodologically challenging, e.g., in the case of experiments conducted online, especially during the COVID-19 pandemic (Roark et al., 2021), or when the response times from participants/patients are unreliable due to motor deficits (Ashby et al., 2003). Participants may also be asked to delay the reporting of their decisions so that delayed physiological responses that relate to decision-making can be accurately measured (McHaney et al., 2021). In such cases, the reported response times may not accurately relate to the actual decision times and hence cannot be used in the analysis. As a result, conventional drift-diffusion analysis that requires data on both response accuracies and response times, such as Paulon et al. (2021), cannot be used in such scenarios.

The Research Question. The main research question addressed in this article is whether a new class of drift-diffusion models can be designed for such scenarios that allows the biologically interpretable drift-diffusion process parameters to be meaningfully recovered from data on input–output category combinations alone.

The Inverse-Probit Model. Categorical probability models that build on latent drift-diffusion processes can be useful in providing biologically interpretable inference in data sets comprising input–output categories but no response times. To our knowledge, however, the problem has never been considered in the literature before. We aim to address this remarkable gap in this article.

By integrating out the latent response times from the joint inverse Gaussian drift-diffusion model for response categories and associated response times in Paulon et al. (2021), we can arrive at a natural albeit overparametrized model for the response categories. We refer to this as the ‘inverse-probit’ categorical probability model. This inverse-probit model serves as the starting point for the methodology presented in this article but, as we describe below, it also comes with significant and unique statistical challenges not encountered in the original drift-diffusion model.
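The idea of integrating out the response times can be illustrated numerically. The following is a minimal sketch, not the article's implementation: it assumes unit-variance processes, a zero encoding offset, a common boundary (the value 2.0 is arbitrary), and hypothetical drift values, and it estimates the category probabilities by simulating the race and discarding the times. The inverse Gaussian sampler uses the standard Michael-Schucany-Haas transformation; all function names are illustrative.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """Draw one inverse Gaussian variate (Michael-Schucany-Haas transformation method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mean + mean**2 * y / (2.0 * shape) - (mean / (2.0 * shape)) * math.sqrt(
        4.0 * mean * shape * y + (mean * y) ** 2
    )
    # Choose between the two candidate roots with the appropriate probability.
    return x if rng.random() <= mean / (mean + x) else mean**2 / x


def category_probabilities(drifts, boundary=2.0, n_draws=50_000, seed=0):
    """Monte Carlo estimate of the probability that each process reaches the boundary first."""
    rng = random.Random(seed)
    wins = [0] * len(drifts)
    for _ in range(n_draws):
        # The first-passage time of a unit-variance process with drift mu to level b
        # is inverse Gaussian with mean b / mu and shape b**2.
        times = [sample_inverse_gaussian(boundary / mu, boundary**2, rng) for mu in drifts]
        wins[times.index(min(times))] += 1  # the fastest process determines the response
    return [w / n_draws for w in wins]


# Hypothetical drifts for one input category with four response categories.
probs = category_probabilities([1.6, 0.8, 0.8, 0.8])
print(probs)
```

Only the identity of the winning process is retained, so the simulated response times play the role of latent variables, which mirrors how the categorical model arises from the joint model.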

Statistical Challenges. While inferring both the drifts and the boundaries would be scientifically desirable, it is unfortunately mathematically impossible to do so in the inverse-probit model from data on the decision accuracies alone. We must thus keep the values of either the drifts or the boundaries fixed and focus on inferring the other.
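A standard Brownian scaling argument, sketched here under the assumption of unit-variance processes $W_{d'}(\tau) = \mu_{d'} \tau + B_{d'}(\tau)$ with first-passage times $\tau_{d'}$ to boundaries $b_{d'}$, illustrates one source of this non-identifiability. For any constant $c > 0$ (a symbol introduced only for this illustration), dividing all drifts by $c$ and multiplying all boundaries by $c$ rescales every first-passage time by the same factor $c^{2}$, so the identity of the winning process, and hence every category probability, is unchanged:

```latex
% Assumption: unit-variance processes W_{d'}(\tau) = \mu_{d'} \tau + B_{d'}(\tau), any c > 0.
\begin{align*}
\widetilde{W}_{d'}(u) &:= c\, W_{d'}(u / c^{2})
  = \frac{\mu_{d'}}{c}\, u + \underbrace{c\, B_{d'}(u / c^{2})}_{\text{again a standard Brownian motion}}, \\
\widetilde{\tau}_{d'} &:= \inf\{u : \widetilde{W}_{d'}(u) \ge c\, b_{d'}\} = c^{2}\, \tau_{d'}, \\
\arg\min_{d'} \widetilde{\tau}_{d'} &= \arg\min_{d'} \tau_{d'} .
\end{align*}
```

Response times would distinguish the two parameter settings, since they differ by the factor $c^{2}$, but the observed categories cannot.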

However, even when we fix either the drifts or the decision boundaries, the problem of overparametrization persists. In the absence of response times, only the information on relative frequencies, that is, the empirical probabilities of taking each decision, is available. As the total probability of observing any of the competing decisions is one, the identifiability problem remains for the chosen main parameters of interest, and appropriate remedial constraints need to be imposed.

Setting an arbitrarily chosen category as the reference provides a simple solution widely adopted in categorical probability models but comes with serious limitations, including breaking the symmetry of the problem, potentially making posterior inference sensitive to the specific choice of the reference category (Burgette & Nordheim, 2012; Johndrow et al., 2013).

By breaking the symmetry of the problem, a reference category also makes it difficult to infer the potential clustering of the model parameters, especially across different panels. To see this, consider a problem with $d_{0}$ categories, with a logistic model for the probabilities $p_{s,d'} = \text{logistic}(\beta_{s,d'}),\; s, d' \in \{1:d_{0}\}$, of choosing the $d'$th output category for the $s$th input category. For each input category $s$, by setting the $s$th output category as a reference, e.g., by fixing $\beta_{s,s} = 0$, one can then cluster the probabilities of incorrect decision choices, $p_{s,d'}$, $d' \ne s$. However, it is not clear how to compare the probabilities across different input categories (i.e., across the four panels in Fig. 2), e.g., how to test the equality of $p_{1,1}$ and $p_{2,2}$.

Finally, while coming up with solutions for the aforementioned issues, we must also take into consideration the complex longitudinal design of the experiments generating the data. Whatever strategy we devise, it should be amenable to a longitudinal mixed model analysis that ideally allows us to (a) estimate the smoothly varying longitudinal trajectories of the parameters as the participants learn over time, (b) accommodate participant heterogeneity, and (c) compare the estimates at different time points within and between different input categories.

Our Proposed Approach. As a first step toward addressing the identifiability issues and related modeling challenges, we keep the boundaries fixed but leave the drift parameters unconstrained. The decision to focus on the drifts is informed by the existing literature on such models cited above where the drifts have almost always been allowed more flexibility. The analysis of Paulon et al. (2021) also showed that it is primarily the variations in the drift trajectories that explain learning while the boundaries remain relatively stable over time.

As a next step toward establishing identifiability, we apply a ‘sum to a constant’ condition on the drifts so that symmetry is maintained in the constrained model.

Implementation of this restriction brings in significant challenges. One possibility is to design a prior on the constraint space, a challenging task in itself. Additionally, posterior computation for such priors would also be extremely complicated in drift-diffusion models. Instead, we conduct inference with an unconstrained prior on the drift parameters and project the samples drawn from the corresponding posterior to the constrained space through a minimal distance mapping.
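The projection step can be sketched as follows. This is a simplified illustration, assuming Euclidean distance and the 'sum to $d_{0}$' convention stated in the Fig. 1 caption, and ignoring any positivity or other restriction the constrained space may carry; the mapping actually used in the article may differ. The function name and the sample values are hypothetical.

```python
def project_to_sum_constraint(drifts, total=None):
    """Euclidean projection of a drift vector onto the hyperplane {m : sum(m) = total}."""
    d0 = len(drifts)
    if total is None:
        total = float(d0)  # assumed 'sum to d0' convention
    # The closest point on the hyperplane subtracts the same amount from every coordinate.
    shift = (sum(drifts) - total) / d0
    return [m - shift for m in drifts]


# Hypothetical unconstrained posterior draws for one input category (d0 = 4).
posterior_draws = [[2.1, 0.9, 1.2, 0.6], [1.7, 1.1, 0.8, 0.9]]
projected = [project_to_sum_constraint(draw) for draw in posterior_draws]
print(projected)
```

Because every coordinate is shifted by the same amount, the projection preserves all pairwise differences between drifts and treats the categories symmetrically, with no reference category singled out.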

To adapt this categorical probability model to a longitudinal mixed model setting, we then assume that the drift parameters comprise input-response-category-specific fixed effects and subject-specific random effects, modeling them flexibly by mixtures of locally supported B-spline bases (de Boor, 1978; Eilers & Marx, 1996) spanning the length of the longitudinal experiment. These effects are thus allowed to evolve flexibly as smooth functions of time (Morris, 2015; Ramsay & Silverman, 2007; Wang et al., 2016) as the participants get more experience and training in their assigned decision tasks.
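To make the B-spline construction concrete, the sketch below evaluates a clamped B-spline basis with the Cox-de Boor recursion and forms a smooth trajectory as a weighted sum of the basis functions. The degree, the knot placement at the six training blocks, and the coefficient values are assumptions chosen for illustration and are not taken from the article.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x (Cox-de Boor recursion)."""
    n_intervals = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval containing x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_intervals)]
    # Close the last non-empty interval on the right so that x = knots[-1] is covered.
    if x == knots[-1]:
        last = max(i for i in range(n_intervals) if knots[i] < knots[i + 1])
        basis[last] = 1.0
    for k in range(1, degree + 1):
        new = []
        for i in range(n_intervals - k):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are defined to be zero.
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + k + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            new.append(left + right)
        basis = new
    return basis


def drift_trajectory(t, coefficients, knots, degree):
    """Smooth function of time: a weighted sum of B-spline basis functions."""
    return sum(c * b for c, b in zip(coefficients, bspline_basis(t, knots, degree)))


degree = 2  # quadratic splines (assumed)
knots = [1.0] * degree + [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] + [6.0] * degree  # clamped on [1, 6]
coefficients = [0.8, 0.9, 1.1, 1.4, 1.6, 1.7, 1.7]  # hypothetical spline coefficients
trajectory = [drift_trajectory(t, coefficients, knots, degree) for t in (1, 2, 3, 4, 5, 6)]
print(trajectory)
```

Each basis function is nonzero only on a few adjacent knot intervals, so changing one coefficient alters the trajectory only locally, which is the sense in which the bases are locally supported.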

We take a Bayesian route to estimation and inference. Carefully exploiting conditional prior-posterior conjugacy as well as our latent variable construction, we design an efficient Markov chain Monte Carlo (MCMC)-based algorithm for approximating the posterior, where sampling the latent response times for each observed response category greatly simplifies the computations.

We evaluate the numerical performance of the proposed approach in extensive simulation studies. We then apply our method to the PTC1 data set described below. These applications illustrate the utility of our method in providing insights into how the drift parameters, which characterize the rates of accumulation of evidence in the brain, evolve over time and differ between input–output category combinations as well as between individuals.

Differences from Previous Works. This article differs in many fundamental ways from all existing works on drift-diffusion models, including Paulon et al. (2021), where response categories and response times were both observed and therefore the drift and boundary parameters could be modeled jointly with no identifiability issues. In contrast, the current work is motivated by scenarios where data on only response categories are available, leading us to the inverse-probit categorical probability model which, with its complex identifiability issues, brings new and unique challenges to statistical inference, allowing us to infer the drift parameters only on a relative scale, which we achieve via a novel projection-based approach. The novel contributions of this article are the introduction and analysis of the inverse-probit model and the treatment of the significant new statistical challenges it poses, including (a) identifiability issues, (b) assessment of intra- and inter-panel similarities, (c) extension to complex longitudinal mixed effects settings to accommodate the motivating applications, and (d) computational implementation of these new models.

Outline of the Article. Section 1 describes our motivating tone learning study. Sections 2 and 3 develop our longitudinal inverse-probit mixed model. Section 4 outlines our computational strategies. Section 5 presents the results of simulation experiments. Section 6 presents the results of the proposed method applied to our motivating PTC1 study. Section 7 concludes the main article with a discussion. Additional details, including MCMC-based posterior inference algorithms, are deferred to the supplementary material.

1. The PTC1 Data Set

The PTC1 (pupillometry tone categorization experiment 1) data set is obtained from a Mandarin tone learning study conducted at the Department of Communication Science and Disorders, University of Pittsburgh (McHaney et al., 2021). Mandarin Chinese is a tonal language, which means that pitch patterns at the syllable level differentiate word meanings. There are four linguistically relevant pitch patterns in Mandarin that make up the four Mandarin tones: high-flat (Tone 1), low-rising (Tone 2), low-dipping (Tone 3), and high-falling (Tone 4). For example, the syllable /ma/ can be pronounced using the four different pitch patterns of the four tones, which would result in four different word meanings. Adult native English speakers typically experience difficulty differentiating between the four Mandarin tones because pitch contrasts at the syllable level are not linguistically relevant to word meanings in English (Wang et al., 1999, 2003). Thus, Mandarin tones are valid stimuli to examine how non-native speech sounds are acquired, which has implications for second language learning in adulthood. In PTC1, a group of native English-speaking younger adults learned to categorize monosyllabic Mandarin tones in a training task. During a single trial of training, an input tone was presented over headphones, and the participants were instructed to categorize the tone into one of the four tone categories via a button press on a keyboard. Corrective feedback in the form of “Correct” or “Wrong” was then provided on screen. A total of $n = 28$ participants completed the training task across $T = 6$ blocks of training, each block comprising $L = 40$ trials. Figure 2 shows the middle 30% quantiles of the proportion of times the response to an input tone was classified into different tone categories over blocks across different subjects, each for the four input tones.

Pupillometry measurements were also taken during each trial. Pupillometry is commonly used as a metric of cognitive effort during listening because increases in pupil diameter are associated with greater usage of cognitive resources (Parthasarathy et al., 2020; Peelle, 2018; Robison & Unsworth, 2019; Winn et al., 2018; Zekveld et al., 2011). One issue with pupillary responses, however, is that they unfold slowly over time. In view of that, unlike standard Mandarin tone training tasks, where the participants hear the input tone, press the keyboard response, and are provided feedback all within a few seconds (Chandrasekaran et al., 2016; Llanos et al., 2020; Reetzke et al., 2018; Smayda et al., 2015), in the PTC1 experiment, there was an intentional four-second delay from the start of the input tone to the response prompt screen where participants made their category decision via button press. This four-second delay allows the pupil to dilate in response to hearing the tone and begin to return to baseline before the participant makes a motor response via the button press. During this four-second period, participants have likely already made conscious category decisions. As such, the response times that are recorded in the end are not meaningful measures of their actual decision times.

This presents a critical limitation for using these response times for further analysis. Conventional drift-diffusion analysis that requires data on response times, such as the one presented in Paulon et al. (2021), can no longer be directly applied here. The focus of this article is to see if the drift-diffusion parameters can still be meaningfully recovered from input–output tone categories alone in the PTC1 data.

We found drift-diffusion analysis in the absence of reliable data on response times challenging enough to merit its separate treatment presented here. Relating drift-diffusion parameters to measures of cognitive effort such as pupillometry is another challenging problem that we are pursuing separately elsewhere.

Figure 2 Description of PTC1 data: The proportion of times the response to an input tone was classified into different tone categories over blocks across different subjects, each for the four input tones (indicated in the panel headers). The thick line represents the median performance and the shaded region indicates the corresponding middle 30% quantiles across subjects.

2. Inverse-Probit Model

The starting point for the proposed inverse-probit categorical probability model follows straightforwardly by integrating out the (unobserved) response times from the joint model for response categories and associated response times developed in Paulon et al. (2021). The derivation of this original joint model illustrates its latent drift-diffusion process-based underpinnings (Fig. 1a). Later, this construction will also be crucial in understanding the diffusion process-based foundations of the marginal categorical probability model modified with identifiability constraints proposed in this article (Fig. 1b). We therefore reproduce the derivation from Paulon et al. (2021) here, which also keeps the main paper self-contained.

To begin with, a Wiener diffusion process $W(\tau)$ over domain $\tau \in (0,\infty)$ can be specified as $W(\tau) = \mu \tau + \sigma B(\tau)$, where $B(\tau)$ is the standard Brownian motion, $\mu$ is the drift rate, and $\sigma$ is the diffusion coefficient (Cox & Miller, 1965; Ross et al., 1996). The process has independent normally distributed increments, i.e., $\Delta W(\tau) = \{W(\tau + \Delta\tau) - W(\tau)\} \sim \text{Normal}(\mu \Delta\tau, \sigma^{2} \Delta\tau)$, independently from $W(\tau)$. The first passage time of crossing a threshold $b$, $\tau = \inf\{\tau' : W(0) = 0, W(\tau') \ge b\}$, is then distributed according to an inverse Gaussian distribution (Chhikara, 1988; Lu, 1995; Whitmore & Seshadri, 1987) with mean $b/\mu$ and variance $b\sigma^{2}/\mu^{3}$.
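The stated mean and variance can be checked numerically. The sketch below writes out the inverse Gaussian first-passage density for a process with drift $\mu$, diffusion coefficient $\sigma$, and threshold $b$, and integrates it with a simple midpoint rule; the parameter values, the integration grid, and the function names are illustrative assumptions.

```python
import math


def first_passage_density(tau, drift, boundary, sigma=1.0):
    """Inverse Gaussian density of the first time drift*tau + sigma*B(tau) reaches the boundary."""
    if tau <= 0.0:
        return 0.0
    return (
        boundary
        / (sigma * math.sqrt(2.0 * math.pi * tau**3))
        * math.exp(-((boundary - drift * tau) ** 2) / (2.0 * sigma**2 * tau))
    )


def moments_by_quadrature(drift, boundary, sigma=1.0, upper=40.0, step=1e-3):
    """Midpoint-rule approximation of the total mass, mean, and variance of the density."""
    mass = mean = second = 0.0
    for i in range(int(upper / step)):
        tau = (i + 0.5) * step
        weight = first_passage_density(tau, drift, boundary, sigma) * step
        mass += weight
        mean += tau * weight
        second += tau * tau * weight
    return mass, mean, second - mean**2


# Illustrative values: mu = 1.5, b = 2, sigma = 1.
mass, mean, variance = moments_by_quadrature(drift=1.5, boundary=2.0)
print(mass, mean, variance)  # compare with 1, b/mu, and b*sigma**2/mu**3
```

With $\sigma = 1$, this density is the unit-variance case used for the latent processes in the remainder of the section.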

Given a perceptual stimulus s and a set of decision choices $d' \in \{1:d_{0}\}$, the neurons in the brain accumulate evidence in favor of the different alternatives. Modeling this behavior using latent Wiener processes $W_{d',s}(\tau)$ with unit variances, assuming that a decision d is made when the decision threshold $b_{d,s}$ for the dth option is crossed first, as illustrated in Fig. 1a, a probability model for the time $\tau_{d}$ to reach decision d is obtained as

$$f(\tau_{d} \mid \delta_{s}, \mu_{d,s}, b_{d,s}) = \frac{b_{d,s}}{\sqrt{2\pi}} (\tau_{d} - \delta_{s})^{-3/2} \exp\left[ -\frac{\{b_{d,s} - \mu_{d,s}(\tau_{d} - \delta_{s})\}^{2}}{2(\tau_{d} - \delta_{s})} \right], \tag{1}$$

where $\mu_{d,s}$ denotes the rate of accumulation of evidence, $b_{d,s}$ the decision boundaries, and $\delta_{s}$ an offset representing time not directly related to the underlying evidence accumulation processes (e.g., the time required to encode the sth signal before evidence accumulation begins, etc.). We let $\theta_{d',s} = (\delta_{s}, \mu_{d',s}, b_{d',s})^{\textrm{T}}$.

Joint model for $(d,\tau)$: Since a decision d is reached at response time $\tau$ if the corresponding threshold is crossed first, that is when $\{\tau = \tau_{d}\} \cap_{d' \ne d} \{\tau_{d'} > \tau_{d}\}$, we have $d = \arg\min_{d' \in \{1:d_{0}\}} \tau_{d'}$. Assuming simultaneous accumulation of evidence for all decision categories, modeled by independent Wiener processes, and termination when the threshold for the observed decision category d is reached, the joint distribution of $(d,\tau)$ is thus given by

$$f(d, \tau \mid s, \theta) = g(\tau \mid \theta_{d,s}) \prod_{d' \ne d} \{1 - G(\tau \mid \theta_{d',s})\}, \tag{2}$$

where, to distinguish from the generic notation f, we now use $g(\cdot \mid \theta)$ and $G(\cdot \mid \theta)$ to denote, respectively, the probability density function (pdf) and the cumulative distribution function (cdf) of an inverse Gaussian distribution, as defined in (1).

Marginal model for d: When the response times $\tau$ are unobserved, the probability of taking decision d given the stimulus s is thus obtained from (2) by integrating out $\tau$ as

$$P(d \mid s, \theta) = \int_{\delta_{s}}^{\infty} g(\tau \mid \theta_{d,s}) \prod_{d' \ne d} \left\{ 1 - G(\tau \mid \theta_{d',s}) \right\} d\tau. \tag{3}$$
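
The integral in (3) can be approximated by simulation, since it is the probability that the latent time for category d is the smallest. The sketch below is a minimal Monte Carlo illustration, not the authors' implementation: the function name `inverse_probit_probs`, the number of draws, and the drift and boundary values are all hypothetical. It assumes unit variances and sets the offset $\delta_{s}$ to zero, which shifts every latent time equally and so leaves the arg min unchanged.

```python
import numpy as np


def inverse_probit_probs(mu, b, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(d | s, theta) in (3) for a single stimulus.

    mu : drift rates for the d0 categories (all positive).
    b  : decision boundaries (scalar or one value per category).
    Each latent time is inverse Gaussian with mean b/mu and scale b^2
    (unit-variance Wiener processes); the decision is the arg min.
    """
    mu = np.asarray(mu, dtype=float)
    b = np.broadcast_to(np.asarray(b, dtype=float), mu.shape)
    rng = np.random.default_rng(seed)
    # One column of latent first passage times per decision category.
    tau = rng.wald(b / mu, b**2, size=(n_draws, mu.size))
    winners = tau.argmin(axis=1)
    return np.bincount(winners, minlength=mu.size) / n_draws


# Illustrative values only: category 1 accumulates evidence fastest.
probs = inverse_probit_probs([2.0, 1.0, 1.0], 2.0)
print(probs)
```

With a larger drift for the first category and equal boundaries, the first category should receive the largest estimated probability.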

The construction of model (3) is similar to traditional multinomial probit/logit regression models except that the latent variables are now inverse Gaussian distributed as opposed to being normal or extreme-value distributed, and the observed category is associated with the minimum of the latent variables in contrast to being identified with the maximum of the latent variables. We thus refer to this model as a ‘multinomial inverse-probit model’.

With data on both response categories d and response times $\tau$ available, the joint model (2) was used to construct the likelihood function in Paulon et al. (2021). In the absence of data on the response times $\tau$, however, the inverse-probit model in (3) provides the basic building block for constructing the likelihood function for the observed response categories. As mentioned in the Abstract, discussed in the Introduction, and detailed in Sect. 2.1, the marginal inverse-probit model (3) for observed categories brings in many new identifiability issues and inference challenges not originally encountered for the joint model (2) developed in Paulon et al. (2021). Solving these new challenges for the marginal model (3) to infer the underlying drift-diffusion parameters $\theta_{d',s}$, for all $d'$, is the focus of this article.

2.1. Identifiability Issues and Related Modeling Challenges

To begin with, we note that model (3) in itself cannot be identified from data on only the response categories. The offset parameters can easily be seen to not be identifiable since $P\left( \tau_{d} \le \wedge_{d'} \tau_{d'} \right) = P\left\{ (\tau_{d} - \delta) \le \wedge_{d'} \left( \tau_{d'} - \delta \right) \right\}$ for any $\delta$, where $\wedge_{d'} \tau_{d'}$ denotes the minimum of $\tau_{d'}, d' \in \{1:d_{0}\}$. As is also well known in the literature, in categorical probability models, the location and scale of the latent continuous variables are also not separately identifiable. The following lemma establishes these points for the inverse-probit model.

Lemma 1

The offset parameters $\delta_{s}$ are not identifiable in model (3). The drift and the boundary parameters, respectively $\mu_{d',s}$ and $b_{d',s}$, are also not separately identifiable in model (3).

In the proof of Lemma 1 given in Appendix A, we have specifically shown that $P\left( d \mid s, \theta \right) = P\left( d \mid s, \theta^{\star} \right)$, where the drift and boundary parameters in $\theta = \left\{ \left( \mu_{d',s}, b_{d',s} \right); d' = 1, \ldots, d_{0} \right\}$ and $\theta^{\star} = \left\{ \left( \mu_{d',s}^{\star}, b_{d',s}^{\star} \right); d' = 1, \ldots, d_{0} \right\}$ satisfy $\mu_{d',s}^{\star} = c \mu_{d',s}$ and $b_{d',s}^{\star} = c^{-1} b_{d',s}$ for some constant $c > 0$. The result follows by noting that the transformation $\tau_{d',s}^{\star} = c^{-2} \tau_{d',s}$ does not change the ordering between the $\tau_{d',s}$'s and hence the probabilities of the resulting decisions $d = \arg\min_{d' \in \{1:d_{0}\}} \tau_{d'} = \arg\min_{d' \in \{1:d_{0}\}} \tau_{d'}^{\star}$ also remain the same. This has the simple implication that if the rate of accumulation of evidence is faster, then the same decision distribution is obtained if the corresponding boundaries are accordingly closer and conversely.
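
The scale invariance described above can be checked numerically. The following sketch, under the same simulation assumptions as before (unit variances, zero offset), estimates the decision probabilities for $(\mu, b)$ and for $(c\mu, b/c)$ using independent random streams and compares them. The helper `decision_probs`, the constant $c = 1.7$, and the drift values are hypothetical choices for illustration.

```python
import numpy as np


def decision_probs(mu, b, n_draws, seed):
    """Monte Carlo decision probabilities for drifts mu and a common boundary b."""
    mu = np.asarray(mu, dtype=float)
    rng = np.random.default_rng(seed)
    # Latent first passage times: inverse Gaussian with mean b/mu, scale b^2.
    tau = rng.wald(b / mu, b**2, size=(n_draws, mu.size))
    return np.bincount(tau.argmin(axis=1), minlength=mu.size) / n_draws


# Illustrative values only.
mu = np.array([2.0, 1.2, 0.8])
b, c = 2.0, 1.7

p = decision_probs(mu, b, 400_000, seed=1)
# Faster drifts (c * mu) paired with closer boundaries (b / c).
p_star = decision_probs(c * mu, b / c, 400_000, seed=2)

print(np.round(p, 3))
print(np.round(p_star, 3))
```

The two estimated probability vectors should differ only by Monte Carlo error, which is the practical meaning of the drifts and boundaries not being separately identifiable.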

In fact, given the information on input and output categories alone, if $d_{0}$ denotes the number of possible decision categories, at most $d_{0} - 1$ parameters are estimable. To see this, consider the probabilities $P(d' \mid s, \theta)$, $d' = 1, \ldots, d_{0}$, where $\theta$ is the m-dimensional vector of parameters, possibly containing drift parameters and decision boundaries. 
Given the perceptual stimulus s as input, the probabilities satisfy $\sum_{d'=1}^{d_{0}} P(d' \mid s, \theta) = 1$. Thus, the function $P_{s}(\theta) = \{P(1 \mid s, \theta), \cdots, P(d_{0} \mid s, \theta)\}^{\textrm{T}}$ lies on a $(d_{0}-1)$-dimensional simplex, $P_{s}(\theta): \theta \rightarrow \Delta^{d_{0}-1}$, and by the model in (3) the mapping is continuous. 
Thus, it can be shown by the Invariance of Domain theorem (see, e.g., Deo, 2018) that if $P_{s}$ is injective and continuous, then the domain of $P_{s}$ must belong to ${\mathbb{R}}^{m}$, where $m \le d_{0} - 1$. Thus, in order to ensure identifiability of $\left\{ P_{s}(\theta); \theta \right\}$, we must parametrize the probability vector with at most $d_{0} - 1$ parameters.

The existing literature on drift-diffusion models discussed in the Introduction has traditionally put more emphasis on modeling the drifts (as their reference in the literature as ‘drift’-diffusion models suggests). Previous research on joint models for response tones and associated response times in Paulon et al. (2021) also suggests that the boundaries remain stable around a value of 2 and it is primarily the changes in the drift rates that explain longitudinal learning. In view of this, we keep the boundaries fixed at the constant $b = 2$ and treat the drifts as the free parameters instead. In our simulations and real data applications, it is observed that the estimates of the drift parameters and the associated cluster configurations are not very sensitive to small-to-moderate deviations of b around 2. In the code implementing our method, available as part of the supplementary materials, we allow the practitioner to choose a value of b as they see fit for their specific application. 
The latent drift-diffusion process with these constraints, namely $\delta_{s} = 0$ and $b_{d',s} = b$ for all $d'$, is shown in Fig. 1b.

While fixing $\delta_{s} = 0$ and $b_{d',s}$ to some known constant b reduces the size of the parameter space to $d_{0}$, to ensure identifiability, we still need at least one more constraint on the drift parameters $\mu_{1:d_{0},s}$. In categorical probability models, the identifiability problem of the location parameter is usually addressed by setting one category as a reference and modeling the probabilities of the others (Agresti, 2018; Albert & Chib, 1993; Borooah, 2002; Chib & Greenberg, 1998). However, posterior predictions from Bayesian categorical probability models with asymmetric constraints may be sensitive to the choice of reference category (see Burgette & Nordheim, 2012; Johndrow et al., 2013). 
Further, as also discussed in the Introduction, the goal of clustering the drift parameters $\mu_{1:d_{0},s}$ across s cannot be accomplished by this apparently simple solution.

The problem can be addressed by imposing a symmetric constraint on $\mu_{1:d_{0},s}$ instead. A symmetric identifiability constraint has been previously proposed by Burgette et al. (2021) in the context of multinomial probit models, where they considered a sum-to-zero constraint on the latent utilities. To implement the constraint, they introduced a faux base category indicator parameter, which is assigned a discrete uniform prior and then learned via MCMC. Given this faux base category indicator, the other parameters are adjusted so that the sum-to-zero restriction is satisfied. However, the introduction of a base category, even if adaptively chosen, does not facilitate the clustering of $\mu_{1:d_{0},s}$ within and across the different input categories s.

2.2. Our Proposed Approach

In coming up with solutions for these challenges, we take into consideration the complex design of our motivating tone learning experiments, so that our approach is easily extendable to longitudinal mixed model settings, allowing us to (a) estimate the smoothly varying trajectories of the parameters as the participants learn over time, (b) accommodate the heterogeneity between the participants, and (c) compare between the estimates not just within but also crucially between the different panels.

Similar to the sum-to-zero constraint in the multinomial probit model of Burgette et al. (2021), we impose a symmetric sum-to-a-constant constraint on the drift parameters $\mu_{1:d_{0},s}$ to identify our new class of inverse-probit models, although our implementation is quite different from theirs. To conduct inference, we start with an unconstrained prior, then sample from the corresponding unconstrained posterior, and finally project these samples to the constrained space through a minimal distance mapping. Similar ideas have previously been applied to satisfy natural constraints in other contexts. See, e.g., Dunson and Neelon (2003) and Gunn and Dunson (2005).

This approach is advantageous from both a modeling and a computational perspective. On the one hand, the basic building blocks are relatively easily extended to complex longitudinal mixed model settings; on the other, posterior computation is facilitated as this allows the use of conjugate priors for the unconstrained parameters. Projection of the drift parameters onto the same space further makes them directly comparable, allowing clustering within and across the panels. The projected drifts can now be interpreted only on a relative scale, but such compromises are not avoidable given the challenges we face.

2.2.1. Minimal Distance Mapping

As the drift parameters are positive, the sum-to-a-constant k constraint leads to the constrained space $\mathcal{S}_{k} = \{\mu: \textbf{1}^{T} \mu = k,\; \mu_{j} > 0,\; j = 1, \ldots, d_{0}\}$ on which $\mu_{1:d_{0},s}$ should be projected. The space $\mathcal{S}_{k}$ is semi-closed, and therefore, the projection of any point $\mu$ onto $\mathcal{S}_{k}$ may not exist. 
As a simple one-dimensional example, let $x = -1$ and ${\mathcal{S}} = (0,1]$, then $\arg\min_{y \in {\mathcal{S}}} |y - x| = 0 \notin {\mathcal{S}}$. Further, from a practical perspective, a drift parameter infinitesimally close to zero makes the distribution of the associated response times very flat, which is typically not observed in real data. Therefore, we choose a small $\varepsilon > 0$ and project $\mu$ onto $\mathcal{S}_{\varepsilon,k} = \{\mu: \textbf{1}^{T} \mu = k,\; \mu_{j} \ge \varepsilon,\; j = 1, \ldots, d_{0}\}$. We then define the projection of a point $\mu$ onto $\mathcal{S}_{\varepsilon,k}$ through minimal distance mapping as

$$\mu^{\star}=\mathrm{Proj}_{\mathcal{S}_{\varepsilon,k}}(\mu):=\left\{\operatorname{argmin}_{\nu}\Vert\mu-\nu\Vert : \nu\in\mathcal{S}_{\varepsilon,k}\right\},$$

where $\Vert\cdot\Vert$ is the Euclidean norm. Note that for appropriate choices of $(k,\varepsilon)$, $\mathcal{S}_{\varepsilon,k}$ is non-empty, closed and convex. Therefore, $\mu^{\star}$ exists and is unique by the Hilbert projection theorem (Rudin, 1991). The solution to this projection problem comes from the following result from Beck (2017).

Lemma 2

Let $\mathcal{S}_{\varepsilon,k}$ be as defined above, and $\mathcal{S}_{\varepsilon}=\left\{\mu: \mu_{j}\ge\varepsilon,\;j=1,\ldots,d_{0}\right\}$. Then, $\mathrm{Proj}_{\mathcal{S}_{\varepsilon,k}}(\mu)=\mathrm{Proj}_{\mathcal{S}_{\varepsilon}}(\mu-u^{\star}\mathbf{1})$, where $u^{\star}$ is a solution to the equation $\mathbf{1}^{T}\mathrm{Proj}_{\mathcal{S}_{\varepsilon}}(\mu-u^{\star}\mathbf{1})=k$.

Although a closed-form solution is not available, the above result shows that the problem reduces to finding a root $u^{\star}$ of the non-increasing function $\phi(u)=\mathbf{1}^{T}\mathrm{Proj}_{\mathcal{S}_{\varepsilon}}(\mu-u\mathbf{1})-k$. We apply an algorithm based on Duchi et al. (2008) to reach the solution. The algorithm is described in “Appendix C”.
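The root-finding step can be illustrated in a few lines. The following is a minimal sketch and not necessarily the algorithm of Appendix C: it assumes $k>d_{0}\varepsilon$, shifts $\mu$ by $\varepsilon$ so that the lower bound becomes zero, and then applies the sort-based simplex projection of Duchi et al. (2008) with total mass $k-d_{0}\varepsilon$. The function name `project_to_S_eps_k` is hypothetical.

```python
from typing import List


def project_to_S_eps_k(mu: List[float], k: float, eps: float) -> List[float]:
    """Euclidean projection of mu onto {nu : sum(nu) = k, nu_j >= eps}.

    Illustrative sketch; assumes k > d0 * eps so the set has more than one point.
    """
    d0 = len(mu)
    mass = k - d0 * eps  # mass left for the simplex after shifting by eps
    if mass <= 0:
        raise ValueError("this sketch requires k > d0 * eps")
    # Sort the shifted coordinates in decreasing order.
    shifted = sorted((m - eps for m in mu), reverse=True)
    cumsum = 0.0
    u_star = 0.0
    for j, v in enumerate(shifted, start=1):
        cumsum += v
        candidate = (cumsum - mass) / j
        # The condition holds for j = 1, ..., rho and fails afterwards,
        # so the last candidate that passes is the root of phi.
        if v - candidate > 0:
            u_star = candidate
    # Proj onto S_eps of (mu - u_star * 1) is a componentwise maximum with eps.
    return [max(m - u_star, eps) for m in mu]
```

For instance, with $\mu=(2,-1,0.5)$, $k=2$ and $\varepsilon=0.1$, the root is $u^{\star}=0.3$ and the projection is $(1.7, 0.1, 0.2)$, which sums to $k$ and respects the lower bound.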

2.2.2. Identifiability Restrictions

The projection approach solves the problem of identifiability and maps the probability vector corresponding to an input tone $s$ to the constraint space of $\mu_{1:d_{0},s}$, $\mathcal{S}_{\varepsilon,k}$. The following theorem shows that the mapping from the constrained space of $\mu_{1:d_{0},s}$ to the probability vector $\mathbf{P}(\mu_{1:d_{0},s})=\{p_{1}(\mu_{1:d_{0},s}),\dots,p_{d_{0}}(\mu_{1:d_{0},s})\}^{\mathrm{T}}$ is injective.
To keep the ideas simple, we consider the domain of the function to be $\mathcal{S}_{0,k}$ (i.e., $\varepsilon=0$) instead of $\mathcal{S}_{\varepsilon,k}$, although a very similar proof would follow if $\mathcal{S}_{\varepsilon,k}$ were considered.

Theorem 1

Let $p_{d}(\mu_{1:d_{0},s})$ be the probability of observing the output tone $d$ given the input tone $s$ and the drift parameters $\mu_{1:d_{0},s}$, as given in (3), for each $d=1:d_{0}$. Suppose $\mu_{1:d_{0},s}$ lies on the space $\mathcal{S}_{0,k}$.
Then, the function from $\mathcal{S}_{0,k}$ to the space of probabilities $\left\{p_{d}(\mu_{1:d_{0},s});\, d=1:d_{0}\right\}$ is injective.

A proof is presented in “Appendix B”.

2.2.3. Conjugate Priors for the Unconstrained Drifts

From (3), given $\tau_{1},\ldots,\tau_{d_{0}}$ such that $\tau_{d}\le\min\left\{\tau_{1},\ldots,\tau_{d_{0}}\right\}$, the posterior full conditional of $\mu_{1:d_{0},s}$ satisfies
$$\pi_{\mu}^{(n)}\propto\pi(\mu_{1:d_{0},s})\times\prod_{d^{\prime}=1}^{d_{0}}g\left(\tau_{d^{\prime}}\mid\mu_{d^{\prime},s}\right),$$
where $\pi(\cdot)$ is the prior of $\mu_{1:d_{0},s}$.
Observe that $\prod_{d^{\prime}=1}^{d_{0}}g\left(\tau_{d^{\prime}}\mid\mu_{d^{\prime},s}\right)$, viewed as a function of $\mu_{1:d_{0},s}$, is proportional to a Gaussian density. A Gaussian prior on $\mu_{1:d_{0},s}$ thus induces a conditional posterior for $\mu_{1:d_{0},s}$ that is also Gaussian and hence very easy to sample from.
Importantly, these benefits also extend naturally to multivariate Gaussian priors for any parameter vector $\beta_{d^{\prime},s}$ that relates to $\mu_{d^{\prime},s}$ linearly. This will be crucial in allowing us to extend the basic building block to longitudinal functional mixed model settings in Sect. 3 next, where we will be modeling time-varying $\mu_{d^{\prime},s}(t)$ as flexible mixtures of B-splines with associated coefficients $\beta_{d^{\prime},s}$.
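To make the conjugacy explicit, the following worked step treats a single drift $\mu$ and a single latent passage time $\tau$. It is a sketch under two assumptions: the inverse-Gaussian density has the fixed-boundary, zero-offset form $g(\tau\mid\mu)=\frac{b}{\sqrt{2\pi}\,\tau^{3/2}}\exp\left[-\frac{(b-\mu\tau)^{2}}{2\tau}\right]$ used in (4), and the prior is $N(m_{0},\sigma_{0}^{2})$, where $m_{0}$ and $\sigma_{0}^{2}$ are illustrative symbols that do not appear in the paper.

```latex
\begin{aligned}
g(\tau \mid \mu)
  &\propto_{\mu} \exp\left\{-\frac{(b-\mu\tau)^{2}}{2\tau}\right\}
   = \exp\left\{-\frac{\tau}{2}\left(\mu-\frac{b}{\tau}\right)^{2}\right\}, \\
\pi(\mu \mid \tau)
  &\propto \exp\left\{-\frac{(\mu-m_{0})^{2}}{2\sigma_{0}^{2}}\right\}
           \exp\left\{-\frac{\tau}{2}\left(\mu-\frac{b}{\tau}\right)^{2}\right\}, \\
\mu \mid \tau
  &\sim N\!\left(\frac{m_{0}/\sigma_{0}^{2}+b}{1/\sigma_{0}^{2}+\tau},\;
                 \frac{1}{1/\sigma_{0}^{2}+\tau}\right).
\end{aligned}
```

In words, the latent passage time acts like a Gaussian observation of $\mu$ with mean $b/\tau$ and precision $\tau$, so the posterior precision is the sum of the prior precision and $\tau$.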

2.2.4. Justification as a Proper Bayesian Procedure

Define the constrained conditional posterior distribution, $\tilde{\pi}_{\tilde{\mu}}^{(n)}$, of the drift parameters $\mu$ as

$$\tilde{\pi}_{\tilde{\mu}}^{(n)}\left(B\mid\zeta\right)=\pi_{\mu}^{(n)}\left(\left\{\mu: \mathrm{Proj}(\mu)\in B\right\}\mid\zeta\right),\quad B\subseteq\mathcal{S}_{\varepsilon,k},$$

where $\pi_{\mu}^{(n)}$ is the unconstrained conditional posterior of $\mu_{1:d_{0},s}$, given the other variables $\zeta$. The analytic form of the constrained conditional posterior is not available.

Sen et al. (2018) established a proper Bayesian justification for the posterior projection approach by showing the existence of a prior $\tilde{\pi}\left(\mu_{1:d_{0},s}\right)$ on the constrained space $\mathcal{S}_{\varepsilon,k}$ such that the resulting posterior is the same as the projected posterior $\tilde{\pi}_{\tilde{\mu}}^{(n)}$. When $\mathcal{S}_{\varepsilon,k}$ is non-empty, closed, and convex, so that the projection operator is measurable, such a prior exists if the unconstrained posterior is absolutely continuous with respect to the unconstrained prior (Sen et al., 2018, Corollary 1). As the unconstrained induced prior and posterior of the drift parameters are both Gaussian, this result holds in our case as well.

3. Extension to Longitudinal Mixed Models

In this section, we adapt the inverse-probit model discussed in Sect. 2 to the complex longitudinal design of our motivating PTC1 data set described in the Introduction. Let $s_{i,\ell,t}$ denote the input tone for the $i$th individual in the $\ell$th trial of block $t$. Likewise, let $d_{i,\ell,t}$ denote the output tone selected by the $i$th individual in the $\ell$th trial of block $t$. Setting the offsets at zero, and boundary parameters to a fixed constant $b$, we now have

(4) $$\begin{aligned}&P\{d_{i,\ell,t}=d\mid s_{i,\ell,t}=s,\,\mu_{1:d_{0},s}^{(i)}(t)\}=\int_{0}^{\infty}g\{\tau\mid\mu_{d,s}^{(i)}(t)\}\prod_{d^{\prime}\ne d}\left[1-G\{\tau\mid\mu_{d^{\prime},s}^{(i)}(t)\}\right]d\tau,\\&\quad\text{where}\quad g\{\tau\mid s_{i,\ell,t}=s,\,\mu_{d^{\prime},s}^{(i)}(t)=\mu\}=\frac{b}{\sqrt{2\pi}\,\tau^{3/2}}\exp\left[-\frac{\{b-\mu\tau\}^{2}}{2\tau}\right].\end{aligned}$$

The drift rates $\mu_{d^{\prime},s}^{(i)}(t)$ now vary with the blocks $t$. In addition, we accommodate random effects by allowing $\mu_{d^{\prime},s}^{(i)}(t)$ to also depend on the subject index $i$. We let $\mathbf{d}=\{d_{i,\ell,t}\}_{i,\ell,t}$, and $d_{0}$ be the number of possible decision categories (T1, T2, $\ldots$, T$d_{0}$). The likelihood function thus takes the form

$$L(\mathbf{d}\mid\mathbf{s},\theta)=\prod_{d=1}^{d_{0}}\prod_{s=1}^{d_{0}}\prod_{t=1}^{T}\prod_{i=1}^{n}\prod_{\ell=1}^{L}\left[P\{d_{i,\ell,t}\mid s_{i,\ell,t},\,\mu_{1:d_{0},s}^{(i)}(t)\}\right]^{1\{d_{i,\ell,t}=d,\,s_{i,\ell,t}=s\}}.$$
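The category probabilities in (4) have no closed form, but they are easy to evaluate numerically for given drifts. The sketch below is an illustration only and is not part of the paper's sampler, which works with latent passage times instead. It assumes $b=1$ by default, uses the standard inverse-Gaussian distribution function for $G$, and integrates with a midpoint rule after the substitution $\tau=x/(1-x)$. The function names and the grid size `n_grid` are hypothetical.

```python
import math


def ig_pdf(tau, mu, b=1.0):
    """Density g(tau | mu) from equation (4)."""
    return (b / (math.sqrt(2.0 * math.pi) * tau ** 1.5)
            * math.exp(-(b - mu * tau) ** 2 / (2.0 * tau)))


def ig_cdf(tau, mu, b=1.0):
    """Distribution function G(tau | mu) of the same law (requires mu > 0)."""
    scale = math.sqrt(2.0 * tau)
    first = 0.5 * math.erfc(-(mu * tau - b) / scale)   # Phi((mu*tau - b)/sqrt(tau))
    second = 0.5 * math.erfc((mu * tau + b) / scale)   # Phi(-(mu*tau + b)/sqrt(tau))
    return min(1.0, first + math.exp(2.0 * b * mu) * second)


def choice_probabilities(drifts, b=1.0, n_grid=2000):
    """Approximate P(d | drifts) for every category d by numerical integration."""
    d0 = len(drifts)
    probs = [0.0] * d0
    h = 1.0 / n_grid
    for i in range(n_grid):
        x = (i + 0.5) * h                 # midpoint in (0, 1)
        tau = x / (1.0 - x)               # maps (0, 1) onto (0, infinity)
        jac = 1.0 / (1.0 - x) ** 2        # d tau / d x
        surv = [1.0 - ig_cdf(tau, m, b) for m in drifts]
        for d in range(d0):
            others = 1.0
            for dp in range(d0):
                if dp != d:
                    others *= surv[dp]    # product over competitors d' != d
            probs[d] += ig_pdf(tau, drifts[d], b) * others * jac * h
    return probs
```

Calling `choice_probabilities([2.0, 1.0, 1.0])` returns three values that sum to approximately one, with the largest probability attached to the largest drift, as the race interpretation suggests.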

We reiterate that in deriving the identifiability conditions and designing their implementation strategy in Sect. 2.2, we had to make sure that they would be applicable to the complex multi-subject longitudinal design of the PTC1 data set. Following those ideas, we model the time-varying mixed effects drift parameters $\mu_{d^{\prime},s}^{(i)}(t)$ without any constraints first, then project them to the space satisfying the necessary identifying conditions.

For the unconstrained model, we follow the outline of Paulon et al. (2021) with necessary likelihood adjustments. The details are deferred to Section S.1 of the supplementary material. We present here a general outline.

We decompose $\mu_{d^{\prime},s}^{(i)}(t)=f_{d^{\prime},s}(t)+u_{d^{\prime},s}^{(i)}(t)$, where $f_{d^{\prime},s}(t)$ and $u_{d^{\prime},s}^{(i)}(t)$ denote, respectively, fixed and random effects components, which are both modeled using flexible mixtures of B-spline bases. This allows us to cluster the fixed effects for different $(d^{\prime},s)$ combinations with similar shapes by clustering the corresponding B-spline coefficients.

Given posterior samples of $f_{d^{\prime},s}(t)$ and $u_{d^{\prime},s}^{(i)}(t)$, unconstrained samples of $\mu_{d^{\prime},s}^{(i)}(t)$ are obtained. For every input tone $s$, these unconstrained $\mu_{1:d_{0},s}^{(i)}(t)$’s are then projected to the space $\mathcal{S}_{\varepsilon,k}$ following the method described in Sect. 2.2.1.

4. Posterior Inference

Posterior inference for our proposed inverse-probit mixed model is carried out using samples drawn from the posterior using an MCMC algorithm. The algorithm carefully exploits the conditional independence relationships encoded in the model as well as the latent variable construction of the model.

Inference can be greatly simplified by sampling the passage times $\tau_{1:d_{0}}$ and then conditioning on them. However, it is not possible to generate $\tau_{1:d_{0}}$ sequentially, e.g., by generating the passage time of the $d$-th decision choice $\tau_{d}$ independently, and that of the other decision choices from a truncated inverse-Gaussian distribution, left truncated at $\tau_{d}$ (Footnote 1).

We implement a simple accept-reject sampler instead, which generates values from the joint distribution of $\tau_{1:d_{0}}$ and accepts the sample if $\tau_{d}\le\tau_{1:d_{0}}$. It is fast and produces a sample from the desired target conditional distribution. We formalize this result in the following lemma.

Lemma 3

Let $g\left(\tau_{1:d_{0}}\mid\mu_{1:d_{0}}\right)$ be the joint distribution of $\tau_{1:d_{0}}$. Consider the following accept-reject algorithm:

Algorithm 1 Generating the passage times $\tau_{1:d_{0}}$ given $\arg\min_{d^{\prime}\in\{1:d_{0}\}}\tau_{d^{\prime}}=d$

Algorithm 1 generates samples from the conditional joint distribution of $\tau_{1:d_{0}}$, conditioned on the event $\tau_{d}\le \tau_{1:d_{0}}$.

Proof of Lemma 3 is provided in “Appendix D”.
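For illustration, a sampler of this accept-reject form can be sketched in a few lines. The sketch below is not the authors' implementation: it assumes unit diffusion variance, zero offsets, and a common boundary $b$, so that each $\tau_{d^{\prime}}$ is drawn independently from an inverse-Gaussian distribution with mean $b/\mu_{d^{\prime}}$ and shape $b^{2}$. The function name, its arguments, and the retry cap are hypothetical.

```python
import numpy as np

def sample_passage_times(mu, d, b=2.0, rng=None, max_tries=100_000):
    """Draw tau_{1:d0} conditional on tau_d being the smallest (d is 0-based).

    Assumption: independent inverse-Gaussian passage times with
    mean b / mu and shape b**2 (unit variance, no offset).
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    for _ in range(max_tries):
        # Propose from the unconditional joint distribution.
        tau = rng.wald(mean=b / mu, scale=b**2)
        # Accept only if the observed category d has the smallest passage time.
        if np.argmin(tau) == d:
            return tau
    raise RuntimeError("no proposal accepted; check the drift values")

# Example with hypothetical drifts; the observed response is category index 2.
tau = sample_passage_times([1.0, 2.0, 0.5, 1.5], d=2, rng=np.random.default_rng(0))
```

Rejected proposals are simply discarded, so every returned vector has its minimum at the observed category.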

It can be verified that the acceptance ratio of Algorithm 1 is $M^{-1}=P\left(\tau_{d}\le \tau_{1:d_{0}}\right)$ (see Robert & Casella, 2004), which depends on the drift parameters only. If the drift parameters are ordered accordingly, so as to satisfy $\mu_{d}\ge \mu_{1:d_{0}}$, the acceptance ratios increase. The algorithm thus becomes faster as the sampler converges.
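The dependence of the acceptance ratio on the drifts can be checked numerically. The following Monte Carlo sketch estimates $P(\tau_{d}\le \tau_{1:d_{0}})$ under illustrative assumptions that are not taken from the paper's code: independent inverse-Gaussian passage times with mean $b/\mu_{d^{\prime}}$ and shape $b^{2}$, a hypothetical function name, and made-up drift values.

```python
import numpy as np

def acceptance_prob(mu, d, b=2.0, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(tau_d <= tau_{1:d0}) for 0-based index d."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Each row is one independent draw of all d0 passage times.
    tau = rng.wald(mean=b / mu, scale=b**2, size=(n_draws, mu.size))
    return float(np.mean(np.argmin(tau, axis=1) == d))

p_equal = acceptance_prob([1.0, 1.0, 1.0, 1.0], d=0)    # all drifts equal
p_largest = acceptance_prob([3.0, 1.0, 1.0, 1.0], d=0)  # category 0 has the largest drift
```

With equal drifts the four categories are exchangeable, so the estimate should be close to 1/4; giving the observed category the largest drift raises it, which is the mechanism behind the speed-up described above.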

As noted earlier, sampling the latent inverse-Gaussian distributed response times $\tau_{1:d_{0}}$ greatly simplifies computation. Most of the chosen priors, including the priors on the coefficients ${\beta}$ in the fixed and random effects, are conjugate. Due to space constraints, the details are deferred to Section S.3 in the supplementary material.

5. Simulation Studies

In this section, we discuss the results of a synthetic numerical experiment. We simulate data from a complex longitudinal design that mimics the real PTC1 data set. Our generating model contains fixed effects components attributed to different input-response tone combinations and random components attributed to individuals.

We recall that our main objective here is to identify the similarities and differences between the underlying brain mechanisms associated with different input-response category combinations over time while also assessing their individual heterogeneity, as characterized by latent drift-diffusion processes whose parameters can be biologically interpreted. The estimation of the probability curves for different input-response combinations, while a good indicator of our model’s fit, is not the main purpose of this endeavor. Traditional categorical probability models, such as multinomial probit or logit, are thus not relevant to the scientific problem we are trying to address here. We are also not aware of any other work in the drift-diffusion literature that attempts to estimate the underlying parameters from category response data alone. In view of this, we restrict our focus to evaluating the performance of the proposed biologically meaningful longitudinal inverse-probit mixed model but do not present comparisons with any other model.

Design In designing the simulation scenario, we have tried to mimic our motivating category learning data sets. We chose $n=20$ as the number of participants being trained over $T=10$ blocks to identify $d_{0}=4$ tones. For each input tone and each block, there are $L=40$ trials.
We set the true $\mu_{d^{\prime},s}(t)$ values in such a way that they are far from satisfying the constraint $\sum_{d^{\prime}=1:d_{0}}\mu_{d^{\prime},s}=k$, and the decision boundary is set to $b=2$ for all $(d^{\prime},s)$. The true drift parameters and the true probabilities, averaged over the participants for each input-response category combination, are shown in Fig. 3.

There are four true clusters in total, two for correct categorizations, $S_{1}, S_{2}$, and two for incorrect categorizations, $M_{1}, M_{2}$, as follows: $S_{1}=\{(1,1),(2,2)\}$, $S_{2}=\{(3,3),(4,4)\}$, $M_{1}=\{(1,2),(1,3),(2,1),(2,3),(3,4),(4,3)\}$, $M_{2}=\{(1,4),(2,4),(3,1),(3,2),(4,1),(4,2)\}$.
We may interpret $M_{1}$ as the cluster of difficult alternatives, and $M_{2}$ as the cluster of easy alternatives. Thus, there are similarities in overall trajectories of $\{T_{1},T_{2}\}$ and $\{T_{3},T_{4}\}$, differentiating between easy and hard category recognition problems. We experimented with 50 synthetic data sets generated according to this design.
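A data-generating scheme with these dimensions can be sketched as follows. The values $n=20$, $T=10$, $d_{0}=4$, $L=40$, and $b=2$ come from the design above; everything else is a hypothetical placeholder, including the linear drift curves, the multiplicative subject effects, and the inverse-Gaussian parametrization (mean $b/\mu$, shape $b^{2}$). In particular, the sketch does not reproduce the true drift curves or the clustering structure used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, d0, L, b = 20, 10, 4, 40, 2.0   # values from the design above
blocks = np.arange(1, T + 1)

# Hypothetical population-level drifts mu[d', s, t]: the correct response
# (d' == s) gains drift over blocks, incorrect responses lose it.
mu = np.empty((d0, d0, T))
for s in range(d0):
    for d in range(d0):
        mu[d, s] = 1.0 + 0.15 * blocks if d == s else 1.0 - 0.05 * blocks

# Hypothetical subject effects: one multiplicative factor per participant.
subj = np.exp(rng.normal(0.0, 0.1, size=n))

# responses[i, s, t, l] is the category with the smallest latent passage time.
responses = np.empty((n, d0, T, L), dtype=int)
for i in range(n):
    for s in range(d0):
        for t in range(T):
            mean = b / (subj[i] * mu[:, s, t])
            tau = rng.wald(mean=mean, scale=b**2, size=(L, d0))
            responses[i, s, t] = np.argmin(tau, axis=1)

# accuracy[s, t]: share of correct responses for input tone s in block t.
correct = responses == np.arange(d0)[None, :, None, None]
accuracy = correct.mean(axis=(0, 3))
```

Only the `responses` array would be passed to the model; the latent passage times are discarded, mirroring the setting in which response times are unavailable.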

Figure 3 Description of the synthetic data: True values of the drift parameters averaged over the subjects, denoted by $\mu_{d^{\prime},s}(t)$, and true probabilities $P\{d_{i,\ell,t}\mid s_{i,\ell,t},{\mu}_{1:d_{0},s}^{(i)}(t)\}$ averaged over the subjects, denoted here by $P_{d^{\prime},s}(t)$. Here T1, T2, T3, and T4 represent input categories 1 to 4, respectively. Some of the curves overlap according to the true clustering structure described in Sect. 5.

Figure 4 Results for synthetic data: Posterior trajectories of the probabilities for each combination of $(d^{\prime},s)$ over blocks estimated by the proposed model. The shaded areas represent the corresponding 95% point-wise credible intervals. The thick dashed lines represent the underlying true curves, some of which overlap according to the true clustering structure described in Sect. 5.

Results As the true drift parameters themselves do not satisfy the constraint, and the estimated drift parameters are on the constrained space, we cannot validate our method by its predictive performance of the drift parameters. Instead, the proposed method is validated in terms of the estimated probabilities.

Figure 4 shows the estimated posterior probability trajectories along with the 95% credible interval and the underlying true probability curves for every combination $(d^{\prime},s)$ in a typical scenario. The credible interval fails to capture the truth in two situations: when the true probability is very close to zero, or when it is very close to one. The former case corresponds to classes with very low success probability, resulting in very few observations to estimate. The latter is underestimated as a consequence of the former since the probabilities add up to one.

The results produced by our method are mostly stable and consistent across all synthetic data sets. There are, however, a few cases of incorrect cluster assignments, resulting in some outliers in each boxplot. Note that if an incorrect cluster assignment takes place, the probabilities of all input-response combinations are affected by that. For example, if a component of $M_{1}$ is wrongly assigned to $M_{2}$, then not only are the probabilities of input-output combinations in $M_{1}$ and $M_{2}$ affected but, since the probabilities add up to one, those of $S_{1}$ and $S_{2}$ are affected as well.

In estimating the probabilities, the overall mean squared error (MSE), i.e., the mean squared difference of the estimated and the true probabilities taking all combinations of $(d,s,i,t)$ into account, came out to be 0.0028. Figure 5 provides a detailed description of the estimation of the probabilities for two input categories (one from each similarity group). As described for the individual simulation results, there are cases of under-estimation of the probabilities which are close to one, and consequently, over-estimation of the probabilities close to zero. However, the amount of departure from the true probability in each case is very small, which can also be seen in the small overall MSE.
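The overall MSE described above amounts to averaging squared differences over every cell of the probability array. A minimal sketch, assuming the estimated and true probabilities are stored in arrays of identical shape (for example indexed by response, input, participant, and block; the function name and layout are hypothetical):

```python
import numpy as np

def overall_mse(p_hat, p_true):
    """Mean squared difference over every (d, s, i, t) cell."""
    p_hat = np.asarray(p_hat, dtype=float)
    p_true = np.asarray(p_true, dtype=float)
    if p_hat.shape != p_true.shape:
        raise ValueError("estimated and true arrays must have the same shape")
    return float(np.mean((p_hat - p_true) ** 2))
```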

Figure 5 Results for the synthetic data: Boxplots of the estimated probabilities over 50 simulations, and true probabilities (red dots) for each block, shown for two panels, one from each similarity group (panel $T_{1}$ at the top and $T_{3}$ at the bottom).

Further, the overall efficiency in identifying the true clustering structure is validated using Rand (Rand, 1971) and adjusted Rand (Hubert & Arabie, 1985) indices. The definitions of Rand and adjusted Rand indices are provided in Section S.6 in the supplementary material. The average Rand and adjusted Rand indices for our proposed method over 50 simulations are 0.9105 and 0.8277, respectively, indicating high overall efficacy in correctly clustering the probability curves.
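For readers without access to the supplement, the two indices can be computed by pair counting. The sketch below uses the standard definitions, which may differ in notation from Section S.6; the function name is hypothetical, and the "estimated" labeling is an invented example in which one element of $M_{1}$ is moved to $M_{2}$.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_indices(labels_a, labels_b):
    """Return (Rand, adjusted Rand) for two labelings of the same items."""
    n = len(labels_a)
    pairs = list(combinations(range(n), 2))
    # A pair agrees if both labelings put it together or both keep it apart.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    rand = agree / len(pairs)

    # Adjusted Rand index from the contingency table of the two labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return rand, 1.0
    return rand, (sum_ij - expected) / (max_index - expected)

# True clustering of the 16 (d', s) pairs in the simulation design.
truth = ["S1"] * 2 + ["S2"] * 2 + ["M1"] * 6 + ["M2"] * 6
# Hypothetical estimate: one M1 element wrongly assigned to M2.
estimate = list(truth)
estimate[4] = "M2"
rand, adjusted = rand_indices(truth, estimate)
```

The adjusted index corrects the Rand index for agreement expected by chance, so it is the more conservative of the two summaries.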

6. Applications

Analysis of the PTC1 Data Set We present here the analysis of the PTC1 data set described in Sect. 1 using our proposed longitudinal inverse-probit mixed model. We first demonstrate the performance of the proposed method in estimating the probabilities associated with different $(d,s)$ pairs. Figure 6 shows the 95% credible intervals for the estimated probabilities for different input tones, along with the average proportions of times an input tone was classified into different tone categories across subjects. The latter serves as the empirical estimate of the probabilities.
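The empirical estimate mentioned above is a simple tabulation. A minimal sketch, assuming a hypothetical balanced array layout `responses[i, s, t, l]` that stores the 0-based response category of participant `i` for input tone `s`, block `t`, and trial `l` (the PTC1 data are not necessarily stored this way):

```python
import numpy as np

def empirical_proportions(responses, d0):
    """Return props[d, s, t]: per-subject response proportions averaged over subjects."""
    responses = np.asarray(responses)
    per_category = []
    for d in range(d0):
        within_subject = (responses == d).mean(axis=3)   # proportion over trials
        per_category.append(within_subject.mean(axis=0))  # average over subjects
    return np.stack(per_category)
```

For each input tone and block, the proportions over the response categories sum to one, which makes them directly comparable with the model-based probabilities.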

Figure 6 Results for PTC1 data: Estimated probability trajectories compared with average proportions of times an input tone was classified into different tone categories across subjects (in dashed line). The means across subjects are indicated by thick lines and the shaded regions indicate corresponding 95% coverage regions.

We observe that except for the input-response combination (1, 1) in block 3 and some cases with a low number of data points, the 95% credible intervals include the corresponding empirical probabilities. An explanation of the occasional under-performance is given later in this section.

Next, we examine the clusters identified by the proposed model. Apart from the two clusters obtained for the success combinations $(d=s)$, three clusters are additionally identified in the incorrect input-response combinations $(d\ne s)$. The clusters of success combinations are $S_{1}=\{(1,1),(2,2),(4,4)\}$ and $S_{2}=\{(3,3)\}$, and of wrong allocations are $M_{1}=\{(1,2),(1,4),(2,1),(2,4),(3,2),(4,1),(4,2)\}$, $M_{2}=\{(1,3),(2,3),(3,4),(4,3)\}$, and $M_{3}=\{(3,1)\}$. Figure 7 shows the input-response tone combinations color-coded as per cluster identity, and the proportion of times each pair of input-response tone combinations appeared in the same cluster after burn-in. Figure 7 indicates that, while the clusters $S_{1}, S_{2}, M_{1}$ are stable, there is some instability among the other two clusters, namely $M_{2}$ and $M_{3}$.
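The pairwise proportions reported in this kind of analysis can be obtained from the retained cluster labels. A minimal sketch, assuming the post-burn-in labels are stored in a hypothetical array `label_draws[m, j]` (iteration `m`, input-response combination `j`); the function name is illustrative and not taken from the authors' code:

```python
import numpy as np

def coclustering_matrix(label_draws):
    """Proportion of retained iterations in which items j and k share a cluster."""
    z = np.asarray(label_draws)
    # same[m, j, k] is True when items j and k have equal labels at iteration m.
    same = z[:, :, None] == z[:, None, :]
    return same.mean(axis=0)

# Toy example: three items, two retained iterations.
proportions = coclustering_matrix([[0, 0, 1], [0, 1, 1]])
```

Because only equality of labels within an iteration is used, the result does not depend on how clusters happen to be numbered from one iteration to the next.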

Figure 7 Results for PTC1 data: Network plot of similarity groups showing the intra- and inter-cluster similarities of tone recognition problems. Each node is associated with a pair indicating the input-response tone category, $(s,d)$. The number associated with each edge indicates the proportion of times the pair in the two connecting nodes appeared in the same cluster after burn-in.

Key Findings The clustering structure reveals that the low-dipping ($T_{3}$) response trajectories are different from the other three response categories. While for correct input-output tone combinations, $S_{2}$ forms a separate singleton cluster, for incorrect combinations, $M_{2}$ contains all the low-dipping trajectories, indicating their similarities across the panels. Also for $T_{3}$, faster increase of the probabilities of correct identification, as well as faster decay of probabilities of incorrect identification, indicate that $T_{3}$ is easily distinguishable from other alternatives.

On the other hand, the trajectories of high-flat ($T_{1}$), low-rising ($T_{2}$) and high-falling ($T_{4}$) response categories are quite similar across panels. While for correct input-response combinations, these three form the cluster $S_{1}$, the corresponding incorrect tone combinations are clustered in $M_{1}$.
The slower rise of the observed empirical probabilities for the elements in $S_{1}$ and the slower decay of the same for $M_{1}$ indicate that $T_{1}$, $T_{2}$ and $T_{4}$ are difficult to distinguish. However, in block 3 the empirical probabilities of correct input-response combinations differ moderately.
While $T_{2}$ and $T_{4}$ show a relative drop in the empirical probabilities at block 3, $T_{1}$ shows a sudden peak in the same. This local dissimilarity of the trajectories at block 3 leads to a departure of the empirical probability of $T_{1}$ from the estimated credible band.

Next, we consider the results concerning the estimation of the drift parameters $\mu_{d^{\prime},s}^{(i)}(t)$. As discussed in Sect. 2.2, given the identifiability constraints, the estimates of $\mu_{d^{\prime},s}^{(i)}(t)$ can only be interpreted on a relative scale. Figure 8 shows the posterior mean trajectories and associated 95% credible intervals for the projected drift rates.

Figure 8 Results for PTC1 data: Estimated posterior mean trajectories of the population level drifts $\mu_{d^{\prime},s}(t)$ for the proposed model. The shaded areas represent the corresponding 95% point-wise credible intervals.

Importantly, our proposed mixed model also allows us to assess individual-specific parameter trajectories. Figure 9 shows the posterior mean trajectories and the associated 95% credible intervals for the drift rates $\mu_{d^{\prime},s}^{(i)}$ estimated by our method for the different success combinations $(d^{\prime},s)$ for two participants: one with the best accuracy averaged across all blocks, and the other with the worst accuracy averaged across all blocks. These results suggest significant individual-specific heterogeneity. For the well-performing participant, the drift parameters are much higher than those for the poorly performing individual, indicating their ability to more quickly accumulate evidence compared to the poorly performing adult. These differences persisted over all blocks with a small gradual increase over time.

Figure 9 Results for PTC1 data: Estimated posterior mean trajectories for individual specific drifts $\mu_{d^{\prime},s}^{(i)}(t)=\exp\{f_{d^{\prime},s}(t)+u_{C}^{(i)}(t)\}$ for successful identification $(d^{\prime}=s)$ for two different participants: one performing well (dashed line) and one performing poorly (dotted line). The shaded areas represent the corresponding 95% point-wise credible intervals.

Analysis of Benchmark Data To validate the proposed method, we also analyzed tone learning data which, in addition to response accuracies, included accurate measurements of the response times. It was previously analyzed in Paulon et al. (2021) using the drift-diffusion model (2) which allowed inference on both the drift and the boundary parameters. For our analysis with the method proposed here, however, we ignored the response times. We observed that the estimates of the drifts produced by our proposed methodology match well with the estimates obtained by Paulon et al. (2021). A description of this ‘benchmark’ data set and other details of our analyses are provided in Section S.5 of the supplementary material.

7. Discussion, Conclusion, Broader Utility, and Future Work

Summary In this article, we developed a novel longitudinal inverse-probit mixed categorical probability model. Our research was motivated by category learning experiments where scientists are interested in using drift-diffusion models to understand how the decision-making mechanisms evolve as the participants get more training and experience. However, unlike traditional drift-diffusion analyses, which require data on both response categories and response times, we only had usable records of response categories but no response times. To our knowledge, biologically interpretable latent drift-diffusion process-based categorical probability models had never been considered for such scenarios in the literature before. We addressed this need. Building on previous work on longitudinal drift-diffusion mixed joint models for response categories and response times, but now integrating out the response times, we obtained a new class of category probability models, which we referred to here as the inverse-probit model. We explored parameter recoverability in such models, showing, in particular, that the offset parameters cannot be recovered and that the drifts and boundaries cannot both be recovered from data on response categories alone. In our analyses, we thus focused on estimating the biologically more important drift parameters but kept the offsets and the boundaries fixed. We showed that, with careful domain-knowledge-informed choices for the boundaries, the general trajectories of the drift parameters can be recovered by our proposed approach even in the complete absence of response times.

Conclusion Overall, when it comes to making scientific inferences about drift-diffusion model parameters in the absence of data on response times, our work implies a mixed promise. On the downside, our work shows that the detailed interplay between drifts and boundaries cannot be captured. On the positive side, our results also suggest that, with our carefully designed model, and the fixed value of the boundary parameters appropriately chosen by experts, the general longitudinal trends in the drifts can still be estimated well from data only on response categories. Caution should still be exercised not to over-interpret the results.

Broader Utility in Auditory Neuroscience The proposed model, we believe, has significant implications for auditory neuroscience. We focused here specifically on a pupillometry study for which the experimental paradigms need to be adapted to prioritize slow pupillary response, rendering the behavioral response times useless. However, as discussed in the Introduction, there could be many other situations where usable data on response times may not be available. The proposed model can be useful in such scenarios to understand the perceptual mechanisms underlying auditory decision-making.

Broader Utility Beyond Auditory Neuroscience While we focused here on studying auditory category learning, the proposed method is applicable to other domains of behavioral neuroscience research studying categorical decision-making when response time measurements are either not available or not reliable.

Broader Utility in Statistics On the statistical side, the projection-based approach proposed here to impose non-standard identifiability conditions and address clustering problems within and between different panels is not restricted to the inverse-probit models introduced here. It can be easily adapted to other classes of generalized linear models, such as the widely popular logit and probit models, and hence may also be of interest to a much broader statistical audience.

Future Directions The models and the analyses of the PTC1 data set presented here excluded the pupillometry measurements themselves. An important and challenging problem being pursued separately elsewhere is to see how those measurements relate to drift-diffusion model parameters.

Funding

This research was funded by the National Science Foundation grant DMS 1953712 and National Institute on Deafness and Other Communication Disorders Grants R01DC013315 and R01DC015504 awarded to Sarkar and Chandrasekaran.

Appendix

Appendix A: Proof of Lemma 1

Proof. It is easy to check that the offset parameters $\delta_{s}$ are not identifiable since

$$\begin{aligned} P(d \mid s,\delta_{s},\mu_{1:d_{0},s}, \mathbf{b}_{1:d_{0},s}) &= \int_{\delta_{s}}^{\infty} g(\tau \mid \delta_{s},\mu_{d,s},b_{d,s}) \prod_{d^{\prime} \ne d} \left\{ 1 - G(\tau \mid \delta_{s},\mu_{d^{\prime},s},b_{d^{\prime},s})\right\} d\tau \\ &= \int_{0}^{\infty} g(\tau \mid 0,\mu_{d,s},b_{d,s}) \prod_{d^{\prime} \ne d} \left\{ 1 - G(\tau \mid 0,\mu_{d^{\prime},s},b_{d^{\prime},s})\right\} d\tau \\ &= P(d \mid s, 0,\mu_{1:d_{0},s},\mathbf{b}_{1:d_{0},s}). \end{aligned}$$
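The invariance to the offset can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the authors' code: it assumes a race between independent accumulators whose first-passage times are inverse Gaussian with mean $b/\mu$ and shape $b^{2}$ (the density form used later in this proof), adds a common offset to all times, and estimates the category probabilities as the frequency with which each accumulator finishes first. The function name `category_probs` and all parameter values are hypothetical.

```python
import numpy as np

def category_probs(mu, b, delta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(d | s) for a race between accumulators.

    Assumption for this sketch: accumulator d' hits its boundary at time
    delta + IG(mean = b/mu, shape = b**2), independently across d'.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    b = np.asarray(b, dtype=float)
    # numpy's wald(mean, scale) samples the inverse Gaussian distribution
    tau = delta + rng.wald(mean=b / mu, scale=b**2, size=(n_draws, mu.size))
    winners = tau.argmin(axis=1)  # index of the accumulator that finishes first
    return np.bincount(winners, minlength=mu.size) / n_draws

mu = [1.5, 1.0, 0.8]   # hypothetical drifts
b = [2.0, 2.0, 2.0]    # hypothetical boundaries

p_no_offset = category_probs(mu, b, delta=0.0)
p_offset = category_probs(mu, b, delta=0.5)  # same seed, shifted times
print(p_no_offset, p_offset)
```

Because a common shift does not change which accumulator finishes first, the two estimates agree, which is why $\delta_{s}$ cannot be learned from the categories alone.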

Next we will show that the drift parameters and decision boundaries are not separately identifiable, even if we fix offset parameters to a constant.

First note that Eq. (3) can also be represented as

$$\begin{aligned} \int_{\delta_{s}}^{\infty} \ldots \int_{\delta_{s}}^{\infty} \prod_{d^{\prime}\ne d} g(\tau_{d^{\prime}} \mid \theta_{d^{\prime},s}) \int_{\delta_{s}}^{\wedge_{d^{\prime}\ne d}\tau_{d^{\prime}}} g(\tau_{d} \mid \theta_{d,s}) \, d\tau_{d} \prod_{d^{\prime}\ne d} d\tau_{d^{\prime}}. \end{aligned} \qquad \text{(A.1)}$$

First observe that $\tau^{\star}=\wedge_{d^{\prime}\ne d}\tau_{d^{\prime}} =\tau_{-1}^{\star}\wedge \tau_{1}$, where $\tau_{-1}^{\star}=\wedge_{d^{\prime}\ne \{1,d\}}\tau_{d^{\prime}}$. Thus the integral above can be written as

$$\begin{aligned}&\int_{\delta_{s}}^{\infty} \ldots \int_{\delta_{s}}^{\infty} \prod_{d^{\prime}\ne \{1,d\}} g(\tau_{d^{\prime}} \mid \theta_{d^{\prime},s}) \left\{ \int_{\delta_{s}}^{\infty} g(\tau_{1} \mid \theta_{1,s}) \int_{\delta_{s}}^{\tau_{-1}^{\star} \wedge \tau_{1}} g(\tau_{d} \mid \theta_{d,s}) \, d\tau_{d}\right\} \prod_{d^{\prime}\ne \{1,d\}} d\tau_{d^{\prime}}\\&\quad =\int_{\delta_{s}}^{\infty} \ldots \int_{\delta_{s}}^{\infty} \prod_{d^{\prime}\ne \{1,d\}} g(\tau_{d^{\prime}} \mid \theta_{d^{\prime},s}) \left\{ \int_{\delta_{s}}^{\tau_{-1}^{\star}} g(\tau_{d} \mid \theta_{d,s}) \int_{\tau_{d}}^{\infty} g(\tau_{1} \mid \theta_{1,s}) \, d\tau_{1} \, d\tau_{d} \right\} \prod_{d^{\prime}\ne \{1,d\}} d\tau_{d^{\prime}}. \end{aligned}$$

Proceeding sequentially one can show that the integral above is the same as in (3).

Using the above, we express the probability in (3) as in (A.1). As the offset parameter $\delta_{s}$ has already been shown to be non-identifiable, we need to fix it. Without loss of generality, we fix the offset parameter at 0. The probability density function of the inverse Gaussian distribution with parameters $\theta_{d^{\prime},s}=(\mu_{d^{\prime},s},b_{d^{\prime},s})$, evaluated at $\tau_{d^{\prime}}$, namely $g(\tau_{d^{\prime}}\mid \theta_{d^{\prime},s})$, can be obtained from (1) by setting $\delta_{s}=0$ and $d=d^{\prime}$.

Consider the transformation of $\tau_{d^{\prime}}$ to $\tau_{d^{\prime}}^{\star}$ as $\tau_{d^{\prime}}=c^{2}\tau_{d^{\prime}}^{\star}$, for some constant $c>0$, and for all $d^{\prime}$. Further, define $b_{d^{\prime},s}^{\star}=b_{d^{\prime},s}/c$ and $\mu_{d^{\prime},s}^{\star}=c\mu_{d^{\prime},s}$, for all $d^{\prime}$. Then observe that

$$\begin{aligned} g(\tau_{d^{\prime}}\mid \theta_{d^{\prime},s}) \, d\tau_{d^{\prime}}&= (2\pi)^{-1/2} b_{d^{\prime},s}^{\star} (\tau_{d^{\prime}}^{\star})^{-3/2}\exp \left\{ -(2\tau_{d^{\prime}}^{\star})^{-1} \left( b_{d^{\prime},s}^{\star} -\mu_{d^{\prime},s}^{\star} \tau_{d^{\prime}}^{\star} \right)^{2} \right\} d\tau_{d^{\prime}}^{\star}\\&=g(\tau_{d^{\prime}}^{\star}\mid \theta_{d^{\prime},s}^{\star}) \, d\tau_{d^{\prime}}^{\star}, \end{aligned}$$

where $g(\tau_{d^{\prime}}^{\star}\mid \theta_{d^{\prime},s}^{\star})$ is the pdf of the inverse Gaussian distribution with parameters $\mu_{d^{\prime},s}^{\star}$ and $b_{d^{\prime},s}^{\star}$, evaluated at the point $\tau_{d^{\prime}}^{\star}$.
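The change-of-variables identity above can be checked numerically. The sketch below is illustrative only: it codes the offset-free inverse Gaussian density in the form used in the display, and compares $g(\tau\mid\mu,b)\,c^{2}$ (since $d\tau = c^{2}\,d\tau^{\star}$) with $g(\tau^{\star}\mid c\mu, b/c)$ on a grid. The helper `ig_density` and the values of `mu`, `b`, and `c` are hypothetical choices for the demonstration.

```python
import numpy as np

def ig_density(tau, mu, b):
    """Offset-free first-passage density:
    (2*pi)^(-1/2) * b * tau^(-3/2) * exp{-(b - mu*tau)^2 / (2*tau)}."""
    return b / np.sqrt(2.0 * np.pi) * tau**-1.5 * np.exp(-(b - mu * tau) ** 2 / (2.0 * tau))

mu, b, c = 1.3, 2.0, 1.7            # hypothetical drift, boundary, and scaling constant
tau = np.linspace(0.05, 10.0, 200)  # grid of time points
tau_star = tau / c**2               # tau = c^2 * tau_star

lhs = ig_density(tau, mu, b) * c**2          # g(tau | mu, b) * (d tau / d tau_star)
rhs = ig_density(tau_star, c * mu, b / c)    # g(tau_star | mu_star, b_star)
max_abs_diff = np.max(np.abs(lhs - rhs))
print(max_abs_diff)
```

The two sides agree up to floating-point rounding for any positive `c`, which is the pointwise statement that the rescaled drift and boundary give the same law for the rescaled time.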

Applying the transformation on $\tau_{d^{\prime}}$ for all $d^{\prime}$, we get that the integral in (A.1) with $\delta_{s}=0$ is the same as

$$\begin{aligned} \int_{0}^{\infty} \ldots \int_{0}^{\infty} \prod_{d^{\prime}\ne d} g(\tau_{d^{\prime}}^{\star} \mid \theta_{d^{\prime},s}^{\star}) \int_{0}^{\wedge_{d^{\prime}\ne d}\tau_{d^{\prime}}^{\star}} g(\tau_{d}^{\star} \mid \theta_{d,s}^{\star}) \, d\tau_{d}^{\star} \prod_{d^{\prime}\ne d} d\tau_{d^{\prime}}^{\star}. \end{aligned}$$

As $c$ is arbitrary, this shows that the drifts and boundaries are not separately estimable. $\square$
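The conclusion of the lemma can also be seen by simulation. The following is a rough Monte Carlo sketch under the same assumptions as the proof (independent inverse Gaussian first-passage times with offset fixed at 0): it estimates the category probabilities under $(\mu, b)$ and under the rescaled $(c\mu, b/c)$ from independent draws and compares them. The function `race_probs` and the numerical values are hypothetical, and agreement holds only up to Monte Carlo error.

```python
import numpy as np

def race_probs(mu, b, n_draws=400_000, seed=0):
    """Estimate the probability that each accumulator finishes first.

    Assumption for this sketch: first-passage times are independent
    inverse Gaussian with mean b/mu and shape b**2 (offset fixed at 0).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    b = np.asarray(b, dtype=float)
    tau = rng.wald(mean=b / mu, scale=b**2, size=(n_draws, mu.size))
    return np.bincount(tau.argmin(axis=1), minlength=mu.size) / n_draws

mu = np.array([1.5, 1.0, 0.8])   # hypothetical drifts
b = np.array([2.0, 2.0, 2.0])    # hypothetical boundaries
c = 2.0                          # arbitrary positive constant

p_original = race_probs(mu, b, seed=1)
p_rescaled = race_probs(c * mu, b / c, seed=2)  # independent draws
print(p_original, p_rescaled)
```

Since the two parameter settings yield the same category probabilities, data on categories alone cannot distinguish them, which motivates fixing the boundaries before estimating the drifts.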

Appendix B: Proof of Theorem 1

Proof. Let $\mathbf{P}(\mu_{1:d_{0},s})=\{p_{1}(\mu_{1:d_{0},s}), \dots , p_{d_{0}}(\mu_{1:d_{0},s}) \}^{\mathrm{T}}$ be the function, given by (4), from $\mathcal{S}_{0,k}$ to the unit probability simplex $\Delta^{d_{0}-1}$. For notational simplicity, we write $\mu_{1:d_{0},s} = \mu = (\mu_{1}, \dots , \mu_{d_{0}})^{\mathrm{T}}$. We first find the matrix of partial derivatives $\nabla \mathbf{P}$ with respect to $\mu$.

For $\mu \in \mathcal{S}_{0,k}$, $\mathbf{1}^{T} \mu = k$, and hence the probability reduces to

$$\begin{aligned} p_{d}\left( \mu \right) = \frac{\left( be^{b}\right)^{d_{0}}}{(2 \pi)^{d_{0}/2}}\int_{0}^{\infty} \int_{\tau_{d}}^{\infty} \cdots \int_{\tau_{d}}^{\infty} |\tau|^{-3/2} \exp \left\{ -\frac{1}{2} \left( \mathbf{1}^{T} \tau^{-1} \mathbf{1} + \mu^{T} \tau \mu \right) \right\} d\tau_{-d} \, d\tau_{d}, \end{aligned}$$

for $d=1,\ldots , d_{0}$, where $\tau=\mathrm{diag}(\tau_{1}, \ldots , \tau_{d_{0}})$, and $\tau_{-d}$ is the sub-vector of $\tau$ excluding the $d$-th element. Next, differentiating $p_{d}\left( \mu \right)$ with respect to $\mu$, we get

$$\begin{aligned} \frac{\partial p_{d}\left( \mu \right)}{\partial \mu} &= \frac{\left( be^{b}\right)^{d_{0}}}{(2 \pi)^{d_{0}/2}}\int_{0}^{\infty} \int_{\tau_{d}}^{\infty} \cdots \int_{\tau_{d}}^{\infty} |\tau|^{-3/2} \left( -\tau \mu \right) \exp \left\{ -\frac{1}{2} \left( \mathbf{1}^{T} \tau^{-1} \mathbf{1} + \mu^{T} \tau \mu \right) \right\} d\tau_{-d} \, d\tau_{d}, \\ &= \begin{bmatrix} \mu_{1} \eta_{2} & \cdots & \mu_{d-1} \eta_{2} & \mu_{d} \eta_{1} & \mu_{d+1} \eta_{2} & \cdots & \mu_{d_{0}} \eta_{2} \end{bmatrix}^{T}, \end{aligned}$$

where $\eta_{1} = - E\left\{ \tau_{1} \mathbb{I}\left( \tau_{2}> \tau_{1}, \ldots , \tau_{d_{0}}>\tau_{1}\right) \mid \mu \right\}$, $\eta_{2} = -E\left\{ \tau_{2} \mathbb{I}\left( \tau_{2}> \tau_{1}, \ldots , \tau_{d_{0}}>\tau_{1}\right) \mid \mu \right\}$, and $\mathbb{I}(A)$ is the indicator function of the event $A$. Here the expectation is considered under the joint distribution of $\left( \tau_{1}, \ldots , \tau_{d}\right)$, which is independent inverse Gaussian. Clearly $\eta_{1}>\eta_{2}>0$.

From the above derivation, it is easy to obtain that

$$\begin{aligned} \nabla \mathbf{P}\left( \mu \right) = \begin{bmatrix} \mu_{1} \eta_{1} & \mu_{2} \eta_{2} & \cdots & \mu_{d_{0}} \eta_{2} \\ \mu_{1} \eta_{2} & \mu_{2} \eta_{1} & \cdots & \mu_{d_{0}} \eta_{2} \\ \vdots & \vdots & \cdots & \vdots \\ \mu_{1} \eta_{2} & \mu_{2} \eta_{2} & \cdots & \mu_{d_{0}} \eta_{1} \end{bmatrix} = \mathbf{M}\left\{ \left( \eta_{1}-\eta_{2}\right) I+\eta_{2} \mathbf{1} \mathbf{1}^{T} \right\} , \end{aligned}$$

where $\mathbf{M}=\mathrm{diag}\left( \mu_{1}, \ldots , \mu_{d_{0}}\right)$.

Now, suppose there exist $\mu$ and $\nu$ in $\mathcal{S}_{k}$ such that $\mu \ne \nu$ and $\mathbf{P}\left( \mu \right) = \mathbf{P}\left( \nu \right)$. Define $\gamma : [0,1] \rightarrow \mathbb{R}^{d_{0}}$ such that $\gamma(t)= \mu + t \left( \nu - \mu \right)$, $t\in [0,1]$. Further, define $h(t)= \langle \mathbf{P}\left( \gamma(t) \right) - \mathbf{P}\left( \mu \right) , \nu - \mu \rangle$, the inner product of $\mathbf{P}\left( \gamma(t) \right) - \mathbf{P}\left( \mu \right)$ and $\nu - \mu$. Then $h(1)=h(0)=0$ under the assumption that $\mathbf{P}\left( \mu \right) = \mathbf{P}\left( \nu \right)$. Therefore, by the Mean Value Theorem, as $\mu \ne \nu$, there exists some point $c\in (0,1)$ such that $\left. \partial h(t) / \partial t \right|_{t=c} =0$. Now,

$$\begin{aligned} \frac{\partial h(t)}{\partial t} &= \sum_{d^{\prime}=1}^{d_{0}} \left( \nu_{d^{\prime}} - \mu_{d^{\prime}} \right) \frac{\partial}{\partial t}\left[ p_{d^{\prime}} \left\{ \gamma(t) \right\} - p_{d^{\prime}} \left( \mu \right) \right] \\ &= \sum_{d^{\prime}=1}^{d_{0}} \left( \nu_{d^{\prime}} - \mu_{d^{\prime}} \right) \left\{ \frac{\partial}{\partial \gamma} p_{d^{\prime}} \left( \gamma \right) \right\}^{T} \frac{\partial \gamma(t)}{\partial t}\\ &= \left( \nu - \mu \right)^{T} \nabla \mathbf{P}\{\gamma(t)\} \left( \nu - \mu \right) \\ &= \left( \eta_{1}-\eta_{2}\right) \left( \nu - \mu \right)^{T} \Gamma(t) \left( \nu - \mu \right) + \eta_{2} \left( \nu - \mu \right)^{T} \mathbf{M} \mathbf{1} \mathbf{1}^{T} \left( \nu - \mu \right) \\ &= \left( \eta_{1}-\eta_{2}\right) \left( \nu - \mu \right)^{T} \Gamma(t) \left( \nu - \mu \right) , \end{aligned}$$

as $\textbf{1}^{T} \left( \nu - \mu \right) = 0$, where $\Gamma(t) = \textrm{diag}\{ \gamma(t) \}$.

As every component of $\mu$ and $\nu$ is positive, for any $c\in (0,1)$, the matrix $\Gamma(c)$ is positive definite. Further, as $\eta_{1} > \eta_{2}$, $\left. \partial h(t) / \partial t \right|_{t=c} = 0$ only if $\mu = \nu$, which contradicts the proposition. $\square$

Appendix C: Algorithm for Minimal Distance Mapping

The problem of finding the projection of a point $\mu$ onto the space $\mathcal{S}_{k,\varepsilon}$ is equivalent to the following nonlinear optimization problem:

$$\textrm{minimize}_{\textbf{w}} \; \Vert \textbf{w} - \mu \Vert^{2} \quad \text{such that} \quad \sum_{i=1}^{d_{0}} w_{i} = k, \quad w_{i} \ge \varepsilon .$$

Duchi et al. (2008, Algorithm 1) provide a solution to the problem of projecting a given point $\mu$ onto the space $\mathcal{S}_{k,\varepsilon}$ for $\varepsilon = 0$, which is modified for any given $\varepsilon$ below.
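One way to handle a general $\varepsilon$ can be sketched in Python. This is a minimal illustration under a stated assumption, not a transcription of the authors' algorithm: substituting $\textbf{v} = \textbf{w} - \varepsilon$ turns the problem into projecting $\mu - \varepsilon$ onto the scaled simplex $\{\textbf{v}: \sum_i v_i = k - d_0 \varepsilon,\ v_i \ge 0\}$, which the sort-and-threshold step of Duchi et al. (2008, Algorithm 1) solves. The function names `project_onto_simplex` and `project_onto_S` are hypothetical.

```python
import numpy as np

def project_onto_simplex(v, z=1.0):
    """Euclidean projection of v onto {x : sum(x) = z, x >= 0} (sort-and-threshold)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # coordinates in decreasing order
    css = np.cumsum(u)                   # running sums of the sorted coordinates
    j = np.arange(1, v.size + 1)
    # Largest index whose coordinate stays positive after the common shift.
    rho = j[u - (css - z) / j > 0][-1]
    theta = (css[rho - 1] - z) / rho     # common shift applied to every coordinate
    return np.maximum(v - theta, 0.0)

def project_onto_S(mu, k, eps):
    """Projection onto {w : sum(w) = k, w_i >= eps}, assuming k > d0 * eps."""
    mu = np.asarray(mu, dtype=float)
    d0 = mu.size
    if k <= d0 * eps:
        raise ValueError("the constraint set requires k > d0 * eps")
    # Shift by eps, project onto the smaller simplex, then shift back.
    return eps + project_onto_simplex(mu - eps, k - d0 * eps)
```

A point that already satisfies both constraints is returned unchanged, and any other point is moved to the nearest vector whose coordinates sum to `k` with every coordinate at least `eps`.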

Appendix D: Proof of Lemma 3

Proof. We consider the unconditional distribution of $\tau_{1:d_{0}}$, given the parameters $\mu_{1:d_{0}}$, as the proposal distribution $g$. Clearly, the proposal distribution $g$ and the target conditional joint distribution $f$ satisfy $f(\tau_{1:d_{0}}|\mu_{1:d_{0}})/g(\tau_{1:d_{0}}|\mu_{1:d_{0}}) \le M$, where $M^{-1} = P\left( \tau_{d} \le \tau_{1:d_{0}} \right)$. Therefore, for any random sample $U \sim U(0,1)$, $f(\tau_{1:d_{0}}|\mu_{1:d_{0}}) \ge M U g(\tau_{1:d_{0}}|\mu_{1:d_{0}})$ if the sample satisfies the condition $\tau_{d} \le \tau_{1:d_{0}}$, and $f(\tau_{1:d_{0}}|\mu_{1:d_{0}}) < M U g(\tau_{1:d_{0}}|\mu_{1:d_{0}})$ otherwise. Hence, by Lemma 2.3.1 of Robert and Casella (2004), the algorithm above produces samples from the target distribution. $\square$
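The accept-reject argument in this proof can be sketched in Python: draw all passage times from their unconditional distribution and keep the draw only when the observed category has the smallest passage time. The sketch rests on assumptions that are not stated in this appendix: the passage times are taken to be independent inverse Gaussian variables with mean $b/\mu_{d^{\prime}}$ and shape $b^{2}$ (the standard first-passage law of a unit-variance Wiener process with drift $\mu_{d^{\prime}}$ and boundary $b$), the default $b = 2$ is a placeholder, `d` is a zero-based index, and the function name is hypothetical.

```python
import numpy as np

def sample_passage_times(mu, d, b=2.0, seed=None, max_tries=100_000):
    """Draw tau_{1:d0} given argmin(tau) == d by rejection from the unconditional law."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    for _ in range(max_tries):
        # Proposal g: independent inverse Gaussian (Wald) passage times
        # with mean b / mu and shape b**2 (assumed parameterization).
        tau = rng.wald(b / mu, b ** 2)
        # Accept exactly when the d-th passage time is the smallest.
        if np.argmin(tau) == d:
            return tau
    raise RuntimeError("no accepted draw; P(argmin == d) may be very small")
```

The expected number of proposals per accepted draw is $M = 1/P(\tau_{d} \le \tau_{1:d_{0}})$, so this plain rejection scheme slows down when the observed category is unlikely under the current drifts.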

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09947-8.

1 We can see this in a simpler example. Suppose we are interested in generating a sample from the conditional distribution of $\tau = (\tau_{1},\tau_{2})$ given $d = \arg\min_{j} \tau_{j} = 1$, where $\tau_{i} \sim \texttt{Uniform}(0,1)$, $i=1,2$, independently. The conditional density of $\tau$ given $d=1$ is $f_{\tau \mid d}(\tau_{1},\tau_{2}) = 2$ if $0 < \tau_{1} \le \tau_{2} < 1$, and $=0$ otherwise, since the event $\tau_{1} \le \tau_{2}$ has probability $0.5$. However, if we draw $\tau_{1}$ from $\texttt{Uniform}(0,1)$ first and let that realization be $\tau^{\star}$, and draw $\tau_{2}$ from the truncated uniform distribution (left truncated at $\tau^{\star}$), then the pdf of the realization of $(\tau_{1},\tau_{2})$ is $(1-\tau^{\star})^{-1}$, which is not constant over the region.
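The difference between the two sampling schemes in this footnote can be checked with a short simulation. The sketch below is only an illustration; the function name, sample size, and seed are arbitrary choices. Under the correct conditional distribution (uniform on the triangle $\tau_{1} \le \tau_{2}$) the means are $E(\tau_{1}) = 1/3$ and $E(\tau_{2}) = 2/3$, whereas the sequential scheme gives $E(\tau_{1}) = 1/2$ and $E(\tau_{2}) = 3/4$.

```python
import numpy as np

def compare_schemes(n=200_000, seed=0):
    """Return mean (tau1, tau2) under rejection sampling and under sequential sampling."""
    rng = np.random.default_rng(seed)

    # Rejection: draw both uniforms independently, keep pairs with tau1 <= tau2.
    pairs = rng.uniform(size=(n, 2))
    kept = pairs[pairs[:, 0] <= pairs[:, 1]]
    rejection_means = (kept[:, 0].mean(), kept[:, 1].mean())

    # Sequential: tau1 ~ Uniform(0, 1), then tau2 ~ Uniform(tau1, 1).
    tau1 = rng.uniform(size=n)
    tau2 = rng.uniform(tau1, 1.0)
    sequential_means = (tau1.mean(), tau2.mean())

    return rejection_means, sequential_means
```

The two pairs of means differ well beyond Monte Carlo error, which is the point of the footnote: drawing the minimum first and then truncating the others does not reproduce the conditional joint distribution.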

References

Agresti, A. (2018). An introduction to categorical data analysis. Wiley.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Ashby, F. G., Noble, S., Filoteo, J. V., Waldron, E. M., & Ell, S. W. (2003). Category learning deficits in Parkinson’s disease. Neuropsychology, 17, 115.
Beck, A. (2017). First-order methods in optimization. SIAM.
Bogacz, R., Wagenmakers, E.-J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed-accuracy tradeoff. Trends in Neurosciences, 33, 10–16.
Borooah, V. K. (2002). Logit and probit: Ordered and multinomial models. Sage.
Brody, C. D., & Hanks, T. D. (2016). Neural underpinnings of the evidence accumulator. Current Opinion in Neurobiology, 37, 149–157.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57, 153–178.
Burgette, L. F., & Nordheim, E. V. (2012). The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business & Economic Statistics, 30, 404–410.
Burgette, L. F., Puelz, D., & Hahn, P. R. (2021). A symmetric prior for multinomial probit models. Bayesian Analysis, 16, 991–1008.
Cavanagh, J. F., Wiecki, T. V., Cohen, M. X., Figueroa, C. M., Samanta, J., Sherman, S. J., & Frank, M. J. (2011). Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, 14, 1462.
Chandrasekaran, B., Yi, H.-G., & Maddox, W. T. (2014). Dual-learning systems during speech category learning. Psychonomic Bulletin & Review, 21, 488–495.
Chandrasekaran, B., Yi, H.-G., Smayda, K. E., & Maddox, W. T. (2016). Effect of explicit dimensional instruction on speech category learning. Attention, Perception, & Psychophysics, 78, 566–582.
Chhikara, R. (1988). The inverse Gaussian distribution: Theory, methodology, and applications. CRC Press.
Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347–361.
Cox, D. R., & Miller, H. D. (1965). The theory of stochastic processes. CRC Press.
de Boor, C. (1978). A practical guide to splines. Springer.
Deo, S. (2018). Algebraic topology. Texts and Readings in Mathematics (Vol. 27). Hindustan Book Agency.
Ding, L., & Gold, J. I. (2013). The basal ganglia’s contributions to perceptual decision making. Neuron, 79, 640–649.
Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th international conference on machine learning (pp. 272–279).
Dufau, S., Grainger, J., & Ziegler, J. C. (2012). How to say “no” to a nonword: A leaky competing accumulator model of lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 1117.
Dunson, D. B., & Neelon, B. (2003). Bayesian inference on order-constrained parameters in generalized linear models. Biometrics, 59, 286–295.
Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.
Filoteo, J. V., Lauritzen, S., & Maddox, W. T. (2010). Removing the frontal lobes: The effects of engaging executive functions on perceptual category learning. Psychological Science, 21, 415–423.
Glimcher, P. W., & Fehr, E. (2013). Neuroeconomics: Decision making and the brain. Academic Press.
Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.
Gunn, L. H., & Dunson, D. B. (2005). A transformation approach for incorporating monotone or unimodal constraints. Biostatistics, 6, 434–449.
Heekeren, H. R., Marrett, S., Bandettini, P. A., & Ungerleider, L. G. (2004). A general mechanism for perceptual decision-making in the human brain. Nature, 431, 859.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Johndrow, J., Dunson, D., & Lum, K. (2013). Diagonal orthant multinomial probit models. In Artificial intelligence and statistics (pp. 29–38).
Kim, S., Potter, K., Craigmile, P. F., Peruggia, M., & Van Zandt, T. (2017). A Bayesian race model for recognition memory. Journal of the American Statistical Association, 112, 77–91.
Lau, J. W., & Green, P. J. (2007). Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics, 16, 526–558.
Leite, F. P., & Ratcliff, R. (2010). Modeling reaction time and accuracy of multiple-alternative decisions. Attention, Perception, & Psychophysics, 72, 246–273.
Llanos, F., McHaney, J. R., Schuerman, W. L., Yi, H. G., Leonard, M. K., & Chandrasekaran, B. (2020). Non-invasive peripheral nerve stimulation selectively enhances speech category learning in adults. NPJ Science of Learning, 1, 1–11.
Lu, J. (1995). Degradation processes and related reliability models. PhD thesis, McGill University, Montreal, Canada.
McHaney, J. R., Tessmer, R., Roark, C. L., & Chandrasekaran, B. (2021). Working memory relates to individual differences in speech category learning: Insights from computational modeling and pupillometry. Brain and Language, 22, 1–15.
Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgment and Decision Making, 5, 437–449.
Morris, J. S. (2015). Functional regression. Annual Review of Statistics and its Application, 2, 321–359.
Parthasarathy, A., Hancock, K. E., Bennett, K., DeGruttola, V., & Polley, D. B. (2020). Bottom-up and top-down neural signatures of disordered multi-talker speech perception in adults with normal hearing. eLife, 9.
Paulon, G., Llanos, F., Chandrasekaran, B., & Sarkar, A. (2021). Bayesian semiparametric longitudinal drift-diffusion mixed models for tone learning in adults. Journal of the American Statistical Association, 116, 1114–1127.
Peelle, J. E. (2018). Listening effort: How the cognitive consequences of acoustic challenge are reflected in brain and behavior. Ear and Hearing, 39, 204–214.
Purcell, B. A. (2013). Neural mechanisms of perceptual decision making. Vanderbilt University.
Ramsay, J. O., & Silverman, B. W. (2007). Applied functional data analysis: Methods and case studies. Springer.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922.
Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9, 347–356.
Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.
Reetzke, R., Xie, Z., Llanos, F., & Chandrasekaran, B. (2018). Tracing the trajectory of sensory plasticity across different stages of speech learning in adulthood. Current Biology, 28, 1419–1427.
Roark, C. L., Smayda, K. E., & Chandrasekaran, B. (2021). Auditory and visual category learning in musicians and nonmusicians. Journal of Experimental Psychology: General.
Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods. Springer Texts in Statistics (2nd ed.). Springer.
Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81, 407–419.
Ross, S. M., Kelly, J. J., Sullivan, R. J., Perry, W. J., Mercer, D., Davis, R. M., Washburn, T. D., Sager, E. V., Boyce, J. B., & Bristow, V. L. (1996). Stochastic processes. Wiley.
Rudin, W. (1991). Functional analysis. International Series in Pure and Applied Mathematics (2nd ed.). McGraw-Hill Inc.
Schall, J. D. (2001). Neural basis of deciding, choosing and acting. Nature Reviews Neuroscience, 2, 33.
Sen, D., Patra, S., & Dunson, D. (2018). Constrained inference through posterior projections. arXiv preprint arXiv:1812.05741.
Smayda, K. E., Chandrasekaran, B., & Maddox, W. T. (2015). Enhanced cognitive and perceptual processing: A computational basis for the musician advantage in speech learning. Frontiers in Psychology, 1–14.
Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27, 161–168.
Smith, P. L., & Vickers, D. (1988). The accumulator model of two-choice discrimination. Journal of Mathematical Psychology, 32, 135–168.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550.
Wade, S. (2023). Bayesian cluster analysis. Philosophical Transactions of the Royal Society A, 381, 1–20.
Wang, J.-L., Chiou, J.-M., & Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and its Application, 3, 257–295.
Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. The Journal of the Acoustical Society of America, 113, 1033–1043.
Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106, 3649–3658.
Whitmore, G., & Seshadri, V. (1987). A heuristic derivation of the inverse Gaussian distribution. The American Statistician, 41, 280–281.
Winn, M. B., Wendt, D., Koelewijn, T., & Kuchinsky, S. E. (2018). Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in Hearing, 22, 1–32.
Zekveld, A. A., Kramer, S. E., & Festen, J. M. (2011). Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and Hearing, 32, 498–510.

Figure 1 Drift-diffusion model for tone learning. The tones {T1, T2, T3, T4} represent the different categories; $s$ denotes an input category, $d^{\prime}$ the different possible response categories, and $d$ the final response category. Here we are illustrating a single trial with input tone T1 ($s = 1$) that was eventually correctly identified ($d = 1$). (a) shows a process whose parameters can be inferred from data on both response categories and response times. Here, after an initial $\delta_{s}$ amount of time required to encode an input category $s$ (here T1), the evidence in favor of different possible response categories $d^{\prime}$ accumulates according to latent Wiener diffusion processes $W_{d^{\prime},s}(\tau)$ (red, blue, green, and purple) with drifts $\mu_{d^{\prime},s}$. The decision $d$ (here T1) is eventually taken if the underlying process (here the red one) is the first to reach its decision boundary $b_{d,s}$. (b) shows a process with additional identifiability restrictions (for all $d^{\prime}$ and $s$, $\delta_{s} = 0$, $b_{d^{\prime},s} = b$ fixed, and $\sum_{d^{\prime}=1}^{d_{0}} \mu_{d^{\prime},s} = d_{0}$) considered in this article, which can be inferred from data on response categories alone.

Figure 2 Description of PTC1 data: The proportion of times the response to an input tone was classified into different tone categories over blocks across different subjects, each for the four input tones (indicated in the panel headers). The thick line represents the median performance and the shaded region indicates the corresponding middle 30% quantiles across subjects.

Algorithm 1 Generating the passage times $\tau_{1:d_{0}}$ given $\arg\min_{d^{\prime} \in \{1:d_{0}\}} \tau_{d^{\prime}} = d$

Figure 3 Description of the synthetic data: True values of the drift parameters averaged over the subjects, denoted by $\mu_{d^{\prime},s}(t)$, and true probabilities $P\{d_{i,\ell,t} \mid s_{i,\ell,t}, \mu_{1:d_{0},s}^{(i)}(t)\}$ averaged over the subjects, denoted here by $P_{d^{\prime},s}(t)$. Here T1, T2, T3, and T4 represent input categories 1 to 4, respectively. Some of the curves overlap according to the true clustering structure described in Sect. 5.

Figure 4 Results for synthetic data: Posterior trajectories of the probabilities for each combination of $(d^{\prime},s)$ over blocks estimated by the proposed model. The shaded areas represent the corresponding 95% point-wise credible intervals. The thick dashed lines represent the underlying true curves, some of which overlap according to the true clustering structure described in Sect. 5.

Figure 5 Results for synthetic data: Boxplots of the estimated probabilities over 50 simulations, with the true probabilities (red dots), for each block, shown for two panels, one from each similarity group (panel $T_{1}$ at the top and $T_{3}$ at the bottom).

Figure 6 Results for PTC1 data: Estimated probability trajectories compared with the average proportions of times an input tone was classified into different tone categories across subjects (dashed lines). The means across subjects are indicated by thick lines and the shaded regions indicate the corresponding 95% coverage regions.

Figure 7 Results for PTC1 data: Network plot of similarity groups showing the intra- and inter-cluster similarities of tone recognition problems. Each node is associated with a pair indicating the input-response tone category, $(s,d)$. The number associated with each edge indicates the proportion of times the pairs in the two connected nodes appeared in the same cluster after burn-in.

Figure 8 Results for PTC1 data: Estimated posterior mean trajectories of the population-level drifts $\mu_{d^{\prime},s}(t)$ for the proposed model. The shaded areas represent the corresponding 95% point-wise credible intervals.

Figure 9 Results for PTC1 data: Estimated posterior mean trajectories for individual-specific drifts $\mu_{d^{\prime},s}^{(i)}(t) = \exp\{f_{d^{\prime},s}(t) + u_{C}^{(i)}(t)\}$ for successful identification $(d^{\prime}=s)$ for two different participants: one performing well (dashed line) and one performing poorly (dotted line). The shaded areas represent the corresponding 95% point-wise credible intervals.

Supplementary material: File

Mukhopadhyay et al. supplementary material
