
Diagnostic Task Selection for Strategy Classification in Judgment and Decision Making: Theory, Validation, and Implementation in R

Published online by Cambridge University Press:  01 January 2023

Marc Jekel*
Affiliation:
Max Planck Institute for Research on Collective Goods, Kurt-Schumacher-Str. 10, D-53113, Bonn, Germany
Susann Fiedler
Affiliation:
Max Planck Institute for Research on Collective Goods
Andreas Glöckner
Affiliation:
Max Planck Institute for Research on Collective Goods

Abstract

One major statistical and methodological challenge in Judgment and Decision Making research is the reliable identification of individual decision strategies by selection of diagnostic tasks, that is, tasks for which predictions of the strategies differ sufficiently. The more strategies are considered, and the larger the number of dependent measures simultaneously taken into account in strategy classification (e.g., choices, decision time, confidence ratings; Glöckner, 2009), the more complex the selection of the most diagnostic tasks becomes. We suggest the Euclidian Diagnostic Task Selection (EDTS) method as a standardized solution for the problem. According to EDTS, experimental tasks are selected that maximize the average difference between strategy predictions for any multidimensional prediction space. In a comprehensive model recovery simulation, we evaluate and quantify the influence of diagnostic task selection on identification rates in strategy classification. Strategy classification with EDTS shows superior performance in comparison to less diagnostic task selection algorithms such as representative sampling. The advantage of EDTS is particularly large if only few dependent measures are considered. We also provide an easy-to-use function in the free software package R that allows generating predictions for the most commonly considered strategies for a specified set of tasks and evaluating the diagnosticity of those tasks via EDTS; thus, to apply EDTS, no prior programming knowledge is necessary.

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors 2011. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

The identification of individuals’ decision strategies has always challenged behavioral decision research. There are at least three traditional approaches. Structural modeling applies a regression-based approach to identify the relation between the distal criterion variable, proximal cues, and people’s judgments (e.g., Brehmer, 1994; Brunswik, 1955; Doherty & Kurz, 1996; see Karelaia & Hogarth, 2008, for a meta-analysis); process tracing methods, for example, record information search (e.g., Payne, Bettman, & Johnson, 1988) or use think-aloud protocols (e.g., Montgomery & Svenson, 1989; Russo, Johnson, & Stephens, 1989) to trace the decision process (see Schulte-Mecklenbeck, Kuehberger, & Ranyard, 2011, for a review); comparative model fitting approaches, finally, investigate the fit of data and predictions of different models to determine the model or decision strategy employed (e.g., Bröder, 2010; Bröder & Schiffer, 2003; see also Pitt & Myung, 2002).

Comparative model fitting in particular has gained popularity in recent Judgment and Decision Making (JDM) research. In this paper, we discuss the problem of diagnostic task selection when using this strategy classification method. We suggest the Euclidian Diagnostic Task Selection (EDTS) method as a standardized solution. We report results from a comprehensive model recovery simulation that investigates the effects of different task selection procedures, number of dependent measures and their interaction on the reliability of strategy classification in multiple-cue probabilistic inference tasks.

2 Task selection in strategy classification based on comparative model fitting

The principle of strategy classification based on comparative model fitting (referred to in the following as strategy classification) is to compare a vector of choice data Da, consisting of n choices for person a, to a set of predictions Pa of a set of strategies S. The strategy that “explains” the data vector best is selected. Strategies in set S have to be sufficiently specified to allow the definition of a complete vector of predictions Pa. Vector Pa can consist of sub-vectors for predictions on different dependent measures. Some strategies have free parameters to capture individual differences. Aspects that have to be considered to achieve a reliable strategy classification are: a) that all relevant strategies are included in the strategy set (e.g., Bröder & Schiffer, 2003), b) that overfitting due to model flexibility is avoided (e.g., Bröder & Schiffer, 2003), c) that appropriate model selection criteria are used (e.g., Hilbig, 2010; Hilbig, Erdfelder, & Pohl, 2010; Pitt & Myung, 2002; Pitt, Myung, & Zhang, 2002), and d) that diagnostic tasks are selected that allow differentiating between strategies (e.g., Glöckner & Betsch, 2008a). In the current paper, we investigate the influence of more or less diagnostic task selection in more detail.
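To make the principle concrete, the following minimal R sketch (not the authors’ implementation; all names are illustrative) classifies a participant by choices only: each strategy’s choice predictions are blurred by a constant error rate, and the strategy with the highest binomial likelihood is selected.

## Choices-only classification by maximum likelihood (illustrative sketch).
## 'choices' codes observed choices (1 = option A, 0 = option B); 'preds' is a
## tasks x strategies matrix of predicted probabilities of choosing option A.
classify_choices <- function(choices, preds, eps = .10) {
  loglik <- apply(preds, 2, function(p) {
    pA <- p * (1 - eps) + (1 - p) * eps    # blur predictions by the error rate
    sum(dbinom(choices, size = 1, prob = pA, log = TRUE))
  })
  names(which.max(loglik))                 # best-fitting strategy (named columns)
}
preds <- cbind(TTB = c(1, 1, 0, .5), EQW = c(1, .5, 1, 1))  # toy predictions
classify_choices(choices = c(1, 1, 1, 1), preds = preds)    # returns "EQW"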

We are particularly interested in the consequences of representative sampling as opposed to diagnostic task selection. Tasks are to a varying degree representative of the environment and/or they are more or less diagnostic with respect to strategy identification (Gigerenzer, 2006). Representative sampling means that experimental tasks are sampled based on the probability of their occurring in the environment to which results should be generalized (Brunswik, 1955).Footnote 1 Representative sampling is important with respect to external validity for at least two reasons. First, if one wants to generalize findings on rationality or accuracy of people’s predictive decisions from an experiment to the real world, it is essential to draw a representative and hence generalizable sample.Footnote 2 One could, for instance, not claim that the calibration of a person‘s confidence judgments is bad if this conclusion is based on a set of “trick questions” that in fact are more difficult than they seem and that rarely appear in the real world (Gigerenzer, Hoffrage, & Kleinbölting, 1991).Footnote 3 A second aspect concerns interactions between task selection and strategy use. If the selection of tasks disadvantages the use of certain strategies (i.e., in contrast to their application in the real world), people are less likely to employ them, which leads to a general underestimation of their frequency of application.Footnote 4

In diagnostic sampling, by contrast, tasks are selected that differentiate best between strategies, that is, tasks for which the considered strategies make sufficiently different predictions. Diagnostic task selection has not been given sufficient attention in some previous work. For example, the priority heuristic as a non-compensatory model for risky choices (Brandstätter, Gigerenzer, & Hertwig, 2006) was introduced based on a comparative model test. In 89 percent of the choice tasks used in the study, the priority heuristic made the same prediction as one of the established models (i.e., cumulative prospect theory with parameters estimated by Erev, Roth, Slonim, & Barron, 2002). Subsequent analyses showed that the performance of the heuristic drops dramatically when more tasks are implemented for which the heuristic and prospect theory make different predictions (Glöckner & Betsch, 2008a). Further research showed that conclusions about the heuristic being a reasonable process model for the majority of people were premature (Ayal & Hochman, 2009; Fiedler, 2010; Glöckner & Herbold, 2011; Hilbig, 2008; Johnson, Schulte-Mecklenbeck, & Willemsen, 2008). To circumvent such problems in the future, diagnostic task selection should be given more attention. However, diagnostic task selection becomes a complex problem if multiple strategies and multiple dependent measures are considered simultaneously, as described in the next section. Afterwards, we suggest and evaluate a standardized method that allows selecting a set of highly diagnostic tasks from all possible tasks based on a simple Euclidian distance calculation in a multi-dimensional prediction space.

3 Strategy classification based on multiple measures

Strategy classification methods were commonly based on choices only. However, strategies are often capable of perfectly mimicking each other’s choices. Non-compensatory heuristics, for example, are submodels of the weighted additive strategy with specific restrictions on cue weights. This problem is even more apparent when, in addition, strategies are considered that do not assume deliberate stepwise calculations (Payne et al., 1988). Recent findings on automatic processes in decision making (Glöckner & Betsch, 2008c; Glöckner & Herbold, 2011) suggest also taking into account cognitive models assuming partially automatic-intuitive processes (Glöckner & Witteman, 2010). Important classes of models are evidence accumulation models (Busemeyer & Johnson, 2004; Busemeyer & Townsend, 1993; Roe, Busemeyer, & Townsend, 2001), multi-trace memory models (Dougherty, Gettys, & Ogden, 1999; Thomas, Dougherty, Sprenger, & Harbison, 2008), and parallel constraint satisfaction (PCS) models (Betsch & Glöckner, 2010; Glöckner & Betsch, 2008b; Holyoak & Simon, 1999; Simon, Krawczyk, Bleicher, & Holyoak, 2008; Thagard & Millgram, 1995). As an example, we include a PCS strategy in our simulation.

Based on the idea that multiple measures can improve differentiation, the multiple-measure maximum-likelihood (MM-ML) strategy classification method (Glöckner, 2009, 2010; Jekel, Nicklisch, & Glöckner, 2010) was developed. MM-ML simultaneously takes into account predictions concerning choices, decision time, and confidence. MM-ML defines probability distributions for the data-generating process of multiple dependent measures (e.g., choices, decision times, and confidence) and determines the (maximum) likelihood for the data vector Da given the application of each strategy in the set S and multiple further assumptions (for details, see Appendix A).

It was shown that the MM-ML method leads to more reliable strategy classification than the choice-based method (Glöckner, 2009).Footnote 5 It has, for instance, been successfully applied to detect strategies in probabilistic inference tasks (Glöckner, 2010) and tasks involving recognition information (Glöckner & Bröder, 2011).

4 Simulation

We used a model recovery simulation approach to investigate the effects of task diagnosticity, number of dependent measures, and the interaction of the two on the reliability of strategy classification. We thereby simulated data vectors for hypothetical strategy users with varying noise rates and tried to recover their strategies employing the MM-ML method. In accordance with Glöckner (2009), we simulated probabilistic inferences for six different cue patterns (i.e., specific constellations of cue predictions in the comparison of two options; see Figure 1, right), which were repeated ten times each, resulting in a total of 60 tasks per simulated person.Footnote 6 The choice of the cue patterns was manipulated to test our predictions with respect to representative sampling and diagnostic task selection based on a standardized method. In practice, the selection of the most diagnostic cue patterns for a set of strategies is not trivial and, to the best of our knowledge, no standard procedures are available. We suggest a method to determine the cue patterns that differentiate best between any given set of strategies and test whether the method increases reliability in strategy classification.

Figure 1: Predictions for 40 qualified cue patterns generated from five strategies (black = PCS, blue = TTB, red = EQW, green = WADDcorr, purple = RAND) in the rescaled prediction space with the three dependent measures (i.e., choices, decision times, confidence judgments) as coordinate axes. The size of the dots is (logarithmically) related to the number of predictions (i.e., density) at the respective coordinates. The five stars represent the predictions of the strategies for the (exemplary) cue pattern shown on the right side of Figure 1.

4.1 Design

We generated data based on five strategies in probabilistic inference tasks with two options and four binary cues. We varied the validity of the cues in the environment, the degree of noise in the data generating process, the number of dependent measures included in the model classification, and the diagnosticity of cue patterns that were used. As dependent variables, we calculated the proportion of correct classifications—the identification rate—and the posterior probability of the data-generating strategy.Footnote 7 Ties and misclassifications were counted as failed identification. This results in a 5 (data generating strategy) × 3 (environment) × 4 (error rates for choices) × 3 (noise level for decision times and confidence judgments) × 3 (number of dependent measures) × 4 (diagnosticity of tasks) design. For each condition, we simulated 252 participants, resulting in 544,320 data points in total.
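As a quick arithmetic check of the design size, the condition count and the total number of simulated data points can be reproduced in R:

5 * 3 * 4 * 3 * 3 * 4        # 2,160 conditions
5 * 3 * 4 * 3 * 3 * 4 * 252  # 544,320 simulated data points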

4.1.1 Data-generating strategies

For simplicity, we rely on the same data-generating strategies used in previous simulations (Glöckner, 2009), namely the parallel constraint satisfaction (PCS), take-the-best (TTB), equal weight (EQW), weighted additive (WADDcorr), and random (RAND) strategies, which are described in Table 1.

Table 1: Description of the strategies used in the simulation.

Note. We used PCS with fixed parameters and a quadratic cue transformation function: decay = .10; w_o1−o2 = −.20; w_c−o = .01 / −.01 [positive vs. negative prediction]; w_v = ((v − .50) × 2)²; stability criterion = 10⁻⁶; floor = −1; ceiling = 1 (see Glöckner, 2010, for details).

4.1.2 Environments

We used three environments: a typical non-compensatory environment with one cue clearly dominating the others (cue validities = [.90 .63 .60 .57]),Footnote 8 a compensatory environment with high cue dispersion (cue validities = [.80 .70 .60 .55]), and a compensatory environment with low cue dispersion (cue validities = [.80 .77 .74 .71]).

4.1.3 Error rates for choices and noise level for confidence and time

For each simulated participant, a data vector Da was generated, based on the prediction of the respective data-generating strategy plus noise. The vector consisted of a sub-vector for choices, decision times, and confidence. For the choice vector, (exact) error rates were manipulated from 10% to 25% at 5%-intervals. For example, an error rate of 10% leads to 6 out of 60 choices that are inconsistent with the predictions of the strategy. It was randomly determined which six choices were flipped to the alternative choice for each simulated participant.Footnote 9
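The flipping mechanism can be sketched in a few lines of R (variable names are illustrative): an exact 10% of the 60 predicted choices are selected at random and flipped to the alternative option.

set.seed(1)                        # illustrative seed for reproducibility
pred_choices <- rep(c(1, 0), 30)   # hypothetical strategy predictions, 60 tasks
err_idx <- sample(60, size = 6)    # exact error rate: 6 of 60 trials (10%)
pred_choices[err_idx] <- 1 - pred_choices[err_idx]  # flip to the alternative option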

Normally distributed noise was added to the predictions of the strategies for the decision time and confidence vectors (normalized to a mean of 0 and a range of 1). The three levels of noise on both vectors differed with respect to the standard deviation of the noise distribution, σ_error = [1.33 1 0.75], which is equivalent to a manipulation of the effect size of d = [0.75 1 1.33]. Note that adding normally distributed noise N(µ = 0, σ_error) to a normalized prediction vector leads to a maximum (population) effect size of d = (µ_max − µ_min)/σ_pooled = 1/σ_pooled. Note also that the term µ_max − µ_min is the difference between the means of the most distant populations from which realizations of the dependent measures are sampled, which reduces to 1 due to the normalization of the prediction vectors. The pooled standard deviation of those populations is equal to the standard deviation of the noise distribution (i.e., σ_pooled = σ_error) because random noise is the only source of variance within each population. Thus, a standard deviation of (e.g.) σ_error = σ_pooled = 1.33 leads to a maximum effect size of d = 1/1.33 ≈ 0.75 between the most distant populations of the dependent measures.
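A short R sketch of this noise manipulation (illustrative values): predictions are normalized to mean 0 and range 1, and normal noise with the condition’s standard deviation is added.

t_pred <- c(-0.5, 0, 0.5)            # normalized time predictions (mean 0, range 1)
sigma_error <- 1.33                  # noisiest condition, i.e., d = 1/1.33 ≈ 0.75
times <- rep(t_pred, each = 10) + rnorm(30, mean = 0, sd = sigma_error)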

4.1.4 Number of dependent measures

The strategy classification using MM-ML was based on varying numbers of dependent measures including (a) choices only, (b) choices and decision times, or (c) choices, decision times and confidence judgments.

4.1.5 Diagnosticity in Cue Patterns

We manipulated the diagnosticity of cue patterns used in strategy classification by using a) the Euclidian Diagnostic Task Selection (EDTS) method that determines the most diagnostic tasks given a set of strategies and the number of dependent measures considered, b) two variants of this method that generate medium and low diagnostic tasks, and c) representative (equal probability) sampling of tasks.

Probabilistic inference tasks with two options and four binary cues (i.e., [+ –]) allow for 240 distinct cue patterns. To prepare task selection, the set was reduced to a qualified set of 40 cue patterns by excluding all option-reversed versions (n = 120) and versions that were equivalent except for the sign of non-discriminating cues (i.e., [– –] vs. [+ +]). Then, strategy predictions for each of the three dependent measures were generated and rescaled to the range of 0 to 1 (for details, see Appendix B). The rescaled prediction weights for each strategy and each qualified task are plotted in the three-dimensional space that is spanned by the three dependent measures (Figure 1).
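One way to code this reduction in R (a sketch under our reading of the exclusion rules; the authors’ taskGenerator.r may proceed differently) is to represent each cue as discriminating for option A (1), for option B (−1), or non-discriminating (0), drop the all-tie pattern, and keep one member of each option-reversed pair:

grid <- as.matrix(expand.grid(rep(list(c(1, 0, -1)), 4)))     # 3^4 = 81 patterns
grid <- grid[rowSums(abs(grid)) > 0, ]                        # drop the all-tie pattern
keep <- apply(grid, 1, function(x) x[which(x != 0)[1]] == 1)  # first discriminating
qualified <- grid[keep, ]                                     # cue favors option A
nrow(qualified)                                               # 40 qualified patterns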

EDTS (Table 2) is based on the idea that cue patterns are diagnostic if the predictions of the strategies differ as much as possible. Pairwise diagnosticity is thereby measured as the Euclidian distance between the predictions of two strategies for each cue pattern in the three-dimensional prediction space (Figure 1). The main criterion for cue pattern selection is the average diagnosticity of a cue pattern, which is the mean of its Euclidian distances across all possible pairwise strategy comparisons in the space (i.e., PCS vs. TTB, PCS vs. EQW, …). For statistical details, see Appendix C; for a discussion of EDTS-related questions, see Appendix E.
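A compact R sketch of the distance and averaging steps (illustrative; it assumes a tasks × measures × strategies array of predictions already rescaled to 0–1 and at least two strategy pairs):

avg_diagnosticity <- function(preds) {
  pairs <- combn(dim(preds)[3], 2)              # all pairwise strategy comparisons
  ed <- apply(pairs, 2, function(pr)            # Euclidian distance per cue pattern
    sqrt(rowSums((preds[, , pr[1]] - preds[, , pr[2]])^2)))
  ed <- apply(ed, 2, function(x) (x - min(x)) / (max(x) - min(x)))  # rescale to 0-1
  rowMeans(ed)                                  # average diagnosticity per pattern
}
## sort(avg_diagnosticity(preds), decreasing = TRUE) then orders patterns by AD.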

Table 2: Euclidian Diagnostic Task Selection (EDTS).

For the high diagnosticity condition, we selected six cue patterns according to the EDTS procedure. For the medium and low diagnosticity conditions, we selected cue patterns from the middle and lower parts of the list of cue patterns sorted by diagnosticity, generated in step 4 of EDTS. For the representative sampling condition, cue patterns were sampled uniformly at random.Footnote 10

4.1.6 EDTS function in R

We have implemented EDTS as an easy-to-use function in the free software package R (2011). You can specify your own environment (i.e., number of cues and validities of cues), generate the set of unique pairwise comparisons between cue patterns for your environment (as described in 4.1.5), derive predictions for all strategies on choices, decision times, and confidence judgments for those tasks (as described in 4.1.1), and apply EDTS to calculate the diagnosticity of each task (as described in 4.1.5); see Appendices D and F for a detailed description of the EDTS function.

By applying the EDTS function, you can find the most diagnostic tasks from a specified environment, set of strategies, and set of measures for future studies. You can also systematically vary the number and validities of cues to find the environment that produces tasks that optimally distinguish between a set of strategies. Finally, you can use the EDTS function to evaluate the diagnosticity of tasks, and thus the reliability of strategy comparisons and of conclusions from past studies.

4.2 Hypotheses

Based on previous simulations (Glöckner, 2009), we predict that additional dependent measures for MM-ML lead to higher identification rates and posterior probabilities for the data-generating strategy. We further expect that less diagnostic cue patterns lead to lower identification rates and posterior probabilities. We also hypothesize an interaction effect between diagnosticity and the number of dependent measures, that is, less diagnostic cue patterns benefit more from adding further dependent measures. For practical purposes, we are particularly interested in the size of the effect of each manipulation to assess the extent to which common practices influence results.

5 Results

5.1 Identification Rate

The overall identification rates for each type of task selection, averaged across all environments and all strategies, based on choices only are displayed in Figure 2 (left). As expected, cue patterns with high diagnosticity selected according to EDTS lead to the highest identification rates, followed by representative sampling; cue patterns with medium and low diagnosticity were consistently worse. All types of task selection benefit from adding a second (see Figure 2, middle) and a third (see Figure 2, right) dependent measure. Representative sampling and the conditions with low and medium diagnosticity benefit most from adding a third dependent measure.Footnote 12

Figure 2: Identification rates for each type of task selection averaged across strategies and environments based on a) choices (left), b) choices and decision time (middle), and c) choices, decision time, and confidence (right).

The term є refers to the error rate for choices. The middle and right graphs are separated by the effect size for decision time (DT) and for decision time and confidence (DT & CF), respectively; d indicates the (maximum) effect size.

Hence, results are descriptively in line with our hypotheses. For a statistical test of the hypotheses, we conducted a logistic regression predicting identification (1 = identified, 0 = not identified) by number of dependent measures, diagnosticity of tasks, environment, generating strategy, epsilon rate for choices, effect size for decision times, and confidence judgments (Table 3, first model).Footnote 13

Table 3: Logistic regression predicting successful identification in strategy classification (Model 1) and linear regression predicting posterior probability of the data generating strategy (Model 2).

Note. Variables are dummy-coded and compared against the control condition. Variables for which interactions are calculated are centered. Nagelkerke’s R2 = .547 for identification rates; Adj. R2 = .474 for posterior probabilities (N = 544,320, p < .001). p < .001 for all predictors and model comparisons (full vs. reduced models).

Results of the logistic regression indicate changes in the ratio of the odds for a successful strategy identification. For example, the odds ratio for the first dummy variable, indicating that two dependent measures were used (i.e., choices and decision times) as compared to choices only (i.e., the control group), is 7.39. This implies that the odds for identification increase by a factor of 7.39 from using choices alone to using choices and decision times.Footnote 14 Adding both decision time and confidence increases the odds for identification by a factor of 20.91 (compared to choices only).

The odds for identification decrease by a factor of 0.29 (i.e., a reduction to less than one third; see Footnote 14) when using representative sampling instead of highly diagnostic sampling according to EDTS. The reduction from high to medium and low diagnostic sampling is even more pronounced.

Finally, less diagnostic pattern selection mechanisms benefit more from adding further dependent variables, as indicated by the odds ratios for the interaction terms between number of dependent measures and task diagnosticity. In particular, when all three dependent measures are considered, identification dramatically increases for representative sampling as well as medium and low diagnostic tasks, so that the disadvantage of representative sampling decreases to 3% (Table 4).

Table 4: Identification rates for the number of dependent measures and task selection vs. representative sampling.

Note. Averaged over strategies, є rates for choices, effect sizes for decision times and confidence judgments, and environments.

Hence, in line with our hypotheses, we replicate the finding that identification increases with the number of dependent measures. High-diagnosticity task sampling according to EDTS leads to superior identification rates. The disadvantage of representative sampling decreases when more dependent measures are included.

5.2 Posterior probabilities for the data generating strategy

To analyze the effects of our manipulations further, we regressed posterior probabilities on the same factors described above (Table 3, Model 2). As expected, given that identification and posterior probabilities are both calculated from Bayesian Information Criterion values (see Appendix A, Equation 2), the hypothesized effects of the manipulations are replicated. The independent variables of the linear model explain 47.4% of the variance in posterior probabilities. The number of dependent measures and task diagnosticity explain most of the unique varianceFootnote 15 in posterior probabilities (19.3% and 16.9%). In comparison to classification based on choices only, two and three dependent measures lead to an increase of .199 and .323 in posterior probabilities. In comparison to highly diagnostic cue patterns selected according to EDTS, posterior probabilities are reduced by .260 and .320 for cue patterns with medium and low diagnosticity, and by .123 for representative sampling. Thus, cue pattern selection according to EDTS leads to considerably higher posterior probabilities of the data-generating strategies than representative sampling.

6 Discussion and conclusion

Individual-level strategy classification in judgment and decision making is a statistical and a methodological challenge. There has been a lack of standard solutions to the complex problem of diagnostic task selection in multi-dimensional prediction spaces. In the current paper, we suggest Euclidian Diagnostic Task Selection (EDTS) as a simple method to select highly diagnostic tasks and show that EDTS increases identification dramatically. Furthermore, we replicate the increase in identification rates from employing multiple dependent measures in the multiple-measure maximum-likelihood (MM-ML) strategy classification method (Glöckner, 2009, 2010). We find that, under the conditions considered in our simulation, representative task sampling reduces the odds for successful strategy classification to less than a third compared to EDTS. This disadvantage, however, shrinks if multiple dependent measures are used. Hence, if representative sampling is advisable for other methodological reasons (see section 2), multiple measures should be used. Unfortunately, this is not possible for all models because many models predict choices only (i.e., paramorphic models of decision making).

Our findings highlight that the issue of diagnosticity of task selection in comparative model fitting should be taken very seriously. To avoid ad-hoc criteria, we suggest using the EDTS method introduced in this article. Furthermore, it would be advisable to report the average diagnosticity score for each selected cue pattern so that results can be evaluated better.

Robin Horton (1967a, 1967b, 1993)Footnote 16, who investigated the differences between religious and scientific thinking within the framework of Popper’s critical rationalism, stated (1967b, p. 172) that “[f]or the essence of experiment is that the holder of a pet theory does not just wait for events to come along and show whether or not it has a good predictive performance.”—an approach that might be equated with representative sampling—“He bombards it with artificially produced events in such a way that its merits or defects will show up as immediately and as clearly as possible.” We hope that EDTS may help to find those events in a more systematic fashion in future research.

Appendices

Appendix A: The Multiple-Measure Maximum Likelihood strategy classification method (MM-ML)

Appendix A describes the basic math of the MM-ML method; see Glöckner (2009, 2010) and Jekel, Nicklisch, and Glöckner (2010) for a more thorough description of the method, tools, and tutorials on how to apply MM-ML.

To apply MM-ML in probabilistic decision making, it is necessary to select a set of strategies, a set of dependent measures, and a set of cue patterns. For each dependent measure, assumptions have to be made concerning the probability function of the data-generating process. In our simulation study, we use choices, decision times, and confidence judgments as dependent measures and assume six cue patterns, each repeated ten times. The number of choices in line with a strategy prediction is assumed to be binomially distributed with a constant error rate for each cue pattern; (log-transformed and order-corrected) decision times and confidence judgments are assumed to be drawn from normal distributions around rescaled prediction weights with a constant standard deviation per measure.

Given a contrast weight $t_{T_i}$ for the decision time and $t_{C_i}$ for the confidence judgment of task $i$, and further observing a data vector $D$ consisting of a subvector for choices, with $n_{jk}$ being the number of choices of task type $j$ congruent with strategy $k$, and of subvectors for decision times $x_{T_i}$ and confidence judgments $x_{C_i}$ for task $i$, it is possible to calculate the likelihood $L_{total}$ of the observed data vector under the assumption that strategy $k$ was applied (and the supplementary assumptions mentioned above) for a participant according to (Glöckner, 2009, Equation 8, p. 191):

$$L_{total} = \prod_{j} \binom{n_j}{n_{jk}} (1-\epsilon_k)^{n_{jk}} \, \epsilon_k^{\,n_j - n_{jk}} \times \prod_{i} \frac{1}{\sigma_T \sqrt{2\pi}} \exp\left(-\frac{(x_{T_i} - \mu_T - R_T t_{T_i})^2}{2\sigma_T^2}\right) \times \prod_{i} \frac{1}{\sigma_C \sqrt{2\pi}} \exp\left(-\frac{(x_{C_i} - \mu_C - R_C t_{C_i})^2}{2\sigma_C^2}\right) \qquad (1)$$

The error rate for choices, $\epsilon_k$, the overall means and standard deviations for decision times ($\mu_T$, $\sigma_T$) and confidence judgments ($\mu_C$, $\sigma_C$), as well as the rescaling factors $R_T$ and $R_C$ ($R_T, R_C \geq 0$) for decision times and confidence judgments, are estimated by maximizing the likelihood (i.e., minimizing the negative log-likelihood).
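Under these assumptions, the likelihood can be sketched in R as follows (a rough illustration, not the published MM-ML code; argument names are ours). It could be minimized with, e.g., optim(..., method = "L-BFGS-B") under the constraints $\epsilon_k \in [0, .5]$ and $R_T, R_C \geq 0$.

## n_j / n_jk: tasks and strategy-congruent choices per task type; x_T, x_C:
## observed times and confidences; t_T, t_C: contrast weights of strategy k.
negloglik <- function(par, n_j, n_jk, x_T, t_T, x_C, t_C) {
  eps <- par[1]; muT <- par[2]; sdT <- par[3]; RT <- par[4]
  muC <- par[5]; sdC <- par[6]; RC <- par[7]
  -(sum(dbinom(n_jk, size = n_j, prob = 1 - eps, log = TRUE)) +
    sum(dnorm(x_T, mean = muT + RT * t_T, sd = sdT, log = TRUE)) +
    sum(dnorm(x_C, mean = muC + RC * t_C, sd = sdC, log = TRUE)))
}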

The Bayesian Information Criterion (BIC; Schwarz, 1978) is calculated to account for different numbers of parameters (numbers vary because some strategies do not predict differences on all dependent measures or assume a fixed error rate of .50) according to:

$$BIC_k = -2 \ln L_{total} + N_p \ln N_{obs} \qquad (2)$$

$N_{obs}$ represents the number of task types (i.e., six in the simulations) and $N_p$ the number of parameters that need to be estimated for the likelihood. Thus, a strategy with more free parameters is penalized for its flexibility.

Finally, the posterior probability Pr of a specific strategy $k$, that is, the probability that strategy $k$ is the data-generating mechanism given the observed data $D$ and assuming equal prior probabilities for all (i.e., $K$) considered strategies, can be calculated from the BIC values according to (compare Wagenmakers, 2007, Equation 11, p. 797):

$$\Pr(S_k \mid D) = \frac{\exp\left(-\frac{1}{2} BIC_k\right)}{\sum_{l=1}^{K} \exp\left(-\frac{1}{2} BIC_l\right)} \qquad (3)$$
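Equations 2 and 3 translate directly into R (a small sketch; shifting by the minimum BIC only improves numerical stability and leaves the result unchanged):

bic <- function(loglik, n_par, n_obs) -2 * loglik + n_par * log(n_obs)
posterior <- function(bics) {
  b <- bics - min(bics)                 # avoid underflow for large BIC values
  exp(-b / 2) / sum(exp(-b / 2))
}
posterior(c(PCS = 12.3, TTB = 15.0, RAND = 20.1))  # hypothetical BIC values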

Appendix B: Strategy predictions

Predictions of strategies are derived by assuming that TTB, EQW, and WADDcorr are applied in a stepwise manner according to the classic elementary information processes approach (e.g., Payne et al., 1988). For PCS, predictions are derived from a standard network simulation (Glöckner & Betsch, 2008b; Glöckner, Betsch, & Schindler, 2010; Glöckner & Bröder, 2011; Glöckner & Hodges, 2011) using the parameters mentioned in the Note of Table 1. Table 5 shows the predictions for the cue patterns selected for the high diagnosticity condition in the environment with cue validities of .80, .70, .60, and .55 as an example.

Choices. Choice predictions are determined according to the mechanisms described in Table 1.

Table 5: Highly diagnostic cue patterns for three dependent measures in a compensatory environment (validities = [.80 .70 .60 .55]) and predictions for each strategy and dependent measure.

Note. Positive cue values are indicated by +, negative cue values by −. A:B represents guessing between options.

Decision times. For TTB, EQW, and WADDcorr, the number of computational steps necessary to apply the strategy is used as time prediction. For PCS, the number of iterations of the network necessary to find a stable solution is used as an indicator for decision time.Footnote 17

Confidence judgments. For TTB, the validity of the discriminating cue is used as a predictor of confidence (Gigerenzer et al., 1991). For EQW and WADDcorr, the difference in the (un)weighted sums of cue values for the two options is used instead. For PCS, the difference in the activations of the options is used as a predictor of confidence judgments.

Appendix C: Euclidian Diagnostic Task Selection (EDTS)

Step 1: Generate standardized prediction vectors

Define a set of K strategies s to be tested, a set of P dependent measures d used for MM-ML, and a set of I qualified cue patterns c (i.e., excluding identical patterns). Calculate prediction vectors for each strategy (see Appendix B) and rescale them to a range of 0 to 1 per strategy. Note that dependent measures of probabilities (e.g., choices) should not be rescaled.Footnote 18 The goal is to choose n cue patterns highest in diagnosticity from the set of the I cue patterns. Assume the following notation for raw (indicated by superscript R) contrast weights cw:

$$cw^{R}_{s_k d_p c_i}, \qquad k = 1, \ldots, K; \quad p = 1, \ldots, P; \quad i = 1, \ldots, I \qquad (4)$$

Each contrast vector is rescaled from the raw values to fit the range from 0 to 1. Contrast weights are rescaled by:

$$cw_{s_k d_p c_i} = \frac{cw^{R}_{s_k d_p c_i} - \min_i \left( cw^{R}_{s_k d_p} \right)}{\max_i \left( cw^{R}_{s_k d_p} \right) - \min_i \left( cw^{R}_{s_k d_p} \right)} \qquad (5)$$

Step 2: Calculate diagnosticity scores for strategy comparisons

Compute the diagnosticity scores for each task as the Euclidian distance $ED$ for each strategy comparison and each cue pattern within the space spanned by the vectors of the $P$ dependent measures, which are weighted by $w_{d_p}$. Following this, standardize these distances to a range from 0 to 1. The $ED$ between strategies $k$ and $o$ ($k \neq o$) for cue pattern $i$ is calculated by:

$$ED^{R}_{s_k s_o, c_i} = \sqrt{\sum_{p=1}^{P} w_{d_p} \left( cw_{s_k d_p c_i} - cw_{s_o d_p c_i} \right)^2} \qquad (6)$$

For each comparison of strategies $k$ and $o$, rescale $ED^{R}_{s_k s_o}$ across all $I$ cue patterns to fit the range from 0 to 1 by:

$$ED_{s_k s_o, c_i} = \frac{ED^{R}_{s_k s_o, c_i} - \min_i \left( ED^{R}_{s_k s_o} \right)}{\max_i \left( ED^{R}_{s_k s_o} \right) - \min_i \left( ED^{R}_{s_k s_o} \right)} \qquad (7)$$

[Rationale for rescaling: Euclidian distances for each strategy comparison should have the same range to avoid overweighting (resp. underweighting) of strategy comparisons with a high variance (resp. low variance) in Euclidian distances.]

Step 3: Calculate the average diagnosticity scores

Calculate the mean of each row of the matrix containing the rescaled Euclidian distances to obtain the average diagnosticity score $AD_{c_i}$ for each cue pattern by:

$$AD_{c_i} = \frac{2}{K(K-1)} \sum_{k=1}^{K-1} \sum_{o=k+1}^{K} ED_{s_k s_o, c_i} \qquad (8)$$

Step 4: Sort cue patterns by average diagnosticity scores and select cue patterns

The set of $I$ cue patterns can then easily be sorted by their $AD$ scores, and the $n$ cue patterns with the highest $AD$ scores are selected.

Step 5: Refine selection

Investigate whether the maximum of the diagnosticity scores for each strategy comparison is above a threshold $t_{min}$. To find an appropriate set of cue patterns, the threshold should increase with the number of dependent measures used and decrease with the number of pairwise comparisons. In the simulations, we used a threshold value of $t_{min} = .75$. If a maximum is below the desired threshold, replace the last cue pattern(s) with one of the following cue patterns until the threshold is reached for all comparisons. If no such cue pattern is found, repeat the procedure with a lower threshold.

[Rationale: A high mean of rescaled Euclidian distances for a cue pattern can be produced by a single high distance for one of the strategy comparisons. Apply step 5 to ensure that there is at least one diagnostic cue pattern for each strategy comparison in the subset (as defined by the threshold).]
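The threshold check of step 5 amounts to a one-line test in R (a sketch; 'ed' is the cue patterns × comparisons matrix of rescaled distances from step 2, 'sel' the indices of the currently selected patterns):

meets_threshold <- function(ed, sel, t_min = .75) {
  all(apply(ed[sel, , drop = FALSE], 2, max) >= t_min)  # one diagnostic pattern per comparison
}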

Appendix D: Implementation of EDTS as a function in R

EDTS is implemented as an easy-to-use function in R. R (2011) is a software environment for statistical analysis under the GNU General Public License, i.e., it is free of charge. R is available for Windows, Mac, and UNIX systems. To download R, visit the Comprehensive R Archive Network (http://cran.r-project.org/). To learn more about R, we recommend the free introduction to R by Paradis (2005); however, to apply EDTS in R, no sophisticated prior experience with the R syntax is required.

You can download the EDTS.zip folderFootnote 19 from http://journal.sjdm.org/vol6.8.html, listed with this article. In the folder EDTS, there are two files—mainFunction.r and taskGenerator.r—and an additional folder strategies containing six further R files. In the current version of the EDTS function, it is possible to generate all possible unique pattern comparisons for two-alternative decision tasks with binary cue values (i.e., 1 or –1), to derive predictions for all tasks and a set of default strategies, and to calculate the diagnosticity index for each task as proposed in the article.

To use the EDTS function, you need to copy and paste (or submit) the code provided in the file mainFunction.r, i.e., you can open mainFunction.r in a standard text editor, copy the entire code, and paste it into the open R console. To call the function afterwards, type the command:

EDTS(setWorkingDirectory, validities, measures,
     rescaleMeasures, weightingMeasures, strategies,
     generateTasks, derivePredictions, reduceSetOfTasks,
     printStatus, saveFiles, setOfTasks, distanceMetric,
     PCSdecay, PCSfloor, PCSceiling, PCSstability,
     PCSsubtrahendResc, PCSfactorResc, PCSexponentResc)

in the open R console and hit Enter. If an argument of the function is left blank, the default is applied. Arguments, descriptions, valid values, examples and defaults are listed in Appendix F. In the following, we give an example for illustrative purposes.

Example

Assume you want to test which of (e.g.) the four strategies—PCS, TTB, EQW, or RAND—describes human decision making best in a six-cue environment with the cue validities v = [.90 .85 .78 .75 .70 .60] (compare Rieskamp & Otto, 2006). Your goal is to select the most diagnostic tasks from all possible tasks for an optimal comparison of the strategies. Assume further that you will assess choices and decision times as dependent variables in your study; thus, you only need to rescale decision times (see Appendix C). For all remaining arguments, you want to keep the defaults of the function.

To apply EDTS, you put the unzipped EDTS folder under C:\, open the file mainFunction.r with a text editor, copy the entire text, and paste it into the open R console. Then you type:

EDTS(validities = c(.90, .85, .78, .75, .70, .60),
     measures = c("choice", "time"),
     rescaleMeasures = c(0, 1),
     strategies = c("PCS", "TTB", "EQW", "RAND"))

and hit Enter. Three .csv files are created: (1) tasks.csv includes all qualified patterns for pairwise comparisons with six cues (i.e., 364 tasks), (2) predictions.csv includes choice and decision time predictions for all strategies (i.e., PCS, TTB, EQW, and RAND) and all tasks listed in tasks.csv, (3) outputEDTS.csv includes the average diagnosticity score (AD) and the minimum, maximum, and median diagnosticity of all strategy comparisons. Additionally, the “raw” diagnosticity scores for each strategy comparison and each task are provided. Based on the AD scores, you finally select the most diagnostic tasks for the strategy comparisons in the six-cue environment (see Table 2, step 5).

Generalizations

We added two further strategies as default strategies: (1) WADDuncorr (Rieskamp & Hoffrage, 1999) has been extensively used in past studies and thus can serve as an interesting competitor. WADDuncorr is identical to WADDcorr but does not correct validities for chance level (e.g., .5 for pairwise comparisons). (2) RAT (Lee & Cummins, 2004) is the rational choice model based on Bayesian inference. It has been included as a further strategy in order to allow comparisons between heuristic models and the rational solution in probabilistic decision making.

Additionally, it is also possible to extend the set of default strategies with your own strategies. To do so, you open the file predictions.csv and include a prediction column for each measure and for each task for your own strategies (as defined in tasks.csv). The labels of the new columns need to fit the form NameOfYourStrategy.Measure. Additionally, the order and number of columns (i.e., the order of predictions for each measure) need to follow the order of the measures of the other strategies included (i.e., choice, time, and confidence for the default measures).Footnote 20 To apply EDTS for your own specified set of strategies, you then include the names of your strategies in the argument strategies of the EDTS function and set the argument derivePredictions = 0 (i.e., predictions are not derived and the data matrix defined in predictions.csv with your set of strategies is loaded into the program instead).

It is also possible to add further dependent measures. Similar to adding strategies, you insert a further column for each strategy, following the form Strategy.Measure for the labeling in the first row of the data matrix. For example, if you want to compare PCS and TTB on choices, decision times, and (e.g.) arousal (Hochman, Ayal, & Glöckner, 2010), the file predictions.csv consists of 7 columns. In the first column, the number of the task is coded. From the second to the fourth column, PCS predictions are inserted with the labels PCS.choice, PCS.time, and PCS.arousal in the first row of the data matrix. From the fifth to the seventh column, TTB predictions are inserted with the labels TTB.choice, TTB.time, and TTB.arousal. Thus, predictions for each measure are inserted by strategy, and for each strategy the measures are in the same order.
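For illustration, such a file could also be written from R (hypothetical prediction values; footnote 20 describes the quoting conventions, which write.csv follows by default):

preds <- data.frame(task = 1:3,  # task numbers as defined in tasks.csv
                    PCS.choice = c(1, 1, 0), PCS.time = c(.2, .5, .1),
                    PCS.arousal = c(.3, .6, .2),
                    TTB.choice = c(1, 0, 0), TTB.time = c(.1, .4, .3),
                    TTB.arousal = c(.2, .5, .4))
write.csv(preds, "predictions.csv", row.names = FALSE)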

In general, the EDTS function is thus applicable to any strategy for which quantitative predictions on each measure can be derived for a set of tasks. The function can also be applied to tasks differing from the default characteristics (e.g., probabilistic decision-making between three options and/or continuous cues) or from the default type (e.g., preference decisions between gambles) by inserting the predictions for each strategy and measure in the file predictions.csv as described above. Thus, the method is not limited to the strategies and tasks used and implemented as defaults in the EDTS function. The experienced R user can thus implement her strategies as R code. To simplify coding, the main EDTS function and strategies are coded in separate files (see folder strategies), and strategies are also coded as functions that are similar in structure (same input variables, etc.).Footnote 21

Appendix E: Open questions and future research

This short Appendix is supposed to make you aware of some open questions. For those researchers who are interested in applying EDTS, this section may sensitize you to critical aspects of EDTS. For those researchers who are interested in optimizing EDTS, the following open questions can be a hint for future studies; the EDTS function provided (see Appendix D and F) may further facilitate this process.Footnote 22

There are alternative selection criteria (e.g., maximum or median) that may be used for task selection instead of the mean proposed and validated in the current study. For example, strategy comparisons may be more effective if the most discriminating task for each comparison (= maximum) is selected. However, there are two opposing forces at work: the number of tasks increases rapidly if the set of strategies increases (i.e., 5 strategies = 10 tasks, 6 strategies = 15 tasks, 7 strategies = 21 tasks, etc.). This can lead to less repetition of the selected tasks if the number of tasks that can be presented in a study is limited. Less repetition can then lead to a less reliable strategy classification dependent on the error rate. It is therefore an open question if the gain of diagnosticity for single comparisons outweighs the loss of reliability due to less repetition of the tasks. To facilitate comparison between several diagnosticity statistics, the output of the EDTS function includes several diagnosticity statistics (mean, median, maximum, and minimum) and the “raw” diagnosticity scores for each strategy comparison and each task.

There is no need to restrict EDTS to Euclidian distances as the metric for diagnosticity scores. It is an open question whether other metrics lead to equally reliable strategy classification (or even better). We have implemented the option to calculate diagnosticity scores based on the Taxicab/Cityblock metric (Krause, 1987) in the EDTS function as well.

Finally, there may be reasons to weight the impact of each dependent measure on the diagnosticity score differently. For example, it may be reasonable to reduce the impact of dependent measures that are less reliable and thus favor more reliable measures in diagnostic task selection. It is an open question if different weighting schemes (e.g., weighting of each measure relative to a reliability index) lead to higher identification rates. We have implemented the option to weight measures differently in the EDTS function.
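Both options reduce to simple arithmetic, sketched here for one cue pattern and two strategies (illustrative values):

p1 <- c(choice = 1, time = .8, conf = .6)   # rescaled predictions of strategy 1
p2 <- c(choice = 0, time = .4, conf = .5)   # rescaled predictions of strategy 2
w  <- c(2, 1, 1)                            # e.g., upweight a more reliable measure
sqrt(sum(w * (p1 - p2)^2))                  # weighted Euclidian distance
sum(w * abs(p1 - p2))                       # weighted Taxicab/Cityblock distance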

Appendix F: EDTS() function in R

Arguments, descriptions, valid values, examples, and defaults.

Footnotes

We thank Jonathan Baron and two anonymous reviewers for their suggestions and comments. This research was partially funded by the priority program of the German Science Foundation (DFG) SPP 1516 “New frameworks of rationality”, project numbers GL 632/3-1 and BR 2130/8-1.


1 Dhami, Hertwig, and Hoffrage (2004) equate representative sampling with “probability sampling, in which each stimulus has an equal probability of being selected” (p. 962, emphasis original; see also Hoffrage & Hertwig, 2006). Although it can be questioned whether equal probability sampling is a sound implementation of Brunswik’s representative sampling at all (i.e., the probability of a task to appear in a study should match the probability of the task to appear in the real world), we use equal probability sampling for matters of convenience and for lack of knowledge of the “true” sampling probabilities for the tasks used in the simulation reported below.

2 See the correspondence criterion of rationality (Todd & Gigerenzer, 2000).

3 Note that the frequency of an event in an environment is not per se an index of its significance. That is, rare events that lead to irrational behavior can be highly significant due to their consequences (e.g., severe punishment for not solving “trick questions”) or due to selective oversampling of these—then no longer—“rare” events (e.g., oversampling of “trick questions” to take advantage of irrational behavior).

4 The same is of course true for methodological approaches that hinder the application of certain strategies. It has, for instance, been shown that in some situations the classic mouselab paradigm hinders the application of weighted compensatory strategies (Glöckner & Betsch, 2008c).

5 MM-ML has been implemented as an easy-to-use function in the open-source statistical package R (Jekel et al., 2010) and as a function for STATA (Glöckner, 2009). The most recent implementations are provided on request by the authors of this paper.

6 Completing 60 tasks takes about 5–15 minutes, which allows mixing them with sufficient distractors to avoid interactions of task selection and strategy use mentioned in section 2.

7 The posterior probability of the data-generating strategy can be calculated from the BIC values as described in Equation 3 in Appendix A (Wagenmakers, 2007).

8 The environment is non-compensatory because the most valid cue can never be overruled by less valid cues if compensatory strategies such as WADDcorr, which takes (chance-corrected) validities into account, or PCS are applied.

9 Note that some strategies predict guessing for some (or all) types of tasks (e.g., RAND). The choices of a simulated participant applying (e.g.) RAND were determined probabilistically in a first step—with choice A vs. B being equally likely. A choice was flipped to the alternative option if it was randomly selected for an error application in a second step.

10 The ordering of cue patterns in EDTS depends on the number of dependent measures taken into account. For the conditions with different numbers of dependent measures, the Euclidian distances were calculated in the respective P-dimensional space (e.g., a two-dimensional space if choices and decision times were included).

11 Note that the selection mechanism is not limited to three dependent measures; the dimensionality of the space can be expanded (reduced) by adding (subtracting) dependent measures. Each pair of strategies must make different predictions on at least one dependent measure to disentangle the strategies. Note also that it is possible to weight each dimension differently in order to scale the impact of each measure on the diagnosticity score (see Appendix E and the weights $w_{d_p}$ in Equation 6, Step 2, Appendix C for details).

12 We checked that the pattern of average identification rates displayed in Figure 2 is not driven by a single strategy (e.g., RAND) or only some of the strategies.

13 The interpretation of p-values in the model is not warranted: the number of participants, and therefore the test power, can be varied arbitrarily. Instead, effect sizes can be interpreted in order to identify the relative importance (i.e., incremental explained variance) of each variable when all other variables are controlled for. Additionally, the sign of the predictors can be interpreted in order to identify the relation between the proposed factors and the dependent variables (i.e., identification and posterior probability).

14 If the odds for correct identification with choices only were .761/.239 = 3.18 (see Table 4), the odds for correct identification with choices and decision time would be 3.18 × 7.39 = 23.50, which corresponds to .959/.041. Odds ratios below 1 indicate a reduction of the odds. The magnitude of such effects can be compared by calculating the inverse value for odds ratios below 1 (i.e., 1/odds ratio).

15 Unique variance is determined by the reduction in variance from the full linear model to the model reduced by the respective factor(s).

16 We thank Jonathan Baron for making us aware of this work.

17 Note in Table 5 that (e.g.) EQW and WADDcorr have the same set of contrast weights for decision time predictions (i.e., zeroes). This does not mean that the application of both strategies takes the same time (WADDcorr should take longer due to the additional weighting of cues with validities). The application of both strategies is independent of the type of tasks (i.e., all cues are always investigated); thus, contrast predictions do not differ between types of tasks and are set to 0 (i.e., decision times are supposed to stem from a single distribution for all tasks); see also Glöckner (2009, 2010) and Jekel, Nicklisch, and Glöckner (2010) for details.

18 Otherwise, predicted guessing (i.e., p(A) = .5) is erroneously recoded as predicted choice for option A (i.e., p(A) = 1) if a strategy predicts choices for option B (i.e., p(A) = 0) and guessing only.

19 Software to extract the folder is included in most operating systems or is available as open source software (e.g., 7-zip from http://www.7-zip.org/ for Windows systems).

20 Note that labels need to be put in double quotation marks (i.e., “”) and values are separated by comma in all .csv files.

21 We are happy to collect further strategies programmed by other users to extend the set of strategies implemented in the EDTS function; please send your files to the first author (). We plan to provide future extensions to the EDTS function as a download from a website that will be announced via the JDM-society mailing list.

22 We thank our reviewers for making us aware of these issues.

23 Note that hyphens within arguments (i.e., -) are included only for reasons of limited space in the table.

References

Ayal, S., & Hochman, G. (2009). Ignorance or integration: The cognitive processes underlying choice behavior. Journal of Behavioral Decision Making, 22, 455474.CrossRefGoogle Scholar
Betsch, T., & Glöckner, A. (2010). Intuition in judgment and decision making: Extensive thinking without effort. Psychological Inquiry, 21, 279294.CrossRefGoogle Scholar
Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113, 409432.CrossRefGoogle ScholarPubMed
Brehmer, B. (1994). The psychology of linear judgement models. Acta Psychologica, 87, 137154.CrossRefGoogle Scholar
Bröder, A. (2010). Outcome-based strategy classification. In Glöckner, A. & Witteman, C. L. M. (Eds.), Foundations for tracing intuition: Challenges and methods (pp. 6182). London: Psychology Press & Routledge.Google Scholar
Bröder, A., & Schiffer, S. (2003). Bayesian strategy assessment in multi-attribute decision making. Journal of Behavioral Decision Making, 16, 193213.CrossRefGoogle Scholar
Brunswik, E. (1955). Representative design and the probability theory in a functional psychology. Psychological Review, 62, 193217.CrossRefGoogle Scholar
Busemeyer, J. R., & Johnson, J. G. (2004). Computational models of decision making. In Koehler, D. J. & Harvey, N. (Eds.), Blackwell handbook of judgment and decision making (pp. 133154). Malden, MA: Blackwell Publishing.CrossRefGoogle Scholar
Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432459.CrossRefGoogle Scholar
Dhami, M. K., Hertwig, R., & Hoffrage, U. (2004). The role of representative design in an ecological approach to cognition. Psychological Bulletin, 130, 959988.CrossRefGoogle Scholar
Doherty, M. E., & Kurz, E. M. (1996). Social judgment theory. Thinking & Reasoning, 2, 109140.CrossRefGoogle Scholar
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: A memory processes model for judgments of likelihood. Psychological Review, 106, 180209.CrossRefGoogle Scholar
Erev, I., Roth, A. E., Slonim, R. L., & Barron, G. (2002). Combining a theoretical prediction with experimental evidence to yield a new prediction: An experimental design with a random sample of tasks. Unpublished manuscript. Columbia University and Faculty of Industrial Engineering and Management, Techion, Haifa, Israel.Google Scholar
Fiedler, K. (2010). How to study cognitive decision algorithms: The case of the priority heuristic. Judgment and Decision Making, 5, 21–32.
Gigerenzer, G. (2006). What’s in a sample? A manual for building cognitive theories. In Fiedler, K. & Juslin, P. (Eds.), Information sampling and adaptive cognition (pp. 239–260). Cambridge: Cambridge University Press.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506–528.
Glöckner, A. (2009). Investigating intuitive and deliberate processes statistically: The multiple-measure maximum likelihood strategy classification method. Judgment and Decision Making, 4, 186–199.
Glöckner, A. (2010). Multiple measure strategy classification: Outcomes, decision times and confidence ratings. In Glöckner, A. & Witteman, C. L. M. (Eds.), Foundations for tracing intuition: Challenges and methods (pp. 83–105). London: Psychology Press.
Glöckner, A., & Betsch, T. (2008a). Do people make decisions under risk based on ignorance? An empirical test of the Priority Heuristic against Cumulative Prospect Theory. Organizational Behavior and Human Decision Processes, 107, 75–95.
Glöckner, A., & Betsch, T. (2008b). Modeling option and strategy choices with connectionist networks: Towards an integrative model of automatic and deliberate decision making. Judgment and Decision Making, 3, 215–228.
Glöckner, A., & Betsch, T. (2008c). Multiple-reason decision making based on automatic processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1055–1075.
Glöckner, A., Betsch, T., & Schindler, N. (2010). Coherence shifts in probabilistic inference tasks. Journal of Behavioral Decision Making, 23, 439–462.
Glöckner, A., & Bröder, A. (2011). Processing of recognition information and additional cues: A model-based analysis of choice, confidence, and response time. Judgment and Decision Making, 6, 23–42.
Glöckner, A., & Herbold, A.-K. (2011). An eye-tracking study on information processing in risky decisions: Evidence for compensatory strategies based on automatic processes. Journal of Behavioral Decision Making, 24, 71–98.
Glöckner, A., & Hodges, S. D. (2011). Parallel constraint satisfaction in memory-based decisions. Experimental Psychology, 58, 180–195.
Glöckner, A., & Witteman, C. L. M. (2010). Beyond dual-process models: A categorization of processes underlying intuitive judgment and decision making. Thinking & Reasoning, 16, 1–25.
Hilbig, B. E. (2008). One-reason decision making in risky choice? A closer look at the priority heuristic. Judgment and Decision Making, 3, 457–462.
Hilbig, B. E. (2010). Reconsidering ‘evidence’ for fast and frugal heuristics. Psychonomic Bulletin & Review, 17, 923–930.
Hilbig, B. E., Erdfelder, E., & Pohl, R. F. (2010). One-reason decision-making unveiled: A measurement model of the recognition heuristic. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 123–134.
Hochman, G., Ayal, S., & Glöckner, A. (2010). Physiological arousal in processing recognition information: Ignoring or integrating cognitive cues? Judgment and Decision Making, 5, 285–299.
Hoffrage, U., & Hertwig, R. (2006). Which world should be represented in representative design? In Fiedler, K. & Juslin, P. (Eds.), Information sampling and adaptive cognition (pp. 381–408). Cambridge: Cambridge University Press.
Holyoak, K. J., & Simon, D. (1999). Bidirectional reasoning in decision making by constraint satisfaction. Journal of Experimental Psychology: General, 128, 3–31.
Horton, R. (1967a). African traditional thought and Western science: Part 1. From tradition to science. Africa, 37, 50–71.
Horton, R. (1967b). African traditional thought and Western science: Part 2. The ‘closed’ and ‘open’ predicaments. Africa, 37, 155–187.
Horton, R. (1993). Patterns of thought in Africa and the West: Essays on magic, religion and science. Cambridge: Cambridge University Press.
Jekel, M., Nicklisch, A., & Glöckner, A. (2010). Implementation of the Multiple-Measure Maximum Likelihood strategy classification method in R: Addendum to Glöckner (2009) and practical guide for application. Judgment and Decision Making, 5, 54–63.
Johnson, E. J., Schulte-Mecklenbeck, M., & Willemsen, M. C. (2008). Process models deserve process data: Comment on Brandstätter, Gigerenzer, and Hertwig (2006). Psychological Review, 115, 263–272.
Karelaia, N., & Hogarth, R. M. (2008). Determinants of linear judgment: A meta-analysis of lens model studies. Psychological Bulletin, 134, 404–426.
Krause, E. (1987). Taxicab geometry: An adventure in non-Euclidean geometry. Mineola, NY: Dover Publications.
Lee, M. D., & Cummins, T. D. R. (2004). Evidence accumulation in decision making: Unifying the “take the best” and the “rational” models. Psychonomic Bulletin & Review, 11, 343–352.
Montgomery, H., & Svenson, O. (1989). A think-aloud study of dominance structuring in decision processes. In Montgomery, H. & Svenson, O. (Eds.), Process and structure in human decision making (pp. 135–150). Oxford, England: John Wiley & Sons.
Paradis, E. (2005). R for Beginners. Available at: http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 534–552.
Pitt, M. A., & Myung, I. J. (2002). When a good fit can be bad. Trends in Cognitive Sciences, 6, 421–425.
Pitt, M. A., Myung, I. J., & Zhang, S. B. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
R Development Core Team. (2011). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Rieskamp, J., & Hoffrage, U. (1999). When do people use simple heuristics, and how can we tell? In Gigerenzer, G. & Todd, P. M. (Eds.), Simple heuristics that make us smart (pp. 141–167). New York, NY: Oxford University Press.
Rieskamp, J., & Otto, P. E. (2006). SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135, 207–236.
Roe, R., Busemeyer, J. R., & Townsend, J. (2001). Multiattribute decision field theory: A dynamic, connectionist model of decision making. Psychological Review, 108, 370–392.
Russo, J. E., Johnson, E. J., & Stephens, D. L. (1989). The validity of verbal protocols. Memory & Cognition, 17, 759–769.
Schulte-Mecklenbeck, M., Kuehberger, A., & Ranyard, R. (2011). A handbook of process tracing methods for decision research: A critical review and user’s guide. New York: Psychology Press.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Simon, D., Krawczyk, D. C., Bleicher, A., & Holyoak, K. J. (2008). The transience of constructed preferences. Journal of Behavioral Decision Making, 21, 1–14.
Thagard, P., & Millgram, E. (1995). Inference to the best plan: A coherence theory of decision. In Ram, A. & Leake, D. B. (Eds.), Goal-driven learning (pp. 439–454). Cambridge, MA: MIT Press.
Thomas, R. P., Dougherty, M. R., Sprenger, A. M., & Harbison, J. I. (2008). Diagnostic hypothesis generation and human judgment. Psychological Review, 115, 155–185.
Todd, P. M., & Gigerenzer, G. (2000). Précis of simple heuristics that make us smart. Behavioral and Brain Sciences, 23, 727–780.
Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804.
Figure 1: Predictions for 40 qualified cue patterns generated from five strategies (black = PCS, blue = TTB, red = EQW, green = WADDcorr, purple = RAND) in the rescaled prediction space with the three dependent measures (i.e., choices, decision times, confidence judgments) as coordinate axes. The size of the dots is (logarithmically) related to the number of predictions (i.e., density) at the respective coordinates. The five stars represent the predictions of the strategies for the (exemplary) cue pattern shown on the right side of Figure 1.

Table 1: Description of the strategies used in the simulation.

Table 2: Euclidian Diagnostic Task Selection (EDTS).
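
In essence, EDTS scores each task by the average pairwise Euclidian distance between the strategies’ predictions in the rescaled prediction space and selects the highest-scoring tasks. The following minimal R sketch illustrates this principle (all names are hypothetical; this is a conceptual illustration, not the EDTS function from the supplementary material):

# Sketch of the EDTS principle (hypothetical names): score each task
# by the mean pairwise Euclidian distance between strategy predictions.
# 'pred' is an array [task, strategy, measure] of rescaled predictions.
edtsScore <- function(pred) {
  # For each task, dist() computes all pairwise Euclidian distances
  # between the strategy rows of the strategies x measures matrix.
  apply(pred, 1, function(p) mean(as.vector(dist(p))))
}

# Example with random data: 5 tasks, 3 strategies, 2 dependent measures.
set.seed(1)
pred <- array(runif(5 * 3 * 2), dim = c(5, 3, 2))
order(edtsScore(pred), decreasing = TRUE)  # tasks from most to least diagnostic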

Figure 2: Identification rates for each type of task selection, averaged across strategies and environments, based on a) choices (left), b) choices and decision time (middle), and c) choices, decision time, and confidence (right). The term ε refers to the error rate for choices. The middle and right graphs are separated by the effect size for decision time (DT) and for decision time and confidence (DT & CF), respectively; d indicates the (maximum) effect size.

Table 3: Logistic regression predicting successful identification in strategy classification (Model 1) and linear regression predicting posterior probability of the data generating strategy (Model 2).

Table 4: Identification rates as a function of the number of dependent measures and of diagnostic task selection vs. representative sampling.

Table 5: Highly diagnostic cue patterns for three dependent measures in a compensatory environment (validities = [.80 .70 .60 .55]) and predictions for each strategy and dependent measure.

Supplementary material

Jekel et al. supplementary material (File, 9 KB)