1 Introduction
The identification of individuals’ decision strategies has always challenged behavioral decision research. There are at least three traditional approaches. Structural modeling applies a regression-based approach to identify the relation between the distal criterion variable, proximal cues, and people’s judgments (e.g., Brehmer, 1994; Brunswik, 1955; Doherty & Kurz, 1996; see Karelaia & Hogarth, 2008, for a meta-analysis). Process-tracing methods, for example, record information search (e.g., Payne, Bettman, & Johnson, 1988) or use think-aloud protocols (e.g., Montgomery & Svenson, 1989; Russo, Johnson, & Stephens, 1989) to trace the decision process (see Schulte-Mecklenbeck, Kuehberger, & Ranyard, 2011, for a review). Comparative model fitting approaches, finally, compare the fit of data to the predictions of different models to determine the model or decision strategy employed (e.g., Bröder, 2010; Bröder & Schiffer, 2003; see also Pitt & Myung, 2002).
Comparative model fitting in particular has gained popularity in recent Judgment and Decision Making (JDM) research. In this paper, we discuss the problem of diagnostic task selection when using this strategy classification method. We suggest the Euclidian Diagnostic Task Selection (EDTS) method as a standardized solution. We report results from a comprehensive model recovery simulation that investigates the effects of different task selection procedures, number of dependent measures and their interaction on the reliability of strategy classification in multiple-cue probabilistic inference tasks.
2 Task selection in strategy classification based on comparative model fitting
The principle of strategy classification based on comparative model fitting (referred to in the following as strategy classification) is to compare a vector of choice data Da consisting of n choices for person a to a set of predictions Pa of a set of strategies S. The strategy that “explains” the data vector best is selected. Strategies in set S have to be sufficiently specified to allow the definition of a complete vector of predictions Pa. Vector Pa can consist of sub-vectors for predictions on different dependent measures. Some strategies have free parameters to capture individual differences. Aspects that have to be considered to achieve a reliable strategy classification are: a) that all relevant strategies are included in the strategy set (e.g., Bröder & Schiffer, 2003), b) that overfitting due to model flexibility is avoided (e.g., Bröder & Schiffer, 2003), c) that appropriate model selection criteria are used (e.g., Hilbig, 2010; Hilbig, Erdfelder, & Pohl, 2010; Pitt & Myung, 2002; Pitt, Myung, & Zhang, 2002), and d) that diagnostic tasks are selected that allow differentiating between strategies (e.g., Glöckner & Betsch, 2008a). In the current paper, we investigate the influence of more or less diagnostic task selection in more detail.
We are particularly interested in the consequences of representative sampling as opposed to diagnostic task selection. Tasks are to a varying degree representative of the environment and/or more or less diagnostic with respect to strategy identification (Gigerenzer, 2006). Representative sampling means that experimental tasks are sampled based on the probability of their occurring in the environment to which results should be generalized (Brunswik, 1955).Footnote 1 Representative sampling is important with respect to external validity for at least two reasons. First, if one wants to generalize findings on the rationality or accuracy of people’s predictive decisions from an experiment to the real world, it is essential to draw a representative and hence generalizable sample.Footnote 2 One could, for instance, not claim that the calibration of a person’s confidence judgments is bad if this conclusion is based on a set of “trick questions” that are in fact more difficult than they seem and that rarely appear in the real world (Gigerenzer, Hoffrage, & Kleinbölting, 1991).Footnote 3 A second aspect concerns interactions between task selection and strategy use. If the selection of tasks disadvantages certain strategies (in contrast to their application in the real world), people are less likely to employ them, which leads to a general underestimation of their frequency of application.Footnote 4
In diagnostic sampling, by contrast, tasks are selected that differentiate best between strategies, that is, tasks for which the considered strategies make sufficiently different predictions. Diagnostic task selection has not been given sufficient attention in some previous work. For example, the priority heuristic as a non-compensatory model for risky choices (Brandstätter, Gigerenzer, & Hertwig, 2006) was introduced based on a comparative model test. In 89 percent of the choice tasks used in the study, the priority heuristic made the same prediction as one of the established models (i.e., cumulative prospect theory with parameters estimated by Erev, Roth, Slonim, & Barron, 2002). Subsequent analyses showed that the performance of the heuristic drops dramatically when more tasks are implemented for which the heuristic and prospect theory make different predictions (Glöckner & Betsch, 2008a). Further research showed that conclusions about the heuristic being a reasonable process model for the majority of people were premature (Ayal & Hochman, 2009; Fiedler, 2010; Glöckner & Herbold, 2011; Hilbig, 2008; Johnson, Schulte-Mecklenbeck, & Willemsen, 2008). To circumvent such problems in the future, diagnostic task selection should be given more attention. However, diagnostic task selection becomes a complex problem when multiple strategies and multiple dependent measures are considered simultaneously, as described in the next section. Afterwards we suggest and evaluate a standardized method that allows selecting a set of highly diagnostic tasks from all possible tasks based on a simple Euclidian distance calculation in a multi-dimensional prediction space.
3 Strategy classification based on multiple measures
Strategy classification methods have commonly been based on choices only. However, strategies are often capable of perfectly mimicking each other’s choices. Non-compensatory heuristics, for example, are submodels of the weighted additive strategy with specific restrictions on cue weights. This problem is even more apparent when, in addition, strategies are considered that do not assume deliberate stepwise calculations (Payne, et al., 1988). Recent findings on automatic processes in decision making (Glöckner & Betsch, 2008c; Glöckner & Herbold, 2011) suggest also taking into account cognitive models assuming partially automatic-intuitive processes (Glöckner & Witteman, 2010). Important classes of models are evidence accumulation models (Busemeyer & Johnson, 2004; Busemeyer & Townsend, 1993; Roe, Busemeyer, & Townsend, 2001), multi-trace memory models (Dougherty, Gettys, & Ogden, 1999; Thomas, Dougherty, Sprenger, & Harbison, 2008), and parallel constraint satisfaction (PCS) models (Betsch & Glöckner, 2010; Glöckner & Betsch, 2008b; Holyoak & Simon, 1999; Simon, Krawczyk, Bleicher, & Holyoak, 2008; Thagard & Millgram, 1995). As an example, we include a PCS strategy in our simulation.
Based on the idea that multiple measures can improve differentiation, the multiple-measure maximum-likelihood (MM-ML) strategy classification method (Glöckner, 2009, 2010; Jekel, Nicklisch, & Glöckner, 2010) was developed. MM-ML simultaneously takes into account predictions concerning choices, decision time, and confidence. MM-ML defines probability distributions for the data-generating process of multiple dependent measures (e.g., choices, decision times, and confidence) and determines the (maximum) likelihood for the data vector Da given the application of each strategy in the set S and multiple further assumptions (for details, see Appendix A).
It was shown that the MM-ML method leads to more reliable strategy classification than the choice-based method (Glöckner, 2009).Footnote 5 It has, for instance, been successfully applied to detect strategies in probabilistic inference tasks (Glöckner, 2010) and tasks involving recognition information (Glöckner & Bröder, 2011).
4 Simulation
We used a model recovery simulation approach to investigate the effects of task diagnosticity, number of dependent measures, and the interaction of the two on the reliability of strategy classification. We simulated data vectors for hypothetical strategy users with varying noise rates and tried to recover their strategies employing the MM-ML method. In accordance with Glöckner (2009), we simulated probabilistic inferences for six different cue patterns (i.e., specific constellations of cue predictions in the comparison of two options; see Figure 1, right), each repeated ten times, resulting in a total of 60 tasks per simulated person.Footnote 6 The selection of the cue patterns was manipulated to test our predictions with respect to representative sampling and diagnostic task selection based on a standardized method. In practice, selecting the most diagnostic cue patterns for a set of strategies is not trivial, and to the best of our knowledge no standard procedures are available. We suggest a method to determine the cue patterns that differentiate best between any given set of strategies and test whether the method increases reliability in strategy classification.
4.1 Design
We generated data based on five strategies in probabilistic inference tasks with two options and four binary cues. We varied the validity of the cues in the environment, the degree of noise in the data generating process, the number of dependent measures included in the model classification, and the diagnosticity of cue patterns that were used. As dependent variables, we calculated the proportion of correct classifications—the identification rate—and the posterior probability of the data-generating strategy.Footnote 7 Ties and misclassifications were counted as failed identification. This results in a 5 (data generating strategy) × 3 (environment) × 4 (error rates for choices) × 3 (noise level for decision times and confidence judgments) × 3 (number of dependent measures) × 4 (diagnosticity of tasks) design. For each condition, we simulated 252 participants, resulting in 544,320 data points in total.
4.1.1 Data-generating strategies
For simplicity, we rely on the same data-generating strategies used in previous simulations (Glöckner, 2009), namely the parallel constraint satisfaction (PCS), take-the-best (TTB), equal weight (EQW), weighted additive (WADDcorr), and random (RAND) strategies, which are described in Table 1.
Note. We used PCS with fixed parameters and a quadratic cue transformation function: decay = .10; wo1−o2 = −.20; wc−o = .01 / −.01 [positive vs. negative prediction]; wv = ((v − .50) × 2)2, stability criterion = 10−6; floor = −1; ceiling = 1 (see Glöckner, 2010, for details).
4.1.2 Environments
We used three environments: a typical non-compensatory environment with one cue clearly dominating the others (cue validities = [.90 .63 .60 .57]),Footnote 8 a compensatory environment with high cue dispersion (cue validities = [.80 .70 .60 .55]), and a compensatory environment with low cue dispersion (cue validities = [.80 .77 .74 .71]).
4.1.3 Error rates for choices and noise level for confidence and time
For each simulated participant, a data vector Da was generated, based on the prediction of the respective data-generating strategy plus noise. The vector consisted of a sub-vector for choices, decision times, and confidence. For the choice vector, (exact) error rates were manipulated from 10% to 25% at 5%-intervals. For example, an error rate of 10% leads to 6 out of 60 choices that are inconsistent with the predictions of the strategy. It was randomly determined which six choices were flipped to the alternative choice for each simulated participant.Footnote 9
Normally distributed noise was added to the predictions of the strategies for the decision time and confidence vectors (normalized to a mean of 0 and a range of 1). The three levels of noise on both vectors differed with respect to the standard deviation of the noise distribution σerror = [1.33 1 0.75], which is equivalent to a manipulation of the effect size of d = [0.75 1 1.33]. Note that adding normally distributed noise N(µ = 0, σerror) to a normalized prediction vector leads to a maximum (population) effect size of d = (µmax − µmin)/σpooled = 1/σpooled. Note also that the term µmax − µmin is the difference between the means of the most distant populations from which realizations of the dependent measures are sampled, and that it reduces to 1 due to normalizing the prediction vectors. The pooled standard deviation of those populations is equal to the standard deviation of the noise distribution (i.e., σpooled = σerror) because random noise is the only source of variance within each population. Thus, a standard deviation of (e.g.) σerror = σpooled = 1.33 leads to a maximum effect size of d = 1/1.33 ≈ 0.75 between the most distant populations of the dependent measures.
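As an illustration, the data-generating process described above can be sketched in a few lines. This is a minimal Python sketch (the article’s own materials are implemented in R); the function names and example vectors are ours and not part of the simulation code:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_choices(predictions, error_rate):
    """Flip an exact proportion of predicted choices (0/1 coded)
    to the alternative option, as in the exact error-rate manipulation."""
    data = predictions.copy()
    n_errors = round(error_rate * len(data))
    flip = rng.choice(len(data), size=n_errors, replace=False)
    data[flip] = 1 - data[flip]
    return data

def simulate_continuous(predictions, sigma_error):
    """Normalize a prediction vector to mean 0 and range 1, then add
    N(0, sigma_error) noise; the maximum population effect size is
    then d = 1 / sigma_error, as derived in the text."""
    p = np.asarray(predictions, dtype=float)
    p = (p - p.mean()) / (p.max() - p.min())  # mean 0, range 1
    return p + rng.normal(0.0, sigma_error, size=p.shape)

choices = simulate_choices(np.ones(60, dtype=int), error_rate=0.10)
print(int(60 - choices.sum()))   # 6 choices flipped
print(round(1 / 1.33, 2))        # maximum effect size d = 0.75
```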
4.1.4 Number of dependent measures
The strategy classification using MM-ML was based on varying numbers of dependent measures including (a) choices only, (b) choices and decision times, or (c) choices, decision times and confidence judgments.
4.1.5 Diagnosticity in Cue Patterns
We manipulated the diagnosticity of cue patterns used in strategy classification by using a) the Euclidian Diagnostic Task Selection (EDTS) method that determines the most diagnostic tasks given a set of strategies and the number of dependent measures considered, b) two variants of this method that generate medium and low diagnostic tasks, and c) representative (equal probability) sampling of tasks.
Probabilistic inference tasks with two options and four binary cues (i.e., [+ –]) allow for 240 distinct cue patterns. To prepare task selection, the set was reduced to a qualified set of 40 cue patterns by excluding all option-reversed versions (n = 120) and versions that were equivalent except for the sign of non-discriminating cues (i.e., [– –] vs. [+ +]). Then, strategy predictions for each of the three dependent measures were generated and rescaled to the range of 0 to 1 (for details, see Appendix B). The rescaled prediction weights for each strategy and each qualified task are plotted in the three-dimensional space that is spanned by the three dependent measures (Figure 1).
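The counts above (240 distinct patterns, 40 qualified patterns) can be checked with a short enumeration. The following Python sketch is our illustration only (the paper’s materials are in R); it encodes each cue pattern as four (option A, option B) value pairs:

```python
from itertools import product

# Each cue takes a value pair (option A, option B), with +1/-1 per option.
pairs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
patterns = list(product(pairs, repeat=4))  # 4^4 = 256 raw patterns

# Distinct patterns: at least one cue discriminates between the options.
distinct = [p for p in patterns if any(a != b for a, b in p)]
print(len(distinct))  # 240

def canonical(pattern):
    # Collapse the sign of non-discriminating cues ((+,+) ~ (-,-)) ...
    norm = tuple(pair if pair[0] != pair[1] else (1, 1) for pair in pattern)
    # ... and identify option-reversed versions (swap options on every cue).
    rev = tuple((b, a) for a, b in norm)
    return min(norm, rev)

qualified = {canonical(p) for p in distinct}
print(len(qualified))  # 40
```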
EDTS (Table 2) is based on the idea of cue patterns being diagnostic if predictions for strategies differ as much as possible. The pairwise diagnosticity is thereby measured as Euclidian distances between the predictions of two strategies for each cue pattern in the three-dimensional prediction space (Figure 1). The main criterion for cue pattern selection is the average diagnosticity of a cue pattern which is the mean of its Euclidian distances across all possible pairwise strategy comparisons in the space (i.e., PCS vs. TTB, PCS vs. EQW, …). For statistical details, see Appendix C, and for a discussion of EDTS-related questions, see Appendix E.
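The core of this computation can be sketched as follows. The Python code below (our illustration; the prediction values are hypothetical) computes the pairwise Euclidian distances in the three-dimensional prediction space and averages them per cue pattern; the per-comparison rescaling of distances (Appendix C, step 2) is omitted here for brevity:

```python
import numpy as np
from itertools import combinations

# Hypothetical rescaled predictions: rows = cue patterns, columns =
# (choice, decision time, confidence), each already scaled to [0, 1].
predictions = {
    "PCS": np.array([[1.0, 0.2, 0.9], [0.5, 0.8, 0.1]]),
    "TTB": np.array([[1.0, 0.1, 0.8], [1.0, 0.1, 0.8]]),
    "EQW": np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.4]]),
}

def average_diagnosticity(predictions):
    """Mean Euclidian distance over all pairwise strategy comparisons,
    computed separately for each cue pattern."""
    dists = [np.linalg.norm(predictions[k] - predictions[o], axis=1)
             for k, o in combinations(predictions, 2)]
    return np.mean(dists, axis=0)  # one diagnosticity score per cue pattern

ad = average_diagnosticity(predictions)
print(ad.argsort()[::-1])  # cue patterns ordered from most to least diagnostic
```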
For the high diagnosticity condition, we selected six cue patterns according to the EDTS procedure. For the medium and low diagnosticity conditions, we selected cue patterns from the middle and lower parts of the list of cue patterns sorted by diagnosticity that is generated in step 4 of EDTS. For the representative sampling condition, cue patterns were sampled uniformly at random.Footnote 10
4.1.6 EDTS function in R
We have implemented EDTS as an easy-to-use function in the free software package R (2011). You can specify your own environment (i.e., number of cues and validities of cues), generate the set of unique pairwise comparisons between cue patterns for your environment (as described in 4.1.5), derive predictions for all strategies on choices, decision times, and confidence judgments for those tasks (as described in 4.1.1), and apply EDTS to calculate the diagnosticity of each task (as described in 4.1.5); see Appendices D and F for a detailed description of the EDTS function.
By applying the EDTS function, you can find the most diagnostic tasks for a specified environment, set of strategies, and set of measures for future studies. You can also systematically vary the number and validities of cues to find the environment that produces tasks that optimally distinguish between a set of strategies. Finally, you can use the EDTS function to evaluate the diagnosticity of tasks in past studies, and thus the reliability of the strategy comparisons and of the conclusions drawn from them.
4.2 Hypotheses
Based on previous simulations (Glöckner, 2009), we predict that additional dependent measures for MM-ML lead to higher identification rates and posterior probabilities for the data-generating strategy. We further expect that less diagnostic cue patterns lead to lower identification rates and posterior probabilities. We also hypothesize an interaction effect between diagnosticity and the number of dependent measures, that is, less diagnostic cue patterns benefit more from adding further dependent measures. For practical purposes, we are particularly interested in the size of the effect of each manipulation to assess the extent to which common practices influence results.
5 Results
5.1 Identification Rate
The overall identification rates for each type of task selection, averaged across all environments and all strategies, based on choices only are displayed in Figure 2 (left). As expected, cue patterns with high diagnosticity selected according to EDTS lead to the highest identification, followed by representative sampling; cue patterns with medium and low diagnosticity were consistently worse. All types of task selection benefit from adding a second (see Figure 2, middle) and a third (see Figure 2, right) dependent measure. Representative sampling and the conditions with low and medium diagnosticity benefit most from adding a third dependent measure.Footnote 12
Hence, results are descriptively in line with our hypotheses. For a statistical test of the hypotheses, we conducted a logistic regression predicting identification (1 = identified, 0 = not identified) by number of dependent measures, diagnosticity of tasks, environment, generating strategy, epsilon rate for choices, effect size for decision times, and confidence judgments (Table 3, first model).Footnote 13
Note. Variables are dummy-coded and compared against the control condition. Variables for which interactions are calculated are centered. Nagelkerke’s R2 = .547 for identification rates; Adj. R2 = .474 for posterior probabilities (N = 544,320, p < .001). p < .001 for all predictors and model comparisons (full vs. reduced models).
Results of the logistic regression indicate changes in the ratio of the odds for a successful strategy identification. For example, the odds ratio for the first dummy variable indicating that two dependent measures were used (i.e., choices and decision times), as compared to choices only (i.e., control group), is 7.39. This implies that the odds for identification increase by the factor 7.39 from using choices alone to using choices and decision times.Footnote 14 Adding decision time and confidence increases the odds ratio for identification by a factor of 20.91 (compared to choices only).
The odds for identification decrease by a factor of 0.29 (i.e., a reduction to less than one third; see Footnote 14) when using representative sampling instead of highly diagnostic sampling according to EDTS. The reduction from high to medium and low diagnostic sampling is even more pronounced.
Finally, less diagnostic pattern selection mechanisms benefit more from adding further dependent variables, as indicated by the odds ratios for the interaction terms between number of dependent measures and task diagnosticity. In particular, when all three dependent measures are considered, identification dramatically increases for representative sampling as well as medium and low diagnostic tasks, so that the disadvantage of representative sampling decreases to 3% (Table 4).
Note. Averaged over strategies, є rates for choices, effect sizes for decision times and confidence judgments, and environments.
Hence, in line with our hypothesis, we replicate the finding that identification increases with number of dependent measures. High-diagnosticity task-sampling according to EDTS leads to superior identification rates. The disadvantage of representative sampling decreases when more dependent measures are included.
5.2 Posterior probabilities for the data generating strategy
To analyze the effects of our manipulations further, we regressed posterior probabilities on the same factors described above (Table 3, Model 2). As expected, given that identification and posterior probabilities are both calculated from Bayesian Information Criterion values (see Appendix A, Equation 2), the hypothesized effects of the manipulations are replicated. The independent variables of the linear model explain 47.4% of the variance in posterior probabilities. The number of dependent measures and task diagnosticity explain most of the unique varianceFootnote 15 in posterior probabilities (19.3% and 16.9%). In comparison to classification based on choices only, two and three dependent measures lead to an increase of .199 and .323 in posterior probabilities. In comparison to highly diagnostic cue patterns selected according to EDTS, posterior probabilities are reduced by .260 and .320 for cue patterns with medium and low diagnosticity, and by .123 for representative sampling. Thus, cue pattern selection according to EDTS leads to considerably higher posterior probabilities for the data-generating strategies than representative sampling.
6 Discussion and conclusion
Individual-level strategy classification in judgment and decision making is a statistical and a methodological challenge. Standard solutions to the complex problem of diagnostic task selection in multi-dimensional prediction spaces have been lacking. In the current paper, we suggest Euclidian diagnostic task selection (EDTS) as a simple method to select highly diagnostic tasks and show that EDTS increases identification dramatically. Furthermore, we replicate the increase in identification rates from employing multiple dependent measures in the multiple-measure maximum-likelihood (MM-ML) strategy classification method (Glöckner, 2009, 2010). We find that, under the conditions considered in our simulation, representative task sampling reduces the odds for successful strategy classification to less than a third of the odds under EDTS. This disadvantage, however, shrinks if multiple dependent measures are used. Hence, if representative sampling is advisable for other methodological reasons (see section 2), multiple measures should be used. Unfortunately, this is not possible for all models, because many models predict choices only (i.e., paramorphic models of decision making).
Our findings highlight that the issue of diagnosticity in task selection in comparative model fitting should be taken very seriously. To avoid ad-hoc criteria, we suggest using the EDTS method introduced in this article. Furthermore, it would be advisable to report the average diagnosticity score for each selected cue pattern so that results can be evaluated better.
Robin Horton (1967a, 1967b, 1993)Footnote 16, who investigated the differences between religious and scientific thinking within the framework of Popper’s critical rationalism, stated (1967b, p. 172) that “[f]or the essence of experiment is that the holder of a pet theory does not just wait for events to come along and show whether or not it has a good predictive performance.”—an approach that might be equated with representative sampling—“He bombards it with artificially produced events in such a way that its merits or defects will show up as immediately and as clearly as possible.” We hope that EDTS may help to find those events in a more systematic fashion in future research.
Appendices
Appendix A: The Multiple-Measure Maximum Likelihood strategy classification method (MM-ML)
Appendix A describes the basic math of the MM-ML method; see Glöckner (2009, 2010) and Jekel, Nicklisch, and Glöckner (2010) for a more thorough description of the method, tools, and tutorials on how to apply MM-ML.
To apply MM-ML in probabilistic decision making, it is necessary to select a set of strategies, a set of dependent measures, and a set of cue patterns. For each dependent measure, assumptions have to be made concerning the probability function of the data-generating process. In our simulation study, we use choices, decision times, and confidence judgments as dependent measures and assume choices for six cue patterns which are repeated ten times each. The number of choices in line with a strategy prediction is assumed to be binomially distributed with a constant error rate for each cue pattern; (log-transformed and order-corrected) decision times and confidence judgments are assumed to be drawn from normal distributions around rescaled prediction weights with a constant standard deviation per measure.
Given a contrast weight t_Ti for the decision time and t_Ci for the confidence judgment of task i, and observing a data vector D consisting of a subvector for choices, with n_jk being the number of choices for task type j congruent with strategy k, and of subvectors for decision times x_Ti and confidence judgments x_Ci for task i, it is possible to calculate the likelihood L_total for the observed data vector under the assumption of an application of strategy k (and the supplementary assumptions mentioned above) for a participant according to (Glöckner, 2009, Equation 8, p. 191):
The error rate for choices, є_k, the overall means and standard deviations for decision times (µ_T, σ_T) and confidence judgments (µ_C, σ_C), as well as the rescaling factors R_T and R_C (R_T, R_C ≥ 0) for decision times and confidence judgments, are estimated such that they maximize the likelihood function.
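The structure of L_total (a binomial component for choices and normal components for the continuous measures) can be illustrated as follows. This Python sketch follows the verbal description above; it is not Glöckner’s (2009) exact Equation 8, the confidence component (which enters analogously to decision time) is omitted, and all parameter names are ours:

```python
import math

def mm_ml_loglik(n_congruent, n_reps, epsilon,
                 times, time_contrasts, mu_t, sigma_t, r_t):
    """Sketch of the MM-ML log-likelihood for one strategy k."""
    ll = 0.0
    # Choices: per task type j, the number of strategy-congruent choices
    # n_jk follows a Binomial(n_reps, 1 - epsilon) distribution.
    for n_jk in n_congruent:
        ll += math.log(math.comb(n_reps, n_jk)) \
              + n_jk * math.log(1 - epsilon) \
              + (n_reps - n_jk) * math.log(epsilon)
    # Decision times: normally distributed around the rescaled
    # contrast weight mu_t + r_t * t_Ti with standard deviation sigma_t.
    for x, t in zip(times, time_contrasts):
        mu = mu_t + r_t * t
        ll += -0.5 * math.log(2 * math.pi * sigma_t ** 2) \
              - (x - mu) ** 2 / (2 * sigma_t ** 2)
    return ll

ll = mm_ml_loglik(n_congruent=[9, 10, 8, 9, 10, 10], n_reps=10, epsilon=0.1,
                  times=[1.2, 0.8], time_contrasts=[1.0, 0.0],
                  mu_t=0.9, sigma_t=0.3, r_t=0.3)
print(round(ll, 2))
```

In the full method, the free parameters (є_k, µ, σ, R) would be chosen to maximize this function, e.g., with a numerical optimizer.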
The Bayesian Information Criterion (BIC; Schwarz, 1978) is calculated to account for different numbers of parameters (numbers vary because some strategies do not predict differences on all dependent measures or assume a fixed error rate of .50) according to:

BIC_k = −2 ln(L_total) + N_p × ln(N_obs)
N_obs represents the number of task types (i.e., six in the simulations) and N_p the number of parameters that need to be estimated for the likelihood. Thus, a strategy with more free parameters is punished for its flexibility.
Finally, the posterior probability Pr for a specific strategy k, i.e., the probability that strategy k is the data-generating mechanism given the observed data D and under the assumption of equal prior probabilities for all (i.e., K) considered strategies, can be calculated from the BIC values according to (compare Wagenmakers, 2007, Equation 11, p. 797):

Pr(k | D) = exp(−BIC_k / 2) / Σ_{j=1..K} exp(−BIC_j / 2)
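Numerically, the BIC penalty and the BIC-based posterior can be sketched as below. This is a Python illustration of the formulas just described; the log-likelihood values in the example are made up:

```python
import math

def bic(log_lik, n_params, n_obs):
    """BIC = -2 ln L + N_p * ln(N_obs) (Schwarz, 1978)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def posteriors_from_bic(bics):
    """Posterior probability of each strategy under equal priors,
    following Wagenmakers (2007, Equation 11)."""
    m = min(bics)  # subtract the minimum BIC for numerical stability
    weights = [math.exp(-0.5 * (b - m)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-likelihoods for three strategies on the same data,
# with N_obs = 6 task types as in the simulations:
bics = [bic(-40.2, 7, 6), bic(-44.9, 5, 6), bic(-51.3, 5, 6)]
print([round(p, 3) for p in posteriors_from_bic(bics)])
```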
Appendix B: Strategy predictions
Predictions of strategies are derived by assuming that TTB, EQW, and WADDcorr are applied in a stepwise manner according to the classic elementary information processes approach (e.g., Payne, et al., 1988). For PCS, predictions are derived from a standard network simulation (Glöckner & Betsch, 2008b; Glöckner, Betsch, & Schindler, 2010; Glöckner & Bröder, 2011; Glöckner & Hodges, 2011) using the parameters mentioned in the note of Table 1. Table 5 shows, as an example, the predictions for the cue patterns selected for the high diagnosticity condition in the environment with cue validities of .80, .70, .60, and .55.
Choices. Choice predictions are determined according to the mechanisms described in Table 1.
Note. Positive cue values are indicated by +, negative cue values by −. A:B represents guessing between options.
Decision times. For TTB, EQW, and WADDcorr, the number of computational steps necessary to apply the strategy is used as time prediction. For PCS, the number of iterations of the network necessary to find a stable solution is used as an indicator for decision time.Footnote 17
Confidence judgments. For TTB, the validity of the discriminating cues is used as a predictor of confidence (Gigerenzer, et al., 1991). For EQW and WADDcorr, the difference in the (un)weighted sums of cue values for the options is used instead. For PCS, the difference in the activations of the options is used as a predictor for confidence judgments.
Appendix C: Euclidian Diagnostic Task Selection (EDTS)
Step 1: Generate standardized prediction vectors
Define a set of K strategies s to be tested, a set of P dependent measures d used for MM-ML, and a set of I qualified cue patterns c (i.e., excluding identical patterns). Calculate prediction vectors for each strategy (see Appendix B) and rescale them to a range of 0 to 1 per strategy. Note that dependent measures that are probabilities (e.g., choices) should not be rescaled.Footnote 18 The goal is to choose the n cue patterns highest in diagnosticity from the set of I cue patterns. Assume the following notation for the raw (indicated by superscript R) contrast weights cw:
Each contrast vector is rescaled from the raw values to fit the range from 0 to 1. Contrast weights are rescaled by:

cw = (cw^R − min(cw^R)) / (max(cw^R) − min(cw^R))
Step 2: Calculate diagnosticity scores for strategy comparisons
Compute the diagnosticity scores for each task as the Euclidian distances ED for each strategy comparison and each cue pattern within the space spanned by the vectors of the P dependent measures, which are weighted by w_dp. Following this, standardize these distances to a range from 0 to 1. The ED between strategies k and o (k ≠ o) for cue pattern i is calculated by:

ED^R(s_k, s_o, c_i) = sqrt( Σ_{p=1..P} w_dp × (cw(s_k, d_p, c_i) − cw(s_o, d_p, c_i))² )
For each comparison of strategies k and o, rescale the raw distances ED^R(s_k, s_o) across all I cue patterns to fit the range from 0 to 1 by:

ED(s_k, s_o, c_i) = (ED^R(s_k, s_o, c_i) − min_i ED^R(s_k, s_o, c_i)) / (max_i ED^R(s_k, s_o, c_i) − min_i ED^R(s_k, s_o, c_i))
[Rationale for rescaling: Euclidian distances for each strategy comparison should have the same range to avoid overweighting (resp. underweighting) of strategy comparisons with a high variance (resp. low variance) in Euclidian distances.]
Step 3: Calculate the average diagnosticity scores
Calculate the mean of each row of the matrix containing the rescaled Euclidian distances to obtain the average diagnosticity score AD for each cue pattern c_i by:

AD(c_i) = Σ_{k<o} ED(s_k, s_o, c_i) / M, with M = K(K−1)/2 pairwise strategy comparisons
Step 4: Sort cue patterns by average diagnosticity scores and select cue patterns
Sort the set of I cue patterns by their AD scores and select the n cue patterns with the highest AD scores.
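Steps 1-4 can be condensed into a few lines. The following Python sketch is our illustration (the article’s implementation is in R); it takes already rescaled prediction weights as input and returns the indices of the n most diagnostic cue patterns. Step 5 (threshold refinement) is not included:

```python
import numpy as np
from itertools import combinations

def rescale01(x):
    """Min-max rescale a vector to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def edts_select(predictions, n):
    """predictions: dict mapping strategy -> (I cue patterns x P measures)
    array of rescaled prediction weights (step 1).
    Returns the indices of the n cue patterns highest in diagnosticity."""
    rescaled = []
    for k, o in combinations(predictions, 2):
        ed = np.linalg.norm(predictions[k] - predictions[o], axis=1)
        rescaled.append(rescale01(ed))     # step 2: per-comparison rescaling
    ad = np.mean(rescaled, axis=0)         # step 3: average diagnosticity
    return np.argsort(-ad)[:n]             # step 4: sort and select top n

# Hypothetical predictions for two strategies on two cue patterns:
preds = {"PCS": np.array([[1.0, 0.9], [0.4, 0.5]]),
         "TTB": np.array([[0.0, 0.1], [0.5, 0.5]])}
print(edts_select(preds, n=1))  # cue pattern 0 differentiates best here
```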
Step 5: Refine selection
Check whether the maximum of the diagnosticity scores for each strategy comparison is above a threshold $t_{min}$. To find an appropriate set of cue patterns, the threshold should increase with the number of dependent measures used and decrease with the number of pairwise comparisons. In the simulations, we used a threshold value of $t_{min} = .75$. If a maximum is below the desired threshold, replace the last selected cue pattern(s) with the next cue pattern(s) in the sorted list until the threshold is reached for all comparisons. If no such cue pattern is found, repeat the procedure with a lower threshold.
[Rationale: A high mean of rescaled Euclidian distances for a cue pattern can be produced by a single high distance for one of the strategy comparisons. Apply step 5 to ensure that there is at least one diagnostic cue pattern for each strategy comparison in the subset (as defined by the threshold).]
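The Step-5 refinement loop might be sketched as follows (a simplified Python illustration of the described procedure; the exact replacement rule in the R implementation may differ):

```python
import numpy as np

def refine_selection(ed, n, t_min=0.75):
    """Step-5 sketch: start with the n cue patterns highest in AD; while
    some strategy comparison has no selected pattern with a rescaled
    distance >= t_min, replace the last-ranked selected pattern with the
    next candidate from the sorted list.  ed: (I patterns, comparisons)."""
    ed = np.asarray(ed, dtype=float)
    ranked = [int(i) for i in np.argsort(-ed.mean(axis=1))]
    selected, rest = ranked[:n], ranked[n:]
    while rest and not (ed[selected].max(axis=0) >= t_min).all():
        selected = selected[:-1] + [rest.pop(0)]
    return selected

# The pattern ranked second by AD (index 1) is not diagnostic for the
# second comparison, so it is swapped for index 2, which covers it.
ed = [[0.9, 0.3], [0.8, 0.35], [0.2, 0.9]]
print(refine_selection(ed, n=2))  # → [0, 2]
```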
Appendix D: Implementation of EDTS as a function in R
EDTS is implemented as an easy-to-use function in R. R (2011) is software for statistical analysis distributed under the GNU General Public License, i.e., it is free of charge. R is available for Windows, Mac, and UNIX systems. To download R, visit the Comprehensive R Archive Network (http://cran.r-project.org/). To learn more about R, we recommend the free introduction by Paradis (2005); however, no sophisticated prior experience with R syntax is required to apply EDTS.
You can download the EDTS.zip folderFootnote 19 from http://journal.sjdm.org/vol6.8.html, listed with this article. In the folder EDTS, there are two files—mainFunction.r and taskGenerator.r—and an additional folder strategies containing six further R files. In the current version of the EDTS function, it is possible to generate all possible unique pattern comparisons for two-alternative decision tasks with binary cue values (i.e., 1 or –1), to derive predictions for all tasks and a set of default strategies, and to calculate the diagnosticity index for each task as proposed in the article.
To use the EDTS function, you need to copy and paste (or submit) the code provided in the file mainFunction.r, i.e., you can open mainFunction.r in a standard text editor, copy the entire code, and paste the code in the open R console. To call the function afterwards, type the command:
EDTS(setWorkingDirectory, validities,
measures, rescaleMeasures,
weightingMeasures, strategies, generateTasks,
derivePredictions, reduceSetOfTasks,
printStatus, saveFiles, setOfTasks,
distanceMetric, PCSdecay, PCSfloor,
PCSceiling, PCSstability, PCSsubtrahendResc,
PCSfactorResc, PCSexponentResc)
in the open R console and hit Enter. If an argument of the function is left blank, the default is applied. Arguments, descriptions, valid values, examples and defaults are listed in Appendix F. In the following, we give an example for illustrative purposes.
Example
Assume you want to test which of four strategies (e.g., PCS, TTB, EQW, or RAND) best describes human decision making in a six-cue environment with the cue validities v = [.90 .85 .78 .75 .70 .60] (compare Rieskamp & Otto, 2006). Your goal is to select the most diagnostic tasks from all possible tasks for an optimal comparison of strategies. Assume further that you will assess choices and decision times as dependent variables in your study; thus, you only need to rescale decision times (see Appendix C). For all the remaining arguments, you want to keep the defaults of the function.
To apply EDTS, you put the unzipped EDTS folder under C:\, open the file mainFunction.r with a text editor, copy the entire text, and paste it in the open R console. Then you type:
EDTS(validities = c(.90, .85, .78, .75, .70, .60),
    measures = c("choice", "time"),
    rescaleMeasures = c(0, 1),
    strategies = c("PCS", "TTB", "EQW", "RAND"))
and hit Enter. Three .csv files are created: (1) tasks.csv includes all qualified patterns for pairwise comparisons with six cues (i.e., 364 tasks), (2) predictions.csv includes choice and decision time predictions for all strategies (i.e., PCS, TTB, EQW, and RAND) and all tasks listed in tasks.csv, (3) outputEDTS.csv includes the average diagnosticity score (AD) and the minimum, maximum, and median diagnosticity of all strategy comparisons. Additionally, "raw" diagnosticity scores for each task and each strategy comparison are provided. Based on the AD scores, you finally select the most diagnostic tasks for the strategy comparisons in a six-cue environment (see Table 2, step 5).
Generalizations
We added two further strategies as default strategies: (1) WADDuncorr (Rieskamp & Hoffrage, 1999) has been extensively used in past studies and thus can serve as an interesting competitor. WADDuncorr is identical to WADDcorr but does not correct validities for chance level (e.g., .5 for pairwise comparisons). (2) RAT (Lee & Cummins, 2004) is the rational choice model based on Bayesian inference. It has been included as a further strategy in order to allow comparisons between heuristic models and the rational solution in probabilistic decision making.
Additionally, it is possible to extend the set of default strategies with your own strategies. To do so, open the file predictions.csv and include a prediction column for each measure and each task for your own strategies (as defined in tasks.csv). The labels of the new columns must follow the form NameOfYourStrategy.Measure. Additionally, the order and number of columns (i.e., the order of predictions for each measure) must follow the order of the measures of the other strategies included (i.e., choice, time, and confidence for the default measures).Footnote 20 To apply EDTS to your own set of strategies, include the names of your strategies in the argument strategies of the EDTS function and set the argument derivePredictions = 0 (i.e., predictions are not derived; instead, the data matrix defined in predictions.csv with your set of strategies is loaded into the program).
It is also possible to add further dependent measures. As when adding strategies, you insert a further column for each strategy, following the form Strategy.Measure for the labels in the first row of the data matrix. For example, if you want to compare PCS and TTB on choices, decision times, and (e.g.) arousal (Hochman, Ayal, & Glöckner, 2010), the file predictions.csv consists of 7 columns. In the first column, the number of the task is coded. From the second to the fourth column, PCS predictions are inserted with the labels PCS.choice, PCS.time, and PCS.arousal in the first row of the data matrix. From the fifth to the seventh column, TTB predictions are inserted with the labels TTB.choice, TTB.time, and TTB.arousal. Thus, predictions for each measure are inserted by strategy, and for each strategy the measures are in the same order.
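To illustrate the required layout, the following Python snippet writes a toy predictions.csv with the 7 columns described above (all prediction values are invented; the file is normally edited by hand or produced by the R function):

```python
import csv

# Column labels follow the required NameOfYourStrategy.Measure scheme,
# grouped by strategy; the first column codes the task number.
header = ["task",
          "PCS.choice", "PCS.time", "PCS.arousal",
          "TTB.choice", "TTB.time", "TTB.arousal"]
rows = [[1, 0.9, 0.2, 0.1, 1.0, 0.3, 0.0],
        [2, 0.7, 0.5, 0.4, 1.0, 0.3, 0.0],
        [3, 0.6, 0.8, 0.7, 0.0, 0.6, 0.5]]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)   # 7 columns, as described in the text
    writer.writerows(rows)
```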
In general, the EDTS function is thus applicable to any strategy for which quantitative predictions on each measure can be derived for a set of tasks. The function can also be applied to tasks differing from the default characteristics (e.g., probabilistic decision-making between three options and/or continuous cues) or from the default type (e.g., preference decisions between gambles) by inserting the predictions for each strategy and measure in the file predictions.csv as described above. Thus, the method is not limited to the strategies and tasks used and implemented as defaults in the EDTS function. The experienced R user can thus implement her strategies as R code. To simplify coding, the main EDTS function and strategies are coded in separate files (see folder strategies), and strategies are also coded as functions that are similar in structure (same input variables, etc.).Footnote 21
Appendix E: Open questions and future research
This short appendix highlights some open questions. For researchers interested in applying EDTS, it points to critical aspects of the method; for researchers interested in optimizing EDTS, the open questions suggest directions for future studies, and the EDTS function provided (see Appendix D and F) may facilitate this process.Footnote 22
There are alternative selection criteria (e.g., maximum or median) that may be used for task selection instead of the mean proposed and validated in the current study. For example, strategy comparisons may be more effective if the most discriminating task for each comparison (i.e., the maximum) is selected. However, two opposing forces are at work: the number of pairwise comparisons, and thus of selected tasks, increases rapidly with the number of strategies (i.e., 5 strategies = 10 tasks, 6 strategies = 15 tasks, 7 strategies = 21 tasks, etc.). If the number of tasks that can be presented in a study is limited, this leaves fewer repetitions of the selected tasks, which in turn can make strategy classification less reliable, depending on the error rate. It is therefore an open question whether the gain in diagnosticity for single comparisons outweighs the loss of reliability due to fewer repetitions. To facilitate such comparisons, the output of the EDTS function includes several diagnosticity statistics (mean, median, maximum, and minimum) as well as the "raw" diagnosticity scores for each strategy comparison and each task.
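The alternative maximum criterion discussed above could be sketched as follows (illustrative Python, not the article's R code; `ed` holds rescaled distances with one column per pairwise comparison):

```python
import numpy as np

def select_per_comparison_maximum(ed):
    """For each strategy comparison (column), pick the single most
    discriminating cue pattern (row); duplicate picks are collapsed."""
    return sorted({int(i) for i in np.argmax(np.asarray(ed), axis=0)})

# 3 cue patterns, 3 pairwise comparisons (e.g., 3 strategies)
ed = [[0.9, 0.1, 0.3],
      [0.2, 0.8, 0.4],
      [0.1, 0.3, 0.7]]
print(select_per_comparison_maximum(ed))  # → [0, 1, 2]
```

With K strategies this yields up to K(K-1)/2 distinct tasks, which is why the required task count grows so quickly.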
There is no need to restrict EDTS to Euclidian distances as the metric for diagnosticity scores. It is an open question whether other metrics lead to equally or even more reliable strategy classification. We have also implemented the option to calculate diagnosticity scores based on the Taxicab/Cityblock metric (Krause, 1987) in the EDTS function.
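The two metrics differ only in how the weighted prediction differences across measures are aggregated; a minimal Python sketch (function and variable names are illustrative):

```python
import numpy as np

def distance(diff, metric="euclidean"):
    """Diagnosticity distance for one cue pattern, given the weighted
    prediction differences across the P dependent measures."""
    diff = np.asarray(diff, dtype=float)
    if metric == "euclidean":
        return float(np.sqrt((diff ** 2).sum()))
    if metric == "cityblock":        # Taxicab / Manhattan metric
        return float(np.abs(diff).sum())
    raise ValueError(metric)

d = [0.75, -1.0]
print(distance(d))               # → 1.25
print(distance(d, "cityblock"))  # → 1.75
```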
Finally, there may be reasons to weight the impact of each dependent measure on the diagnosticity score differently. For example, it may be reasonable to reduce the impact of dependent measures that are less reliable and thus favor more reliable measures in diagnostic task selection. It is an open question if different weighting schemes (e.g., weighting of each measure relative to a reliability index) lead to higher identification rates. We have implemented the option to weight measures differently in the EDTS function.
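One conceivable reliability-based weighting scheme, sketched in Python (the normalization choice here is our assumption, not part of EDTS):

```python
import numpy as np

def reliability_weights(reliabilities):
    """Weight each dependent measure proportionally to a reliability
    index, normalized to sum to P so that equal reliabilities
    reproduce equal (unit) weights."""
    r = np.asarray(reliabilities, dtype=float)
    return r / r.sum() * len(r)

# e.g., choices more reliable than times more reliable than confidence
print(reliability_weights([0.9, 0.6, 0.5]).round(2).tolist())  # → [1.35, 0.9, 0.75]
```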
Appendix F: EDTS() function in R
Arguments, descriptions, valid values, examples, and defaults.