1 Introduction
The study of human decision-making has long been a cornerstone of economics, but accurately measuring preferences has proven to be a complex challenge. One method gaining popularity in recent years is the strategy method (SM), which involves asking participants to indicate their choices at all information sets, enabling researchers to compare decisions at different points in a given scenario. However, while SM offers several advantages over traditional direct elicitation (DE), it also has its limitations. A fundamental distinction between the two methods is that the information sets for decision nodes differ, which can lead to different inferences. Nonetheless, SM remains a powerful tool for economists seeking to deepen their understanding of human behavior and the forces that shape it.
SM consists of asking participants to indicate their choices at all information sets rather than only those actually reached. One then compares the differences in decisions at different information sets. For example, to identify the effect of a low offer in an ultimatum game, one might compare the changes in decisions for the low-offer information set with the decisions for the high-offer information set. The appeal of SM comes from its simplicity as well as its potential to elucidate the equilibria that are actually played when theoretical models indicate there are multiple equilibria. SM also has the potential to circumvent many of the endogeneity problems that arise in estimating preferences when making comparisons between heterogeneous individuals.
Extant empirical research tends to rely on the behavioral validity of SM (Fischbacher et al., 2012). Brandts and Charness (2011, p. 376) write that, “according to the standard game-theoretic view, the strategy method should yield the same decisions as the procedure involving only observed actions” and provide empirical evidence against the claims in the literature. Chen and Schonger (2023) summarize the theoretical views and present a theorem (Moulin, 1986, pp. 84–86) arguing that SM is subject to a possibly severe economic-theoretical bias.
As evidence for the relevance of the theorem, we briefly revisit prior meta-analyses and conduct our own meta-analysis of ultimatum game experiments in the Online Appendix. We choose the ultimatum game because it is simple and one of the most employed games in experiments. But since the previous literature has highlighted that complexity is an important factor (Brandts & Charness, 2011), we also consider the three-player prisoners’ dilemma.
In the meta-analysis, acceptance rates are 20 percentage points higher in the DE setting than in the SM setting. In the remaining analyses, we conduct our own experiments. First, we randomize whether the respondent, but not the proposer, is in SM or DE to ensure the proposal is the same in both treatments. The DE setting increases acceptances and is equivalent to an offer increase of 34% of endowment. Subsequent experiments allow the proposer to also know if the responder is in the DE or SM setting. Next, we manipulate the salience of off-equilibrium motivations. DE increases acceptance rates in the ultimatum game by 18 percentage points. When off-equilibrium motivations are made salient, the difference increases to 27 percentage points. In total, we report the results of five analyses that all demonstrate the relevance of the theorem. As already mentioned, we do so in the context of simple games, like the ultimatum game and trust game, as well as more complex games, like the three-player prisoners’ dilemma. In the trust game, DE respondents return three times the amount SM respondents return. In the three-player prisoners’ dilemma, DE affects deductions of defectors.
The last two of our five analyses highlight how treatment effects can significantly differ between SM and DE, while also flipping in sign. When we interpret salience as the treatment effect of interest, we see evidence that salience has a weakly positive treatment effect under DE but is negative under SM. The difference in treatment effects is statistically significant at the 5% or 10% level.
The remainder of the paper is outlined as follows: Section 2 presents an experiment where the ultimatum game respondent is randomized to DE or SM. The appendix presents the experiment where DE versus SM differences extend to the trust game. Section 3 presents an experiment where the ultimatum game is randomized to DE or SM and where off-equilibrium considerations are randomly made salient. Section 4 presents a complex game, the three-player prisoners’ dilemma. Section 5 concludes.
2 Ultimatum game: DE versus SM for respondent
2.1 Design
This study used MTurk. We first asked MTurk subjects to transcribe three paragraphs of text to reduce the likelihood of their dropping from the study after seeing treatment, a technique to minimize differential attrition that may affect causal inference when using MTurk subjects (Chen & Reinhart, 2022; Chen & Horton, 2016; Chen & Yeh, 2010; Chen et al., 2017). After the lock-in task, subjects have an opportunity to split with the recipient a 50 cent bonus (separate from the payment they received for data entry), up to 23 times the expected wage. We had 156 subjects split evenly between the role of proposer and respondent and between SM and DE (2 × 2 design). Instructions are in Online Appendix B.
In the ultimatum game (Fig. B.1), the proposer offers a split of $0.50 between herself and the responder, in increments of $0.05. In the DE treatment, the responder was informed about the amount offered and asked whether she accepts or rejects the offer (Fig. B.2). If accepted, both players received the payoff according to the split proposed by the proposer. If rejected, both players received zero payoff. In the SM treatment, the responder indicated whether she would accept or reject each possible offer without knowing the actual offer. If the responder rejected the offer actually made by the proposer, neither player received any bonus. The responder’s behavior can be characterized by a rejection threshold, the minimum offer the responder is willing to accept (Fig. B.3). The proposer did not know the responder’s method of elicitation, which holds the proposer’s decisions constant across treatments. We are interested in the average treatment effect of DE versus SM on the responder.
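The payoff rules just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experiment's actual software: the function names are hypothetical, amounts are in cents, and the rejection threshold is read off as the smallest accepted offer.

```python
from typing import Dict, Optional, Tuple

ENDOWMENT_CENTS = 50   # $0.50 pie, as in the design above
STEP_CENTS = 5         # offers in $0.05 increments
OFFERS = list(range(0, ENDOWMENT_CENTS + 1, STEP_CENTS))


def payoff_de(offer: int, accepts: bool) -> Tuple[int, int]:
    """Direct elicitation: the responder sees the offer, then decides.

    Returns (proposer payoff, responder payoff) in cents.
    """
    if accepts:
        return ENDOWMENT_CENTS - offer, offer
    return 0, 0


def payoff_sm(offer: int, strategy: Dict[int, bool]) -> Tuple[int, int]:
    """Strategy method: apply the pre-stated response to the realized offer."""
    return payoff_de(offer, strategy[offer])


def rejection_threshold(strategy: Dict[int, bool]) -> Optional[int]:
    """Smallest offer the responder accepts, or None if she rejects all offers."""
    accepted = [o for o in OFFERS if strategy[o]]
    return min(accepted) if accepted else None


# Hypothetical responder who accepts any offer of at least 20 cents.
example_strategy = {o: o >= 20 for o in OFFERS}
low = payoff_sm(15, example_strategy)    # rejected: both players get nothing
fair = payoff_sm(25, example_strategy)   # accepted: even split
threshold = rejection_threshold(example_strategy)
```

The only difference between the two elicitation functions is when the responder's decision is recorded; the payoff rule itself is identical, which is why the standard game-theoretic view predicts the same behavior under both.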
2.2 Results
Table 1 regresses an indicator for whether or not the ultimatum game offer was accepted on the treatment indicator, SM, using a linear probability model. Results are robust to using a probit specification. While there were 20 percentage points fewer acceptances in SM (p < 0.1) (Column 1), the effect becomes 22 percentage points and more significant (p < 0.05) when controlling for the amount offered (Column 2). For each additional $0.01 offered, the acceptance rate increases by 2 percentage points (p < 0.001). In terms of magnitude, DE is equivalent to an additional 17 cents offer in a 0–50 ultimatum game, or roughly 34% of endowment. Including an interaction between offer and SM shows that, under SM, the association between offer and acceptance is greater by 1.7 percentage points per $0.01 offered, a marginally significant difference (p < 0.1) (Column 3), which is analogous to what was found in the survey of prior literature in Online Appendix A.
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 0.917*** | 0.543*** | 0.784*** |
| | (0.0467) | (0.126) | (0.214) |
| Strategy method | −0.202* | −0.223** | −0.629* |
| | (0.0846) | (0.0817) | (0.268) |
| Offer level | | 0.0155*** | 0.00552 |
| | | (0.00453) | (0.00814) |
| Strategy × offer level | | | 0.0165+ |
| | | | (0.00960) |
| Mean of Y | 0.808 | 0.808 | 0.808 |
| N | 78 | 78 | 78 |
This table examines the determinants of whether the ultimatum game offer is accepted by the second player. Column (1) shows the raw correlation between acceptance and the treatment indicator (SM decision-making). Column (2) also controls for amount offered by the first player. Column (3) examines whether treatment affects the relationship between acceptance and amount offered
Robust standard errors in parentheses
+ p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001
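Because Column (1) is a linear probability model with a single binary regressor, its OLS coefficients reduce to group means: the intercept is the DE acceptance rate and the SM coefficient is the SM rate minus the DE rate. The sketch below illustrates this on made-up toy data (ten hypothetical responders per arm, not the study's data) and omits the robust standard errors reported in the table.

```python
from statistics import mean


def ols_simple(x, y):
    """Closed-form OLS of y on a constant and a single regressor x.

    Returns (intercept, slope).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data, purely illustrative: 1 = offer accepted, 0 = offer rejected.
de_accept = [1] * 9 + [0] * 1   # hypothetical DE arm: 9 of 10 accept
sm_accept = [1] * 7 + [0] * 3   # hypothetical SM arm: 7 of 10 accept

sm_indicator = [0] * len(de_accept) + [1] * len(sm_accept)
accepted = de_accept + sm_accept

intercept, sm_coef = ols_simple(sm_indicator, accepted)

# The same numbers obtained directly from group means.
de_rate = mean(de_accept)
sm_minus_de = mean(sm_accept) - mean(de_accept)
```

Columns (2) and (3) add the offer level and its interaction with SM, so they no longer collapse to a comparison of two means and need a multiple-regression routine.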
3 Ultimatum game: DE versus SM and low versus high salience
3.1 Design
We chose to run our remaining studies in the lab, which may be a more controlled setting than MTurk. In this study, we ran the lab experiment at the MaXLab following their standard procedures in Magdeburg and using oTree (Chen et al., 2016). We collected data on 418 subjects across 16 experimental sessions (instructions are in Online Appendix B). In this study, the proposer knows the method of elicitation for the responder, so we examine and control for the offer. The endowment was €1.00. Roughly 70 participants were in each of six treatments (3 × 2 design), listed as follows with abbreviations in parentheses: Direct elicitation (DE)/strategy method (SM)/threshold method (SM-Th) × neutral (neu)/emotional (emo).
We introduce two variants of SM. In one variant, subjects report the threshold (where the responder had to state the minimum level of the offer that she would accept), and in another, they report their strategy (where the responder had to decide whether she would accept every theoretical offer that could be made by the proposer before the actual offer was revealed). We also introduced a cross-cutting treatment to increase the salience of off-equilibrium payoffs (for a total of six possible groups, two emotional settings × three game variants). In the high salience treatment, the experiment changed two words: proposer → dictator and respondent → subject. The intervention involves only these two words to heighten emotional salience with terms like dictator and subject. If SM versus DE invariance is affected by a few words, the basis for using SM instead of DE would seem fragile.
3.2 Results
We cannot reject the null that the proposer’s offer is the same across treatments (see Fig. 1). Offers are slightly lower in DE than in SM, which is consistent with proposers being aware that responders are more likely to accept in DE. In the meta-analysis of 66 studies by Oosterbeek et al. (2004), offered shares were significantly lower with DE by 2% (p < 0.1).
Figure 2 reports the natural pattern in ultimatum games: Acceptances are positively associated with the offered amount regardless of treatment. In Column 1, DE shows one observation per subject-pair. In Columns 2 and 3, SM and SM-Th show all possible observations per subject-pair. For the threshold method, we generate an acceptance or rejection for every possible offer. The display is intentionally saturated to illustrate the standard data analysis with SM.
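The expansion of a stated threshold into one acceptance decision per possible offer can be sketched as follows. The offer grid (ten-cent steps over the €1.00 endowment) and the function name are assumptions made for illustration; the grid actually used in the lab is not stated in this section.

```python
def expand_threshold(threshold_cents, offers_cents):
    """Turn a stated minimum acceptable offer into (offer, accept) rows.

    An offer counts as accepted (1) when it is at least the threshold,
    and as rejected (0) otherwise.
    """
    return [(offer, int(offer >= threshold_cents)) for offer in offers_cents]


# Hypothetical grid: 0 to 100 cents in steps of 10 cents.
offer_grid = list(range(0, 101, 10))

# A hypothetical responder who states a threshold of 30 cents.
rows = expand_threshold(30, offer_grid)
```

Each SM-Th responder then contributes as many rows as there are possible offers, which is what makes the pooled sample so much larger than the number of subject-pairs.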
Figure 3 shows that DE results in more acceptances, similar to the survey of prior literature and to our other experiment. In particular, the increase in acceptance is visible in both the low salience (neu) and high salience settings (emo). Increases in acceptance rates under DE are somewhat larger in the high salience setting, which suggests that salience of off-equilibrium considerations may drive some of the differences between DE and SM. Notably, equilibrium behavior does not diverge between SM and SM-Th methods.
We next examine these relationships in regression analysis. We create indicator variables for every treatment and their interaction (Table 2, Column 1). We include a control for offer level in Column 2 and interactions of offer level and treatment indicators in Column 3.
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 0.824*** | 0.248*** | −0.488+ |
| | (0.0463) | (0.0406) | (0.284) |
| Strategy method | −0.0963* | −0.162*** | 0.625* |
| | (0.0490) | (0.0410) | (0.286) |
| Threshold method | −0.0434+ | −0.0434** | −0.0943* |
| | (0.0235) | (0.0165) | (0.0399) |
| Emotions | 0.0684 | 0.0659 | 0.355 |
| | (0.0587) | (0.0498) | (0.325) |
| Strategy × emotions | −0.0928 | −0.0903+ | −0.427 |
| | (0.0632) | (0.0523) | (0.327) |
| Threshold × emotions | 0.00291 | 0.00291 | 0.00789 |
| | (0.0336) | (0.0229) | (0.0545) |
| Offer level | | 0.107*** | 0.244*** |
| | | (0.00159) | (0.0488) |
| Strategy × offer level | | | −0.145** |
| | | | (0.0489) |
| Threshold × offer level | | | 0.00848+ |
| | | | (0.00465) |
| Emotions × offer level | | | −0.0541 |
| | | | (0.0547) |
| Strategy × emo × offer | | | 0.0620 |
| | | | (0.0549) |
| Threshold × emo × offer | | | −0.000830 |
| | | | (0.00634) |
| Mean of Y | 0.702 | 0.702 | 0.702 |
| N | 3156 | 3156 | 3156 |
This table reports regression results for acceptance rate. SM-Th is treated as a subset of SM (i.e., the strategy dummy is set to 1 also for threshold method observations)
Robust standard errors in parentheses
+ p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001
We begin with a large sample size for illustrative purposes, but later restrict to one observation per subject-pair. The fact that the proposer makes slightly lower offers in DE means that restricting to one outcome would lead to the erroneous conclusion of higher acceptances in SM. Indeed, comparing Columns 1 and 2 shows that the difference between SM and DE almost doubles from 9.6 percentage points higher acceptance rate in DE (p < 0.05) to 16.2 percentage points (p < 0.001) once the offer level is controlled for. This doubling did not occur in the experiment reported in Section 2 when the offer was added as a control, as the proposer in that experiment was unaware of the responder’s method of elicitation. Note that the high salience treatment further increases the difference in acceptance rates by 9 percentage points (p < 0.1) (Column 2). Here, we see that the “Emotions” treatment interacts with SM (p < 0.1) rather than with SM-Th. If we interpret salience as the treatment effect of interest, salience has no significant effect on its own: it is weakly positive under DE and appears weakly negative under SM, and the difference in treatment effects is statistically significant at the 10% level.
Since SM and SM-Th both involve off-equilibrium considerations and render similar results, we pool these treatments in Table 3. Columns 1 and 2 confirm the lower acceptance rate in SM of 12 percentage points (p < 0.05) and 18 percentage points (p < 0.001) respectively. When we control for offer level (Column 2), this difference is highly significant. In Column 3, fully interacting offer with the treatments shows that while 1% of offer is associated with 24 percentage points higher acceptance rates (p < 0.001), SM reduces this association by 14 percentage points (p < 0.01) in the low salience setting. This interaction differs from the previous experiment and literature. The main result remains that behavior in DE and SM diverges rather than stays invariant.
| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 0.824*** | 0.248*** | −0.488 |
| | (0.0463) | (0.0406) | (0.284) |
| Strategy method | −0.117* | −0.184*** | 0.579* |
| | (0.0477) | (0.0402) | (0.285) |
| Emotions | 0.0684 | 0.0659 | 0.355 |
| | (0.0587) | (0.0497) | (0.325) |
| Strategy × emotions | −0.0923 | −0.0898 | −0.425 |
| | (0.0610) | (0.0510) | (0.326) |
| Offer level | | 0.107*** | 0.244*** |
| | | (0.00159) | (0.0488) |
| Strategy × offer level | | | −0.141** |
| | | | (0.0488) |
| Emotions × offer level | | | −0.0541 |
| | | | (0.0547) |
| Strategy × emo × offer | | | 0.0618 |
| | | | (0.0548) |
| Mean of Y | 0.702 | 0.702 | 0.702 |
| N | 3156 | 3156 | 3156 |
This table reports regression results for acceptance rate. SM and SM-Th are pooled together
Robust standard errors in parentheses
+ p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001
We can visualize the different correspondence between acceptance rates and offer level for DE and SM in Fig. 4. DE responders are more than twice as sensitive to offers as SM responders (the regression line for the raw data is red). This is true for both the low and high salience settings.
In sum, DE responders are 18 percentage points more likely to accept than SM responders in the low salience setting and are 27 percentage points more likely to accept in the high salience setting (Table 3, Column 2). Column 3 echoes Figure 4: the coefficient on the interaction of Strategy and Offer level suggests that the difference between DE and SM responders grows with the offer level.
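The 18 and 27 percentage point figures follow directly from the Table 3 coefficients, as the short sketch below shows. The coefficient values are copied from the table; the variable names are ours, and the slope comparison is limited to the low salience setting described in the text.

```python
# Coefficients copied from Table 3, Column 2 (pooled SM and SM-Th).
sm = -0.184          # Strategy method
sm_x_emo = -0.0898   # Strategy × emotions

# DE-minus-SM gap in acceptance probability, holding the offer fixed.
gap_low_salience = -sm                 # neutral framing
gap_high_salience = -(sm + sm_x_emo)   # emotional framing

# Coefficients copied from Table 3, Column 3.
offer = 0.244        # Offer level (slope for DE, low salience)
sm_x_offer = -0.141  # Strategy × offer level

# Implied sensitivity of acceptance to the offer, low salience setting.
slope_de_low = offer
slope_sm_low = offer + sm_x_offer
slope_ratio_low = slope_de_low / slope_sm_low

print(f"gap, low salience:  {gap_low_salience:.3f}")
print(f"gap, high salience: {gap_high_salience:.3f}")
print(f"DE/SM slope ratio, low salience: {slope_ratio_low:.2f}")
```

The gaps round to 18 and 27 percentage points, and the implied low salience slopes (0.244 for DE against 0.103 for SM) correspond to DE responders being a little more than twice as sensitive to the offer.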
4 Three-player prisoners’ dilemma: DE versus SM and low versus high salience
4.1 Design
In this study, we ran the lab experiment at the WiSo-Experimentallabor following their standard procedures in Hamburg and used oTree (Chen et al., 2016). We collected data from 585 participants across 24 sessions. Subjects play the three-player prisoners’ dilemma. We again implement a cross-cutting randomization of high versus low salience for a total of four treatments (SM vs. DE × emo vs. neu). As in the previous study, we designed the salience treatment to avoid framing effects. To manipulate salience, the experiment changed one word (group → team), and changed the background color (purple → red), when describing the game. The setting with group and purple is coded as Emotions = 0 and the setting with team and red is coded as Emotions = 1 in the data analysis. In color psychology, red tends to lead to feelings of excitement, while purple tends to calm (Valdez & Mehrabian, 1994; Elliot & Maier, 2014). A team is typically perceived as a group with a common purpose. Again, if invariance between SM and DE is affected by a few words or background color, the basis for using SM instead of DE would seem fragile. Instructions are in Online Appendix B.
Participants are assigned to matches with three players each. In brief, as in the ultimatum game, DE responders were more cooperative than SM responders. They were less willing to punish non-cooperative first-stage behavior. Differences between DE and SM were affected by salience. We find similar results when we control for the first-stage outcome, restrict the sample to specific first-stage outcomes, or restrict to one observation per subject-first stage outcome. The complete experiment and results are reported in Online Appendix A.
5 Conclusion
Our study suggests that conventional SM estimates may be biased, leading to misleading treatment effects relative to DE. If DE is the gold standard for causal estimates, one possible solution for experiment methods is to collect pilot data that first tests whether SM and DE diverge before collecting additional data using SM. We leave empirical exploration of positive and negative bias for future work.
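One simple way such a pilot comparison could be run is a two-sided two-proportion z-test on acceptance rates, sketched below. This is an illustration under stated assumptions rather than the procedure used in this paper: the counts are hypothetical, there is one observation per responder at the realized offer, and the offer level is not controlled for.

```python
from math import erf, sqrt


def two_proportion_z(accept_de, n_de, accept_sm, n_sm):
    """Two-sided z-test for equal acceptance rates under DE and SM.

    Returns (rate difference DE minus SM, z statistic, p-value).
    Assumes the pooled acceptance rate is strictly between 0 and 1.
    """
    p_de = accept_de / n_de
    p_sm = accept_sm / n_sm
    pooled = (accept_de + accept_sm) / (n_de + n_sm)
    se = sqrt(pooled * (1 - pooled) * (1 / n_de + 1 / n_sm))
    z = (p_de - p_sm) / se
    # Standard normal CDF written with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return p_de - p_sm, z, 2 * (1 - cdf)


# Hypothetical pilot: 45 of 50 DE responders and 35 of 50 SM responders accept.
diff, z_stat, p_value = two_proportion_z(45, 50, 35, 50)
```

If proposers know the elicitation method, offers may differ across arms, so a regression that controls for the offer, as in Tables 2 and 3, would be the more appropriate pilot check.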
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/s40881-023-00146-2.
Acknowledgements
Daniel L. Chen acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, Grant ANR-17-EUR-0010. This research has benefited from financial support of the research foundation TSE-Partnership and ANITI funding, and of Alfred P. Sloan Foundation (Grant No. 2018-11245), European Research Council (Grant No. 614708), Swiss National Science Foundation (Grant Nos. 100018-152678 and 106014-150820), and Templeton Foundation (Grant No. 22420).
Data availability
Data is available on request.