1 Introduction
When COVID-19 disrupted laboratory experiments around the world, many researchers turned to online experiments. While the high level of control provided by the laboratory is considered the gold standard for conducting experiments with human subjects, circumstances often lead to relaxing that control. Now that the pandemic is less of a threat, scholars have the luxury of choosing between engaging subjects online and returning to the lab. Subjects, too, can choose to participate in laboratory experiments or to opt only for experiments carried out online. Does it matter for experimental results if subjects self-select into their venue?
When measuring social preferences, this question is especially important. The lab may attract more socially motivated subjects. In addition, the lab facilitates closer monitoring of subjects and diminishes potential doubts about the existence of counterparts participating in the experiment. This may lead to a finding of more generous social preferences among lab participants than among those participating online. On the other hand, those who participate online do not have to make a significant time commitment by going to a lab. Those subjects might feel less deserving of compensation and translate their time savings into more generous behavior. As researchers, how much should we worry about the impact on our results of subjects choosing between lab and online settings?
In this study we invited all subjects who had participated in the first wave of a longitudinal panel to participate in a study measuring social preferences in our lab, located on campus. Those who signed up (the eager beavers) completed the task in a traditional laboratory setting. However, many of our participants had not signed up for this wave of the study. After the laboratory experiments were completed we recontacted the remaining members of the panel (the lazy slugs) and asked them to complete an online version using the same protocol. We fully expected that the online subjects would behave differently due to selection and to differences in the experience of in-lab compared to online participation. To our surprise, we find no significant differences in behavior between the two groups of subjects. This result should provide some reassurance to researchers when making inferences from subjects in one setting or the other.
2 Motivation
It is well-known that online experiments are vulnerable to numerous threats to validity, including subject distraction, absence of experimenter monitoring, expectations about payments, beliefs about their counterparts, selective dropouts, and outside consultation (Dandurand et al. 2008; Clifford and Jerit 2014; Eckel and Wilson 2006; Horton et al. 2011). In addition, the online environment may increase perceived social distance, systematically dampening social preferences (Akerlof 1997). Nevertheless, most studies that explicitly compare subjects recruited to the two venues conclude that online experiments are comparable to lab experiments (Horton et al. 2011; Brañas-Garza et al. 2018, 2023). We stress-test such results by asking whether this equivalence holds when subjects self-select into lab or online experiments.
To expand a bit, when studying social preferences, evidence from prior studies generally supports equivalence, but is somewhat mixed. Buso et al. (2021) conduct standard dictator, ultimatum, and public goods games and find no systematic differences in pro-social behavior across three different settings: in-lab, online with video monitoring, and online without video monitoring. However, while Hergueux and Jacquemet (2015) find "strong parallelism" between online and lab behavior, they note that online participants display slightly more pro-social behavior than their lab counterparts. Prissè and Jorrat (2022) find that most behaviors are consistent between the lab and online venues. However, in a dictator game with a charity recipient, online participants are somewhat more likely to give zero. They ascribe such a difference to social distance and experimenter monitoring. Generally, these studies indicate that there are only minor differences due to venue.
In these experiments, researchers are careful to use individuals taken from the same subject pool, who participate at roughly the same time, and are randomly assigned to the type of venue. Subjects are unable to choose how they would like to participate. We ask, if they are able to select when and where they participate, does that affect the findings?
3 Experimental design and procedure
A random sample of two-thirds of Rice University's 2016 entering freshman class was recruited prior to arriving on campus and participated in an online experiment as the initial wave of a longitudinal panel study. Of the 992 students who matriculated, 661 were contacted and 553 completed Phase 1 of the study. Three months after starting classes, those who completed Phase 1 were invited to participate in Phase 2 of the study. A total of 521 subjects participated beginning in early November 2016 (details concerning the sample are in the Supporting Information (SI) Sect. 1).
In Phase 2 all panelists were sent recruitment emails inviting them to sign up for an in-lab study. From November 2, 2016, to November 23, 2016, a total of 22 lab sessions were conducted with the 236 subjects who responded to our request. The remaining panelists who chose not to come to the lab were then recontacted and recruited for an online study. The online study was open from November 28, 2016, to April 28, 2017. This yielded another 285 subjects. Note that subjects were not randomly assigned to one form of participation or the other. The "eager beavers" chose to come to the lab in response to the general call for subjects. The "lazy slugs" chose not to come to the lab, but later participated online. During the period of active recruitment into the lab, subjects did not know there would be a later opportunity to participate online. Subjects were already familiar with an online study, since they all had participated in Phase 1 prior to matriculating at Rice.
The study interface was identical for both the lab and online participants (see the SI Sect. 2). We used standard lab and online procedures. Subjects attending the lab sessions read instructions and were able to ask questions, and completed their survey questions and tasks independently. Subjects in the online version read instructions and made decisions independently, on their own time, and were not observed.
In this paper, we focus on two third-party dictator games and a standard trust game (details are in the SI Sect. 2). In the third-party dictator games, the dictator divides $20 between two other individuals, one of whom is a member of their in-group, and the other of whom is not. Dictators were paid a fixed fee of $5.00 for making the decision. The two third-party dictator games differed in the definition of the groups. In one, the groups were real: one recipient was from the dictator's own residential college (the in-group) and the other was from a different residential college. In the other, we created artificial minimal groups: one recipient was part of the dictator's minimal in-group and the other was in the out-group. Subjects completed the two games in random order, and we control for order effects in our discussion (for further details about these treatments, see our paper comparing the group types, Eckel et al. (2022)).
The trust game is widely accepted as a measure of interpersonal trust and trustworthiness (Berg et al. 1995). Each actor earned an initial endowment of $10 for completing a 40-item risk-preference survey. In the game, the first-mover (the truster) decides how much of the endowment to send to the second-mover. The second-mover (the trustee) receives an amount equal to three times the amount sent (tripled by the experimenter) and decides how much to send back to the truster. Subjects used the strategy method to decide how much to return conditional on each possible amount that could be sent.
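The payoff structure described above can be sketched in a few lines. This is an illustrative sketch only, not the study's software: the function name is hypothetical, and it assumes that the trustee keeps their own $10 endowment and can return at most the tripled amount received.

```python
def trust_game_payoffs(sent, returned, endowment=10, multiplier=3):
    """Return (truster_payoff, trustee_payoff) in dollars.

    Assumptions (not stated in the text): the trustee keeps their own
    endowment and may return at most the tripled amount received.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sent must be between 0 and the endowment")
    received = multiplier * sent  # amount is tripled by the experimenter
    if not 0 <= returned <= received:
        raise ValueError("returned must be between 0 and the amount received")
    truster_payoff = endowment - sent + returned
    trustee_payoff = endowment + received - returned
    return truster_payoff, trustee_payoff


# Example: the truster sends $5, the trustee receives $15 and returns $6.
example = trust_game_payoffs(sent=5, returned=6)
```

Under these assumptions, sending nothing leaves both players with their $10 endowment, which is the benchmark against which trust and reciprocity are usually read.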
All subjects completed all of the tasks, and were informed that their randomly-assigned counterparts were also participants in the study. One of the tasks was randomly selected for payment for each subject. For the dictator games subjects were randomized to role (the dictator, the in-group member, or the out-group member). In-group and out-group members were paid based on the allocation of the $20, with the dictator earning the $5 fee. For the trust game, subjects played both roles, truster and trustee in that order, and at the end of the experiment were randomly assigned to one role. Thus both games have a "role uncertainty" design (Iriberri and Rey-Biel 2011). Subjects were not told which task was paid until the end of the experiment. All of the randomization and matching to positions was computerized. For subjects in the lab, matching was within session and subjects were paid their earnings at that point, as is customary for lab experiments. For subjects who were online, matching was with others participating online and payments were delayed until the end of the study. Subjects had previously experienced this form of matching and payment, which was used in the first wave. Subjects spent an average of 39 minutes in the lab (and less time online) and earned an average of $21.37.
4 Experimental results
The analysis proceeds as follows. Using the third-party dictator games, we compare in-group favoritism in the lab and online for the two games (real and minimal groups). Next, we turn to the trust game data and focus on two measures. The first is the amount sent by the first mover (a measure of trust). The second is the average percentage returned (reciprocity). Under the strategy method subjects specified how much they would return contingent on each whole dollar that could be sent. The percentage returned is calculated for each strategic choice and the average per subject is used as the measure.
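As a rough illustration of the reciprocity measure, the sketch below averages the percentage returned across a subject's strategy-method choices. The function name and the input layout are hypothetical, and it assumes the percentage is taken relative to the tripled amount the trustee receives, since the text does not spell out the denominator.

```python
def average_percent_returned(returns_by_amount_sent, multiplier=3):
    """Average percentage returned across strategy-method choices.

    returns_by_amount_sent maps each whole-dollar amount sent (> 0) to the
    amount the trustee chose to return. Assumption: the percentage is
    computed relative to the tripled amount received.
    """
    percentages = []
    for sent, returned in returns_by_amount_sent.items():
        if sent <= 0:
            continue  # nothing received, so a percentage is undefined
        received = multiplier * sent
        percentages.append(100.0 * returned / received)
    if not percentages:
        raise ValueError("need at least one positive amount sent")
    return sum(percentages) / len(percentages)


# Example: a trustee who always returns exactly the amount sent,
# i.e. one third of the tripled amount received.
equal_return = {sent: sent for sent in range(1, 11)}
reciprocity = average_percent_returned(equal_return)
```

A trustee who returns exactly what was sent scores about 33 percent on this measure, which is the break-even point for the truster.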
First, we find few differences between the subjects opting for the laboratory and those who subsequently took up the online option. Table 1 reports subject characteristics. We find that the eager beavers are very similar to the lazy slugs. There is balance between male and female subjects across venues. Asians are more likely to show up in the lab than Caucasians. When we look at a measure of risk aversion collected in the first-wave study (prior to matriculation), we find no difference between the two sets of participants. The same is true for a measure of time preferences. There are no differences in GPA measured in several ways. Across the five personality inventory items, we find no significant differences. As proxies for students’ family wealth, we use the zip codes recorded at the time of subjects’ entry to Rice University. Median house value represents the median housing costs, and Median household income indicates the median income by zip code. We find no significant differences in these measures. The differences we note lose statistical significance when adjustments are made for multiple hypothesis testing (Westfall and Young 1993). All in all, the two groups are well-balanced across multiple characteristics and measures.
| | In-lab Mean | In-lab SE | Online Mean | Online SE | Difference Mean | Difference SE | Unadj. p-value (In-lab = Online) | Adj. p-value (In-lab = Online) |
|---|---|---|---|---|---|---|---|---|
| Female | 0.53 | 0.03 | 0.48 | 0.03 | 0.04 | 0.04 | 0.31 | 0.98 |
| Ethnicity | | | | | | | | |
| Black | 0.06 | 0.01 | 0.05 | 0.01 | 0.00 | 0.02 | 0.90 | 1.00 |
| Asian | 0.33 | 0.03 | 0.22 | 0.02 | 0.11 | 0.04 | 0.01 | 0.08 |
| White | 0.28 | 0.03 | 0.39 | 0.03 | 0.11 | 0.04 | 0.01 | 0.17 |
| Hispanic | 0.14 | 0.02 | 0.16 | 0.02 | 0.02 | 0.03 | 0.59 | 1.00 |
| Citizen | 0.83 | 0.02 | 0.87 | 0.02 | 0.04 | 0.03 | 0.20 | 0.96 |
| Political Inclination | 2.46 | 0.06 | 2.44 | 0.05 | 0.02 | 0.08 | 0.77 | 1.00 |
| Pre-matriculation measures | | | | | | | | |
| Risk Aversion | 3.21 | 0.11 | 3.18 | 0.09 | 0.03 | 0.14 | 0.86 | 1.00 |
| Time Preference | 2.49 | 0.08 | 2.59 | 0.08 | 0.10 | 0.11 | 0.38 | 0.99 |
| GPA in the first semester | 3.61 | 0.03 | 3.55 | 0.03 | 0.06 | 0.04 | 0.18 | 0.94 |
| Cumulative GPA by Fall 2021 | 3.62 | 0.02 | 3.59 | 0.02 | 0.03 | 0.04 | 0.37 | 0.99 |
| Number of completed hours by Fall | 139.16 | 1.49 | 138.28 | 1.32 | 0.88 | 1.99 | 0.66 | 1.00 |
| Short Form Personality Inventory | | | | | | | | |
| Extraversion | 4.12 | 0.11 | 3.94 | 0.10 | 0.18 | 0.15 | 0.23 | 0.96 |
| Agreeableness | 4.96 | 0.08 | 4.84 | 0.08 | 0.11 | 0.11 | 0.32 | 0.98 |
| Conscientiousness | 5.38 | 0.08 | 5.38 | 0.07 | 0.01 | 0.10 | 0.94 | 1.00 |
| Emotional Stability | 4.60 | 0.09 | 4.87 | 0.08 | 0.28 | 0.12 | 0.03 | 0.32 |
| Openness to New Experiences | 5.34 | 0.07 | 5.26 | 0.06 | 0.08 | 0.09 | 0.36 | 0.99 |
| Wealth | | | | | | | | |
| Median house value | 315,758 | 15,814 | 343,953 | 15,631 | −28,195 | 22,522 | 0.21 | 0.96 |
| Median household income | 84,869 | 2,396 | 84,430 | 2,205 | 439 | 3,270 | 0.89 | 1.00 |
Note: Westfall and Young (1993) adjusted p-values are estimated using 10,000 bootstraps. Stata command wyoung (Jones et al. 2019) is used. Political Inclination ranges from 1 (liberal) to 5 (conservative). Except for one who started in Spring 2017, our sample started the first (regular) semester in Fall 2016. Five started in Summer Semester 2016. To construct Median house value and Median household income as proxies for students’ wealth level, we use the zip codes of all students at Rice University when they entered the institution. Median house value represents the median housing costs, and Median household income indicates the median income by zip code.
Table 2 presents the social preferences measured in this study and shows the mean differences and p-values of t-tests. The first two rows show giving to an in-group member (relative to a non-in-group member) from the $20 budget. Both rows show in-group favoritism (amounts greater than $10, more than half of the budget, are sent to the in-group members). Adjusting for multiple hypothesis testing leads to no significant differences being detected. The last two rows indicate that there are no significant differences in trust or reciprocity.
| | In-lab | Online | Difference | Unadj. p-val | Adj. p-val |
|---|---|---|---|---|---|
| Giving to Minimal Ingroup | 12.907 | 12.274 | 0.633 | 0.056 | 0.187 |
| | (0.251) | (0.218) | (0.331) | | |
| | n=236 | n=285 | | | |
| Giving to Real Ingroup | 13.212 | 13.018 | 0.194 | 0.559 | 0.676 |
| | (0.241) | (0.227) | (0.332) | | |
| | n=236 | n=285 | | | |
| Trust | 4.415 | 4.159 | 0.256 | 0.337 | 0.676 |
| | (0.190) | (0.185) | (0.267) | | |
| | n=236 | n=283 | | | |
| Reciprocity | 36.193 | 37.965 | 1.772 | 0.366 | 0.676 |
| | (1.371) | (1.376) | (1.959) | | |
| | n=236 | n=283 | | | |
Note: Standard errors in parentheses. Adjusted p-values are calculated using Stata command wyoung (Jones et al. 2019).
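To see how the unadjusted p-values relate to the reported means and standard errors, the following sketch recomputes the test for the first row of Table 2 from its summary statistics. It is an approximation rather than the authors' procedure: the function name is hypothetical, and it uses a normal approximation to the t distribution, which is reasonable here only because the sample is large (over 500 subjects). It does not reproduce the Westfall and Young adjustment.

```python
import math


def two_sided_p_from_summary(mean_a, mean_b, se_diff):
    """Two-sided p-value for a difference in means from summary statistics.

    Assumption: a normal approximation to the t distribution, which is
    adequate for large samples but is not the exact test behind the table.
    """
    z = (mean_a - mean_b) / se_diff
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Giving to Minimal Ingroup: in-lab mean, online mean, and SE of the
# difference, all taken from Table 2.
z_stat, p_value = two_sided_p_from_summary(12.907, 12.274, 0.331)
```

The resulting p-value is close to the unadjusted 0.056 reported in the table; small discrepancies are expected because the inputs are rounded and the approximation ignores the finite degrees of freedom.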
Figures 1 and 2 graph the mean for each incentivized measure and include the 95 percent confidence intervals. These figures illustrate what is detailed in Table 2. There is ingroup bias in the dictator games, and the effect holds for both laboratory and online subjects. There are no differences between venues in the trust game.
5 Conclusion
We conclude that online and in-lab measures of social preferences are robustly consistent for early and late-takers in an experiment where subjects in the online version are recruited from those who fail to sign up for the lab version. We find minor differences in demographics (Asians are more likely, and Whites less likely, to enroll in a lab setting), and no significant differences in behavior in the two settings. This result should be reassuring for those who are concerned that online measures of social preferences are fundamentally different from their in-lab versions. Both yield similar patterns and outcomes. Once they engage with the experiment, eager beavers and lazy slugs are equally trusting and trustworthy, and favor their ingroup members to the same extent.
Acknowledgements
This study was funded by the National Science Foundation (for Wilson: SES-1534403 and SES-2027556; for Eckel: SES-1534411 and SES-2027548), and has received IRB approval at Texas A&M and Rice University.
Data availability
Protocols, data, and computer code are available at: https://doi.org/10.17605/OSF.IO/WD6UM. Pre-registered hypotheses are available at: https://doi.org/10.17605/OSF.IO/PZ5WG.
Declarations
Conflict of interest
The authors declare they have no conflict of interest.