A preregistered vignette experiment on determinants of health data sharing behavior: Willingness to donate sensor data, medical records, and biomarkers

Henning Silber; Frederic Gerdon; Ruben Bach; Christoph Kern; Florian Keusch; Frauke Kreuter

doi:10.1017/pls.2022.15

Published online by Cambridge University Press: 15 September 2022

Henning Silber*: GESIS - Leibniz Institute for the Social Sciences
Frederic Gerdon: Mannheim Centre for European Social Research (MZES), University of Mannheim, Ludwig-Maximilians-Universität München
Ruben Bach: University of Mannheim
Christoph Kern: University of Mannheim
Florian Keusch: University of Mannheim
Frauke Kreuter: Ludwig-Maximilians-Universität München, University of Maryland
*: Correspondence: Henning Silber, GESIS - Leibniz Institute for the Social Sciences, B6 4–5, 68159 Mannheim, Germany. Email: [email protected]

Abstract

The COVID-19 pandemic has spotlighted the importance of high-quality data for empirical health research and evidence-based political decision-making. To leverage the full potential of these data, a better understanding of the determinants and conditions under which people are willing to share their health data is critical. Building on the privacy theory of contextual integrity, the privacy calculus, and previous findings regarding different data types and recipients, we argue that established social norms shape the acceptance of novel practices of data collection and use. To investigate the willingness to share health data, we conducted a preregistered vignette experiment. The scenarios experimentally varied the vignette dimensions by data type, recipient, and research purpose. While some findings contradict our hypotheses, the results indicate that all three dimensions affected respondents’ data sharing decisions. Additional analyses suggest that institutional and social trust, privacy concerns, technical affinity, altruism, age, and device ownership influence the willingness to share health data.

Keywords

data sharing; health research; privacy attitudes; contextual integrity; cancer research; public policy

Type: Life Science in Politics: Methodological Innovations and Political Issues
Information: Politics and the Life Sciences, Volume 41, Issue 2, Fall 2022, pp. 161-181

DOI: https://doi.org/10.1017/pls.2022.15
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Open Practices: Open materials; Open data; Preregistered
Copyright: © The Author(s), 2022. Published by Cambridge University Press on behalf of the Association for Politics and the Life Sciences

Introduction

The global COVID-19 pandemic has spotlighted the relevance of health research and evidence-based public policy decision-making around the world. Technological advancements have made it possible to collect, share, and analyze large amounts of health data. However, collecting high-quality data requires appropriate data collection infrastructures and instruments, which were shown to be lacking in several countries during the COVID-19 pandemic (e.g., Klingwort & Schnell, 2020; Schaurer & Weiß, 2020). Moreover, the quality of empirical evidence relies heavily on people’s willingness to share their health data for research purposes (Aitken et al., 2016). Willingness to share data is closely connected to questions of data privacy and ethics that need to be asked anew with the rise of novel data sources, such as smartphone sensors that track bodily functions and mobility (Oberski & Kreuter, 2020; Struminskaya et al., 2021). In this context, data collectors need to take a fine-grained perspective on such sentiments, as acceptance of data use may strongly depend on the concrete scenario in which a person is asked to share personal information. This is because individuals may question the legitimacy of a specific data collection if strong and transparent privacy safeguards are not in place along each step of the data sharing process.

To comply with the public’s privacy expectations, policymakers and data collectors need to know the conditions under which the collection of specific kinds of data is considered acceptable by their citizens. Understanding privacy as “contextual integrity” (CI; see Nissenbaum, 2010, 2019) provides a context- and situation-sensitive perspective on data flows that allows us to investigate the circumstances under which people accept the collection and use of their health data. CI is upheld if no violation of context-specific privacy norms occurs. CI posits that the (novel) data flow needs to be specified and then evaluated to determine whether it conforms with established and context-specific privacy norms.

The novelty of data flows that aim to improve public health depends on which practices are already established in contexts within specific countries. For example, Germany is a country in which the digitalization of the health system is less advanced than in many other EU countries (Bertelsmann Stiftung, 2019). Several technological and medical developments (e.g., electronic patient records) could be more integrated into the maintenance of individual and public health. Sensor data from smartphones promise greater digitalization of medical health research. However, in order to roll out new systems, such as applications that monitor COVID-19, in a manner that is ethical and acceptable to the public, it is crucial to construct data flows that align with contextual norms. Yet, most of these technologies require data flows that citizens are not familiar with, and social norms for these data have only been established to a limited extent. Still, these novel data flows may be embedded in established social contexts or resemble already existing data flows (see, e.g., Vitak & Zimmer, 2020, with respect to the acceptance of COVID-19 contact tracing apps depending on situational parameters). Therefore, to improve individual and public health, we need to learn which health data flows are considered appropriate in which contexts.

Against this backdrop, we investigated the conditions under which individuals deem the sharing of different types of health data to be more acceptable, particularly with respect to the sharing of health data for public or personal benefit. Our study drew on the framework of CI to define 18 unique data sharing scenarios, which were presented to respondents in an online vignette survey experiment (Auspurg & Hinz, 2015). These scenarios varied on three dimensions: data type, data recipient, and data use purpose. We presented randomly selected vignettes related to cancer research, which has the advantage that our results were not directly affected by current events or changes in the global health situation regarding the COVID-19 pandemic. At the same time, cancer receives a large amount of attention from the scientific community and the public and affects many people’s lives. Thus, combating this disease should be relevant to most citizens. Therefore, willingness to share data for cancer research may be higher than for comparatively less severe and/or less known diseases.

Studying willingness to share health data across different scenarios allows us to better understand which data flows are socially considered appropriate for sharing health data for private and public benefit. In particular, given the interplay of public and private entities in handling such new types of health data flows, the findings tell us whether private- and public-benefit uses of health data are accepted only when requested by private and public data recipients, respectively. This empirical investigation sheds light on the nature of social norms in health contexts, that is, which recipients and which data are considered appropriate for use in the provision of personal and public health. For data sharing practice, the findings can inform the design of data collection activities of public and private organizations and help adjust practices to the expectations of individuals, thereby increasing the trust and willingness of citizens to participate.

Theoretical background

CI provides a framework to jointly investigate several relevant features of data flows, thereby allowing researchers to empirically ascertain which factor combinations are publicly accepted and align with social norms. From a CI perspective, the following situational parameters need to be specified to sufficiently describe data flows: the data type; the involved actors, such as the data sender and recipient; and the transmission principles, that is, the “rules” under which the data are transferred (Nissenbaum, 2010, 2019). For example, individuals (data senders) might find it acceptable to provide sensor data from their smartphones (Data Type A) to a company (Recipient A) or to give consent to transfer a copy of their medical records (Data Type B) to university researchers (Recipient B) but not to a public authority (Recipient C). The CI perspective, however, does not allow us to make predictions about whether specific parameters, such as specific data types or recipients, will be generally more accepted. Instead, it can be argued that the closer a specific data flow is to contextual privacy norms, the higher the likelihood that people will accept this data flow.

The CI theory suggests a prescriptive understanding of social norms, that is, what is “right” to do in a certain situation (Nissenbaum, 2010). Yet, from a CI perspective, novel data flows may still be acceptable if they fulfill contextual purposes better than established practices, even if they do not conform to them (Nissenbaum, 2010). In such situations, individuals might still be willing to share data, for example, because the data flow serves a public purpose that is perceived as sufficiently important and appropriate. Similarly, individuals may think that a data flow conforms with established norms but may nonetheless be hesitant to provide their data, for example, because the purpose is not perceived as sufficiently important or the effort to share these data is considered too high.

From the perspective of individual decisions to share data when confronted with novel practices, we argue that individuals may consider potential benefits and risks, as suggested by the notion of the “privacy calculus” (Culnan & Armstrong, 1999). More specifically, the privacy calculus assumes that privacy is an economic good that can be traded for benefits, such as other goods or services (Kehr et al., 2015; Smith et al., 2011). For example, individuals may decide whether to use new technologies depending on their ease of use and their usefulness (Davis et al., 1989). Considering the privacy calculus, we suggest that the privacy-specific risks and benefits are related to the fulfillment of contextual norms and goals. This means that individuals evaluate a novel health data flow depending on its appropriateness to fulfill the contextual purpose of promoting health. In short, we argue that novel data flows that do not conform to established norms may still be acceptable to individuals and that their acceptability is linked to the perceived benefits and costs of the new data flow, which are context dependent.

With respect to the purpose of a data flow, we need to determine which purposes individuals consider to be relevant contextual purposes. According to CI, purposes are core constitutive elements of social contexts (Nissenbaum, 2019). Certain sub-contexts (see Nissenbaum, 2010) of the health context might be understood to serve one purpose more than another. For example, the doctor-patient relationship is likely to constitute a sub-context that has the purpose of improving personal health. In contrast, transferring information about COVID-19 symptoms to a local public health agency likely serves the purpose of safeguarding public health. Yet, in both cases, personal and public benefits may arise. With respect to the acceptability of data sharing, however, it is crucial to determine which uses are perceived to serve the desired improvement of public or personal health and which uses are perceived to violate central tenets of the health context.

In line with CI theory, our study has a strong situational and exploratory component as we cannot stipulate that any data type, recipient, or purpose that is aimed at providing individual and collective benefits is, as such, more or less acceptable to individuals. Instead, we need to consider the situational parameters in interaction with one another. Given the theoretical considerations outlined earlier, our hypotheses are led by three propositions: Health data flows that are closer to established privacy norms are more likely to be accepted by individuals (P1). Individuals are more likely to share their health data when the benefits (personal and collective) of doing so appear higher and the costs (e.g., required effort and consequences of out-of-context use) appear lower (P2). The potential benefits and costs of a novel data flow need to be interpreted with respect to the social context in which the data flow is embedded (P3). In the following, we specify the CI framework parameters to investigate the conditions under which individuals are willing to share their health data.

Previous research

Prior empirical research has investigated the willingness to share health data in several scenarios, showing, for example, that data sharing is viewed as most acceptable when the purpose is in the interest of the public, when the data are shared in a privacy- and security-preserving way,¹ and when the data recipient can be trusted (Waind, 2020). Previous work on the use of health administrative and clinical trial data also found that trust and public benefits are key to data sharing acceptability (Hutchings et al., 2020). In addition, control over the data that are shared was shown to be an important mediating factor that influenced willingness to share health data (e.g., Jones et al., 2019; Juga et al., 2021; Stockdale et al., 2018). It was also emphasized that citizens are concerned about the profit orientation of commercial data recipients and that they favor a public benefit for those data recipients (Aitken et al., 2016).

Earlier research also found that people are indeed willing to share (health) data, such as biobank data, for health research purposes (Husedzinovic et al., 2015). In contrast, more skepticism can be expected for health-related use of data collected in nonhealth contexts. For example, previous research showed that the use of data collected on Facebook for research purposes is often less accepted than uses that are merely aimed at improving user experience (Gilbert et al., 2021). Similarly, a survey showed that linking health data to personal nonhealth data was less acceptable than linking data from the same context (Aitken et al., 2018).

Previous survey experiments based on CI have shown that respondents’ privacy attitudes changed depending on who exactly received which kinds of data under which conditions. For example, Martin and Nissenbaum (2017) showed that commercial uses (e.g., health data sold to pharmaceutical companies for marketing) overall conform less with privacy expectations than uses that fulfill contextual purposes (e.g., health data used for research to improve health conditions). In another study, Martin and Shilton (2016) showed that privacy expectations with respect to data collection from mobile devices for targeted ads and tracking vary greatly depending on the situational parameters. In addition to such situational parameters and contextual norms, individual characteristics may impact citizens’ evaluations of data flows. For instance, individuals with high trust in government institutions may be less skeptical of data used by public authorities than individuals with lower institutional trust (Kehr et al., 2015). While individuals may, regardless of their level of trust in the government, support the use of health data for research that aims to improve public health generally (Waind, 2020), they would likely disagree on who should receive such data to achieve this purpose. Other individuals may reject the idea of sharing their personal health data with any recipient because they regard the requested data as too personal and the data sharing request as intrusive (Lacasse et al., 2021).

Gerdon et al. (2021) conducted a vignette experiment on the acceptability of data sharing in which they compared the acceptance of sharing health data with that of two other data types (energy consumption and location data). They also experimentally varied the organization that received the data (a public authority or a company). Surprisingly, sharing data with a public institution was overall less accepted than sharing data with a private organization. This finding has worrisome implications, especially considering the COVID-19 pandemic but also in general for other public health crises, as public institutions rely on data to monitor and prevent the spread of diseases, for example, through contact tracing apps or the targeted implementation of public health campaigns. However, the study only investigated one specific type of health data, while health research and public health policy rely on several sources of data to tackle issues of public health.

Willingness to share health data: Data type, recipient, and purpose

In this section, we discuss the effects of changes in CI-based data flow parameters on the willingness to share health data. In particular, we are interested in several recent technological and medical opportunities that have the potential to be used more frequently in Germany and in many other countries in the near future: electronic health records, biomarker data,² and health-related smartphone sensor data. These data types cover different types of health data collections that may happen in different social contexts with various data recipients. They especially may involve privacy considerations specific to the data type and/or private actors (Gerdon et al., 2021). On the one hand, medical records and biomarker data are usually collected in narrow and well-defined contexts that suggest high standards of data protection, that is, by physicians or other care providers, health insurers, and researchers. Sensors, on the other hand, can amass high volumes of data in infrastructures in which sharing is technically feasible among multitudes of actors, such as app developers, smartphone providers, and other third-party actors. Individuals may think of various contexts and potential uses when considering sharing their sensor data. Out-of-context use appears to be a more salient threat for sensor data than, for example, for biomarkers, as has been discussed with respect to COVID-19 tracing apps (Vitak & Zimmer, 2020). Therefore, we expect that people will be more likely to agree to share their biomarker data and medical records than their sensor data if the recipient has a public background (H1.1). For private recipients, we expect that individuals will be less likely to share data that are associated with specific health contexts (medical records and biomarkers) than sensor data (H1.2). Overall, we argue that the high effort required to share biomarkers (e.g., blood) results in a particularly strong data sharing hesitancy for this data type. Therefore, the willingness to share biomarker data should be, ceteris paribus, the lowest among the three data types studied (H1.3).

With respect to data recipients, a particular concern is the previously found lower acceptance of data sharing with public institutions compared with private entities in Germany (Gerdon et al., 2021). Such reluctance might result from concerns that government institutions could use the data for different purposes than initially intended without asking for permission (Turow & Hennessy, 2007; Weitzman et al., 2012). While such concerns can be present for private recipients (e.g., companies) as well, concerns about potential consequences might be more pronounced for public institutions, especially with respect to government surveillance. However, research shows that there are differences in trust levels across public institutions (Krause et al., 2019), and citizens may approve of public-benefit uses of data with respect to certain public institutions that explicitly pursue research purposes, for example, dedicated university research centers (Karampela et al., 2019; Mello et al., 2018).

Given the different possible public and private recipients, we argue that out-of-context use is least likely to be expected from university research centers. At the same time, the recipients are unlikely to be associated with differences in perceived benefits or required data sharing efforts. Therefore, we expect that the willingness to share data will be higher for university research institutions than for public health authorities and private companies (H2.1). Moreover, trust is a central prerequisite for accepting the sharing of health data (Bauer et al., 2019). Individuals may vary in their trust toward different recipients, irrespective of the indicated purposes for which the data will be used. Therefore, higher trust in the respective organization should, ceteris paribus, lead to a higher willingness to share data (H2.2).

Taking the contextual perspective into account, a data recipient can never be fully separated from the purpose for which the recipient plans to use the data. While each of the data types can be analyzed to provide a benefit to the individual data subject (e.g., improvement of diagnoses, recommendations on health-related behavior) and/or recipient, the public also appears to be willing to accept the use of health data for the public interest, that is, to improve public health (Bearth & Siegrist, 2020; Waind, 2020). In both cases, individuals may perceive the data sharing to be useful. Yet, while we assume that individuals will be generally more likely to share their data if they anticipate a personal benefit (H3.1), whether these benefits are considered sufficient in a given situation, for example, because of the low risk of out-of-context use, may depend mainly on the data sharing context and especially on the data recipient.

Some sub-contexts of the health context might be more oriented toward promoting individual health (e.g., doctor-patient relationships), while others are more linked to the improvement of public health (e.g., health agency-individual relationships regarding notifiable infectious diseases). It is likely that public recipients are associated with public-specific contextual goals, while private recipients are associated with private-specific contextual goals. However, Gerdon et al. (2021) did not consistently find such a relationship. Yet, individuals are expected to have a higher likelihood of fearing out-of-context use if recipients use data for a purpose that is not in accordance with established norms. Therefore, we expect that a match between a private data recipient and a private purpose and a public data recipient and a public purpose will result in higher acceptance rates than a “mismatch” between data recipient and purpose (H3.2).

Beyond contextual characteristics, individuals may vary in how much they are willing to help others and contribute to the public welfare. That is, some individuals may be more inclined than others to perceive public health benefits as an appropriate purpose compared with individual health benefits. Thus, we hypothesize that individuals who display higher altruism (Kim & Stanton, 2016) will be more willing to share health data for public benefit than people with lower scores on altruism (H4.1). Similarly, we assume that the more individuals perceive public duties (Voigt et al., 2020), such as voting and paying taxes, as important obligations of good citizens, the more willing they will be to share health data for a public benefit (H4.2). In addition, given the general trend of increasing trust in scientists in recent years (Funk et al., 2019),³ we expect that higher levels of general trust in the scientific community will positively affect the likelihood of sharing data for a public benefit (H4.3). Sharing for a personal benefit should be less affected or unaffected by trust in science.

Finally, without a concrete hypothesis, we collected data about respondents’ cancer exposure, smartphone and smartwatch use, technical affinity, social trust, and political ideology. These supplementary analyses, which are exploratory in nature, are reported at the end of the Results section.

Given the importance of data sharing for health research and policymaking, the results of our study can help inform the scientific debate about data sharing hesitancy. The study can help develop best practice advice for three data types (sensor data, medical records, and biomarkers) but also identify privacy-related social norms. Since, in practice, there is rarely a previously tested scenario that exactly matches the needs of a data recipient, the study can contribute to a better general understanding of how situational parameters may work differently for different data types. Additionally, the breakdown of data types, recipients, and purposes allows us to estimate the relative importance of each component. This will help identify the main drivers of respondents’ willingness to share data. For example, for some groups of respondents, their level of trust in the data recipient might be especially important, whereas for other respondent groups, the purpose might be the most relevant variable. Getting a deeper understanding of the mechanisms behind nonacceptance can also help us develop successful and privacy-conforming data sharing practices that increase willingness to share data for research.

Preregistered research design

We conducted a preregistered survey experiment in which we randomly varied parameters of the data flow as defined by the CI framework to learn which kinds of health data German citizens were willing to share under which conditions.⁴ The so-called vignette experiment or factorial survey experiment (Auspurg & Hinz, 2015) was implemented in a web survey in Germany with a minimum sample size of about 750 respondents. This sample size was based on an approximate power analysis using an ANOVA design with repeated measures and within-between interaction, using the software G*Power (Faul et al., 2007) (input parameters: effect size = 0.1,⁵ α error probability = 0.05, power = 0.95, number of groups = 18, number of measurements = 3, nonsphericity correction = 1). The suggested sample size was 648 respondents. To account for possible exclusion of cases because of insufficient data quality, we increased the minimum sample size by 15 percent, which resulted in 746 respondents. The respondents were recruited from a German commercial online nonprobability access panel and received a small monetary incentive for their participation. To ensure a heterogeneous sample, we screened by gender, age, and educational attainment to represent noncrossed quotas of the German general population.
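The sample-size arithmetic in this paragraph can be checked with a short sketch. It takes the reported G*Power result of 648 respondents as given and does not reproduce the power analysis itself. The function name is illustrative, and rounding up to the next whole respondent is an assumption that is consistent with the reported figure of 746.

```python
import math

def inflate_sample_size(n_required: int, buffer_percent: int) -> int:
    """Add a percentage buffer to a required sample size and round up.

    Rounding up to a whole respondent is an assumption of this sketch.
    """
    return math.ceil(n_required * (100 + buffer_percent) / 100)

n_power_analysis = 648  # sample size suggested by the power analysis (from the text)
minimum_n = inflate_sample_size(n_power_analysis, 15)
print(minimum_n)  # 648 * 1.15 = 745.2, rounded up to 746
```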

As displayed in Table 1, the vignette experiment included three dimensions: data type (sensor data, medical records, biomarkers), data recipient (public health agency, university research center, private company), and purpose of the research (public policy, personal recommendation). This resulted in 18 unique vignettes (3 × 3 × 2). We presented each respondent with one vignette on each data type in random order. Thus, each respondent was randomly assigned to one of the six versions (three data recipients combined with two purposes) for each data type. Random assignment and order allowed us to control for potential context effects.
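As a minimal sketch of this design, the following code enumerates the 18 vignettes and draws one vignette per data type in random order for a single respondent. The level labels follow the text; the function name, the fixed seed, and the assumption that the version is drawn independently and with equal probability for each data type are illustrative and are not taken from the study's survey software.

```python
import itertools
import random

DATA_TYPES = ["sensor data", "medical records", "biomarkers"]
RECIPIENTS = ["public health agency", "university research center", "private company"]
PURPOSES = ["public policy", "personal recommendation"]

# Full factorial design: 3 x 3 x 2 = 18 unique vignettes.
VIGNETTES = list(itertools.product(DATA_TYPES, RECIPIENTS, PURPOSES))

def assign_vignettes(rng: random.Random) -> list:
    """Draw one vignette per data type and return the three in random order."""
    # Six versions per data type: three recipients combined with two purposes.
    versions = list(itertools.product(RECIPIENTS, PURPOSES))
    # Assumption: independent, equal-probability draws for each data type.
    drawn = [(data_type, *rng.choice(versions)) for data_type in DATA_TYPES]
    rng.shuffle(drawn)  # randomize the presentation order
    return drawn

rng = random.Random(2022)  # fixed seed only to make the illustration repeatable
for vignette in assign_vignettes(rng):
    print(vignette)
```

Each call returns three tuples of the form (data type, recipient, purpose), one for every data type.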

Table 1. Dimensions and levels of the vignettes.

Structure of vignettes: [DATA TYPE]. With the consent of a person, these data are transmitted to a German [RECIPIENT]. This [RECIPIENT] uses these data [PURPOSE]. The [RECIPIENT] guarantees that the data are safe, anonymous, and protected from misuse.

To specify all CI parameters, we needed to define the data subject, data sender, and transmission principle. We kept the transmission principle constant by defining a high level of individual control over the data use—that is, we measured individual willingness to share under conditions that enable individuals to make an active decision to agree to data use or not (i.e., opt in). The data subjects were always the respondents themselves. Finally, the data sender was always fixed within each data type and adjusted to produce a realistic scenario.

The following sections provide descriptions of the vignettes by data type.

Data Type 1: Sensors

Sensors installed on smartphones, smartwatches, and other wearable devices collect data that can be used to assess the health condition of people. With the consent of a person, these data are transmitted to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company; university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.⁶] The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.

Data Type 2: Medical records

Health records obtained from doctors’ offices can be used to assess the health conditions of people. With the consent of a person, these data are transmitted to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company; university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.] The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.

Data Type 3: Biomarkers

Blood samples that are collected for biobanks can be used to assess the health conditions of people. With the consent of a person, these data are transferred to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company, university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.] The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.

We then asked respondents, “How likely or unlikely would you agree to share your health data for this purpose?” The response categories were as follows: (1) very unlikely, (2), (3), (4) neither likely nor unlikely, (5), (6), (7) very likely.

Other measures

The study included several additional measures,Footnote ⁷ which were needed to test some of our hypotheses (trust in science in general; trust in public health agencies, private companies, and university research centers; altruism; attitudes toward public duties) and to conduct the additional exploratory analyses (cancer exposure, smartphone and smartwatch usage, technical affinity, social trust, political ideology, and sociodemographic characteristics). Specifically, respondents’ cancer exposure was measured by asking whether the respondent, a relative, or a close friend had ever been diagnosed with cancer. Device ownership was measured by a single multiple-choice question. Technical affinity was measured using five rating scale items about, for example, how good a respondent is at operating digital systems (Schauffel et al., 2021). Public duty was measured using three items featuring a rating scale that asked about what respondents think a good citizen should do (e.g., to obey laws; ESS Round 1: European Social Survey, 2018). A respondent’s level of institutional trust with respect to the three data recipients of our vignette design, and with respect to science in general, was assessed using individual items with a rating scale for each institution (based on ESS Round 9: European Social Survey, 2021). Similarly, social trust was measured using a single item with a rating scale asking whether most people can be trusted or not (ESS Round 9: European Social Survey, 2021). Respondents’ altruism was measured by asking about their willingness to do something good without expecting anything in return (SOEP-IS Group, 2021). Finally, political ideology was measured using respondents’ self-reported left-right orientation (ESS Round 9: European Social Survey, 2021).

The question wordings for all these measures are provided in the appendix. For measures that include multiple items, we conducted an exploratory factor analysis to verify that the items load on a single factor. Items with factor loadings lower than 0.5 were excluded.Footnote ⁸ Basic sum scores were used to combine the items into a single measure for the respective construct.
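As a sketch of this scoring step, the following lines drop items below the 0.5 loading threshold and add up the remaining answers. The item names, loadings, and answers are hypothetical, and the loadings are assumed to come from a separately fitted single-factor model.

```python
def retained_items(loadings, threshold=0.5):
    """Keep items whose single-factor loading is not lower than the threshold."""
    return [item for item, loading in loadings.items() if loading >= threshold]


def sum_score(responses, items):
    """Add up one respondent's answers over the retained items."""
    return sum(responses[item] for item in items)


# Hypothetical loadings and one respondent's answers (not taken from the study)
loadings = {"item_1": 0.82, "item_2": 0.77, "item_3": 0.41}
responses = {"item_1": 5, "item_2": 6, "item_3": 2}

kept = retained_items(loadings)
score = sum_score(responses, kept)
print(kept, score)  # ['item_1', 'item_2'] 11
```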

The placement of the additional measures within the questionnaire is not a trivial decision. If they are placed before the vignette experiment, they could affect the answers to the vignettes. If they are placed after the vignette experiment, the vignette questions could affect the answers to the additional measures that are intended to explain the answers to the vignettes. Since none of these placements is optimal, a random half of the sample received the additional measures before the vignette experiment and a random half after the experiment. This randomization in the placement of the vignette experiment and the other measures allowed us to control for possible order effects within our analyses. Similarly, we randomized the order of the items within each multiple-item measure to avoid systematic question order effects.

Data

The data were collected using a sample drawn from a German online access panel administered by Bilendi and respondi, which had been used for scientific research before (e.g., Beuthner, Keusch, et al., 2022; Daikeler et al., 2022; Gerdon et al., 2021; Silber et al., 2019). The field time was between May 30 and June 2, 2022. The panel provider invited 14,000 panel members by email to our survey. In all, 2,423 individuals started the survey by clicking on the link in the invitation email. Of these, 34 panel members were screened out, and 1,088 could not participate because our quotas had been reached. Another 140 respondents did not complete the questionnaire. This resulted in 1,161 completed interviews before conducting quality checks.Footnote ⁹ The median response time was 5 minutes and 6 seconds, and the average enjoyment rating of the survey was 4.10 on a scale from 1 “not at all” to 5 “very good.”
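The participant flow reported above can be checked with simple arithmetic. The variable names are illustrative, and the start rate is a derived quantity that is not reported in the text.

```python
invited = 14_000
started = 2_423
screened_out = 34
quota_full = 1_088
broke_off = 140

# Completed interviews before the data quality checks
completed = started - screened_out - quota_full - broke_off
# Share of invited panel members who clicked on the survey link
start_rate = started / invited

print(completed)             # 1161
print(round(start_rate, 3))  # 0.173
```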

To recruit a diverse set of participants, we used quotas based on the German “Mikrozensus” 2019 regarding age, gender, and educational attainment. Descriptive results of the demographics and the other measures of the initial sample (before the data quality checks) can be found in the online supplement (see Table A1 in the Supplementary Materials). The study was approved by the ethical review board of the University of Mannheim (EK Mannheim 22/2022).

Data quality checks

We implemented three data quality checks.Footnote ¹⁰ First, we excluded respondents who provided item nonresponse to one of the vignettes or the covariates. As a robustness check, we initially planned to impute missing values and report analyses of our hypotheses with imputation in the online supplement. Second, using paradata on response time, we excluded speeders, that is, respondents who answered the questions so fast that they could not possibly have read and processed the questions. For this, we used the method proposed by Roßmann (2010), which identifies all respondents who finish the survey in less than 60 percent of the median completion time as speeders. The analyses without speeders are included in the main text, whereas the analyses with speeders are provided in the Supplementary Materials.Footnote ¹¹ Third, we tested whether the experimental assignment worked with respect to demographic characteristics (i.e., gender, age, and education). For this analysis, we used χ²-tests. Where the experimental assignment depended systematically on a demographic variable, we used that variable as a control variable throughout our analyses.
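The speeder rule can be sketched as follows. The durations are hypothetical and the function name is illustrative. If the median response time of 5 minutes and 6 seconds (306 seconds) reported in the data section were the reference value, the cutoff would be about 184 seconds; whether that is the exact reference value is an assumption.

```python
from statistics import median


def flag_speeders(durations_sec, fraction=0.6):
    """Flag respondents whose completion time is below `fraction` of the median."""
    cutoff = fraction * median(durations_sec)
    return [duration < cutoff for duration in durations_sec]


# Hypothetical completion times in seconds (median 305, cutoff 183)
durations = [120, 250, 300, 310, 400, 900]
flags = flag_speeders(durations)
print(flags)  # [True, False, False, False, False, False]
```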

Data analyses

The data analyses included multilevel models to account for the vignette experiment’s hierarchical data structure (vignettes nested in respondents). First, we analyzed our hypotheses regarding the data type (H1.1 to H1.3), data recipient (H2.1 and H2.2), and purpose of the research (H3.1 and H3.2). H1.3, H2.1, and H3.1 concern the main effects of the data type, data recipient, and research purpose on the willingness to share data, while H1.1, H1.2, and H3.2 were tested with interaction effects between vignette dimensions (data type and data recipient for H1.1 and H1.2; data recipient and research purpose for H3.2). To test H2.2, H4.1, H4.2, and H4.3, interactions between vignette characteristics and respondent characteristics were specified, namely, between data recipient and trust in the respective institution (H2.2), research purpose and altruism (H4.1), research purpose and attitudes toward public duties (H4.2), and research purpose and trust in science (H4.3). While the main analyses focused on random-effects models in which the dependent variable was treated as continuous, we implemented two additional model sets as robustness checks. These included fixed-effects models with continuous outcomes and random-effects models in which the dependent variable was treated as ordinal.
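For orientation, a random-intercept model of the kind described here can be written as follows. The notation is an assumption of this sketch and does not appear in the source: $ y_{ij} $ is the rating of vignette $ i $ by respondent $ j $, $ T_{kij} $ and $ R_{lij} $ are dummy variables for the non-reference data types and recipients, and $ P_{ij} $ indicates the research purpose.

$$
y_{ij} = \beta_0 + \sum_{k} \beta^{T}_{k} T_{kij} + \sum_{l} \beta^{R}_{l} R_{lij} + \beta^{P} P_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
$$

An interaction with a respondent characteristic $ z_j $ (for example, altruism) would add a main effect and a product term to the same equation:

$$
y_{ij} = \dots + \gamma z_j + \delta \, P_{ij} z_j + u_j + \varepsilon_{ij}
$$

The respondent-level intercept $ u_j $ captures the dependence among the three ratings given by the same person. In the interaction model, $ \delta $ is the coefficient that the corresponding hypothesis test refers to.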

Lastly, we conducted exploratory analyses regarding the additional measures. The analyses were conducted using the statistical software R (R Core Team, 2020) and the packages broom.mixed (Bolker & Robinson, 2021), flextable (Gohel, 2022a), GGally (Schloerke et al., 2021), Hmisc (Harrell, 2021), knitr (Xie, 2021), lme4 (Bates et al., 2015), lmerTest (Kuznetsova et al., 2017), missForest (Stekhoven, 2022), mitml (Grund et al., 2021), multilevelTools (Wiley, 2020), officer (Gohel, 2022b), ordinal (Christensen, 2019), plm (Croissant & Millo, 2018), psych (Revelle, 2021), stargazer (Hlavac, 2018), summarytools (Comtois, 2022), texreg (Leifeld, 2013), and tidyverse (Wickham et al., 2019) for the multilevel models. All statistical tests were two-sided. Anonymous data and statistical analysis code are available through a public repository.Footnote ¹² Prepared analysis code that makes use of synthetic data for implementing the modeling steps outlined earlier was provided on the OSF page of this study as part of the preregistration process.Footnote ¹³

Transparent changes

We deviated from the preregistration in three instances. First, we planned to test the experimental assignment regarding the region in which respondents live but failed to collect this variable in our study, so we had to deviate from the preregistered analyses in this respect. Second, the step of excluding respondents who did not complete the full questionnaire was moved from the data quality check section to the data section without changing the procedure of excluding incomplete interviews. Third, since only a small number of respondents contributed item nonresponse, we decided to deviate from the preregistration and did not replicate the analyses with imputed values for those respondents.

Results

Data quality and robustness checks

First, we excluded up to eight respondents who provided item nonresponses, depending on the variables included in the specific analysis (see Tables A1–A15 in the Supplementary Materials). Since such a small number of respondents provided item nonresponse, we decided against replicating the analyses with imputed values for those respondents. Second, we excluded 146 speeders, defined as respondents who finished the survey in less than 60 percent of the median completion time. Analyses with speeders can be found in the online supplement (see Table A15 in the Supplementary Materials). This robustness check showed that the decision to exclude speeders did not affect the substantive results reported here. Third, a series of χ²-tests confirmed that the experimental assignment of the vignettes worked, except for the combination of education and data recipient. Thus, we included education as an additional covariate in all models (see Tables A2–A10 in the Supplementary Materials).Footnote ¹⁴
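A balance check of this kind can be sketched with a plain Pearson χ² computation on a cross-tabulation of a demographic variable by an experimental condition. The counts below are hypothetical, and the closed-form p-value helper only covers even degrees of freedom; a statistics library would be the usual choice in practice.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Hypothetical counts: three education levels (rows) by three recipients (columns)
table = [[120, 100, 95], [110, 115, 105], [90, 105, 120]]
stat, df = chi_square(table)
print(round(stat, 2), df, round(p_value_even_df(stat, df), 3))
```

A small p-value would indicate that the condition and the demographic variable are not independent, which is the situation in which the variable is added as a covariate.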

In addition, we included a question asking respondents whether they had read the vignettes carefully, with seven response categories ranging from 1, “not at all carefully,” to 7, “very carefully,” which had a mean rating of 6.10. Since only eight respondents selected the values 1 or 2, we decided against a robustness check excluding those respondents.

As robustness checks, we replicated the multilevel models (1) as fixed-effects models with continuous outcomes and (2) as random-effects models in which the dependent variable is treated as ordinal (see Tables A16–A21 in the Supplementary Materials). Neither alternative approach changed the substantive findings compared with the random-effects models with continuous outcomes.

Preregistered hypotheses

Table 2 shows the descriptive results for each of the 18 vignettes. The level of willingness to provide data for health research ranged from 3.37 for sharing sensor data with a public health agency for a personal benefit to 4.84 for sharing biomarkers with a university research center for a public benefit. Given that the scale ranged from 1, “very unlikely,” to 7, “very likely,” the sharing levels are around the midpoint of the answer scale, with four vignettes showing values above 4.5 and two vignettes showing values below 3.5.

Table 2. Mean levels of willingness to share health data and 95% confidence intervals for 18 vignettes.

Regarding our hypotheses about the main effect of the vignette experiment, Model 1a in Figure 1 shows that H1.3, which suggested that biomarkers would return the lowest willingness to share, was not supported. On the contrary, respondents reported that they would be significantly more likely to share biomarkers ( $ \hat{\beta} $ = .616, p < .001) and medical records ( $ \hat{\beta} $ = .435, p < .001) compared with sensor data. The main effect hypothesis regarding the recipient (H2.1) suggested that the willingness to share would be highest for university research centers. The data supported this hypothesis, with respondents showing significantly lower willingness to share health data with both other recipients: private companies ( $ \hat{\beta} $ = –.660, p < .001) and public health agencies ( $ \hat{\beta} $ = –.380, p < .001). With respect to the purpose, we expected that respondents would be more willing to share their health data if they anticipated a personal benefit (H3.1). However, the experimental results show that the willingness to share was significantly higher for the vignettes that featured a public benefit as compared to a personal benefit ( $ \hat{\beta} $ = –.256, p < .001). When considering interaction effects, none of our hypotheses about the interaction between data type and recipient (H1.1 and H1.2, Model 1b) and the interaction between recipient and purpose (H3.2, Model 1c) was supported (p ≥ .05).
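To make the size of these main effects concrete, the following sketch adds up the reported Model 1a estimates for two vignettes. The reference categories (sensor data, university research center, public benefit) are inferred from the comparisons in the text, and the additive prediction ignores the covariates and the nonsignificant interactions.

```python
# Main-effect estimates reported for Model 1a. The reference categories
# (coded 0.0) are inferred from the text and are an assumption of this sketch.
EFFECTS = {
    "data_type": {"sensor data": 0.0, "medical records": 0.435, "biomarkers": 0.616},
    "recipient": {
        "university research center": 0.0,
        "public health agency": -0.380,
        "private company": -0.660,
    },
    "purpose": {"public benefit": 0.0, "personal benefit": -0.256},
}


def offset(data_type, recipient, purpose):
    """Predicted difference from the reference vignette under main effects only."""
    total = (
        EFFECTS["data_type"][data_type]
        + EFFECTS["recipient"][recipient]
        + EFFECTS["purpose"][purpose]
    )
    return round(total, 3)


best = offset("biomarkers", "university research center", "public benefit")
worst = offset("sensor data", "private company", "personal benefit")
spread = round(best - worst, 3)
print(best, worst, spread)  # 0.616 -0.916 1.532
```

Under these assumptions, the most and least favorable combinations differ by about 1.5 points on the 7-point scale. For comparison, the vignette means reported with Table 2 span 4.84 − 3.37 = 1.47 points.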

Figure 1. Results of the multilevel regression analyses predicting willingness to share health data: Main effects and interaction effects between the experimental dimensions. Model 1a displays results of the main effects of the vignette dimensions. Model 1b displays the main effects and the interaction between data type and recipient. Model 1c displays the main effects and the interaction between data recipient and purpose. The dots show the respective point estimates, and the bars indicate the 95% confidence intervals.

Figure 2 shows the interaction effects with additional measures. H2.2 suggested that higher levels of trust in the respective recipient would result in a higher willingness to share health data. The experimental results support this hypothesis for two recipients, private company ( $ \hat{\beta} $ = .117, p < .001, Model 2a) and university ( $ \hat{\beta} $ = .103, p < .001, Model 2c), but not for public agency ( $ \hat{\beta} $ = .025, p = .268, Model 2b). Hypotheses H4.1, H4.2, and H4.3 suggested interaction effects of public purpose with altruism, perceptions of the importance of public duties, and trust in science in general, respectively. The interaction effects for trust in science ( $ \hat{\beta} $ = .060, p = .013, Model 3a) and altruism ( $ \hat{\beta} $ = .054, p = .011, Model 3b) were in the expected direction and significant, showing higher willingness to share when respondents displayed higher values on these covariates, while public duty showed an effect in the expected direction, which was, however, not statistically significant ( $ \hat{\beta} $ = .019, p = .121, Model 3c).

Figure 2. Results of the multilevel regression analyses predicting willingness to share health data: Additional measures and interaction effects. All models (2a–3c) include main effects of the vignette dimensions (not shown). Models 2a–3a display the results for various trust measures. Model 3b displays the results of altruism and Model 3c for public duty. The dots show the respective point estimates, and the bars indicate the 95% confidence intervals.

Exploratory analyses

We also included several variables for additional exploratory analyses shown in Figure 3 (see Table A1 in the Supplementary Materials for descriptive results of these additional variables). With respect to demographics, young respondents (18–28 years) reported a significantly higher willingness to share their health data than respondents aged 29 to 64 years (p < .05, Model 4a). The effects of educational attainment and gender were statistically nonsignificant (p > .05). Respondents who owned a smartwatch ( $ \hat{\beta} $ = .300, p = .024) and/or a smartphone ( $ \hat{\beta} $ = .505, p = .022) reported a significantly higher willingness to share their data than respondents who did not own these devices, as did respondents with higher levels of technical affinity ( $ \hat{\beta} $ = .038, p < .001; Model 4b). Respondents with higher levels of trust in others (i.e., social trust, $ \hat{\beta} $ = .147, p < .001, Model 4c) and respondents who have been confronted with cancer personally or in their close social environment reported a significantly higher willingness to share their health data ( $ \hat{\beta} $ = .271, p = .019, Model 4d). In contrast, respondents with higher privacy concerns reported a significantly lower willingness to share their health data ( $ \hat{\beta} $ = –.267, p < .001). Self-reported political ideology did not affect respondents’ willingness to share their data ( $ \hat{\beta} $ = –.018, p = .509).

Figure 3. Results of the multilevel regression analyses predicting willingness to share health data: Exploratory analyses. All models (4a–4d) include main effects of the vignette dimensions (not shown). Model 4a displays the results for the demographic variables. Model 4b displays the results of device ownership and technical affinity. Model 4c displays the effects for political ideology and social trust. Model 4d displays the results for cancer exposure and privacy concerns. The dots show the respective point estimates, and the bars indicate the 95% confidence intervals.

Discussion

Summary of results

The results of the vignette experiment confirmed that all three dimensions experimentally tested in our vignette study (data type, recipient, and purpose) significantly influenced individual data sharing decisions. However, two of the three main effects of the vignette dimensions were statistically significant in the direction opposite to the one hypothesized. Specifically, of our main effects hypotheses, only hypothesis H2.1 regarding the effect of the different recipients on respondents’ data sharing intentions was supported, as university research centers were the most accepted recipients. Yet, the hypotheses about interaction effects between the vignette dimensions were not supported. From a CI perspective, this finding is somewhat striking, as we would have expected the effects of single parameters to depend on the specification of the other parameters. One explanation is that the specific data sharing scenarios that we investigated come with similar privacy expectations once they are placed within the respective health contexts. In contrast, most of our hypotheses about interactions with additional measures were supported (e.g., public purpose and altruism), and most of our exploratory analyses showed statistically significant effects (e.g., social trust and privacy concerns). The latter results indicate that general attitudes and characteristics of respondents indeed influenced their willingness to share across scenarios.

With respect to the different data types, our study found that respondents reported higher willingness to share biomarkers and medical records compared with sensor data for health research, which echoes the finding of Beuthner, Silber, and Stark (2022). A possible reason for this finding is that the threat of out-of-context use for sensor data appeared to be more salient than for the other two data types (Vitak & Zimmer, 2020). Another reason is the hypothetical nature of the outcome variable of our study: respondents may not have considered the higher data sharing effort for biomarkers compared with sensor data.

Our study did not reproduce the result of Gerdon et al. (2021) that respondents were more willing to share their data with a private than with a public recipient. Possible reasons are that we referred to more specific public institutions than Gerdon et al. (2021) and that public trust levels toward public authorities changed during the pandemic. While the willingness to share was the highest for university research centers, respondents were also more likely to be willing to share their data with a public health agency compared with a private company. This finding reinforces confidence in publicly funded health research. However, for data related to current crises or data directly linked to concerns of government surveillance, the findings might be different. Additional research is needed to explore this further.

With respect to the purpose of the data collection, the study showed that respondents were more likely to be willing to share their data in case of a public benefit compared with a personal health recommendation. This finding confirms previous research suggesting that sharing health data in the interest of improving public health aligns with societal norms and is, therefore, highly accepted (Bearth & Siegrist, 2020; Waind, 2020). However, our findings do not support the assumption drawn from the privacy calculus (Culnan & Armstrong, 1999), which would have suggested that individuals are more likely to share their data if they expect personal (health) benefits.

Practical implications

Our study illustrated that willingness to share health data is closely connected to individual variables such as institutional and social trust, privacy concerns, altruism, technical affinity, and age. Building on this information, invitation letters to potential study participants could illustrate the trustworthiness of the respective data recipient and the purpose of the data collection. More generally, and in line with previous research (e.g., Aitken et al., 2016; Rosman et al., 2022; Waind, 2020), the findings underline that health research needs to clearly show that it serves public interest to achieve public acceptance. In the invitation letter, researchers should also make sure to address study-specific privacy concerns regarding data collection, storage, and processing. Beyond that, the study suggested that a private company or public health agency that plans to run a data sharing campaign may increase the trustworthiness of its project by involving independent university researchers. Finally, the more an institution knows about the data sharing norms, preferences, and privacy concerns of the target population, the more it can tailor the design of the health data collection.

Researchers who are interested in estimating how many participants they need for their study are advised to be mindful that a data sharing process has several steps. In this study, respondents first had to follow the invitation to take part in the survey. They then had to complete the entire survey and provide answers of sufficient quality (e.g., without speeding through the questionnaire). In actual health data collections, individuals would have to answer the request for sharing additional health data affirmatively and complete that data sharing procedure successfully. Yet, for the generalizability of a study, it is not merely important how many people are willing to share their data; it is as critical whether there are specific subgroups of invited persons who are not willing to share their health data (or take part in the survey). For example, if a study is focused on vaccinations against COVID-19 and the realized health data sample only includes people who had at least three vaccinations, important subgroups of the population would be missing, and the generalizability of the study would be limited in that respect. Thus, researchers should always consider both aspects simultaneously, optimizing participation and minimizing sample bias.
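The multi-step nature of the process can be made explicit in a small planning sketch: the number of invitations needed is the target sample size divided by the product of the stage-wise rates. All rates below are hypothetical placeholders, not estimates from this study, and the function name is illustrative.

```python
import math


def invitations_needed(target_n, stage_rates):
    """Invitations required to end up with `target_n` usable data donations,
    given the proportion of people who pass each successive stage."""
    rates = list(stage_rates)
    if not rates or any(not 0 < rate <= 1 for rate in rates):
        raise ValueError("each stage rate must lie in (0, 1]")
    return math.ceil(target_n / math.prod(rates))


# Hypothetical stage-wise rates (placeholders only)
rates = {
    "starts the survey": 0.20,
    "completes the survey": 0.80,
    "passes quality checks": 0.90,
    "agrees to share data": 0.50,
    "completes the sharing procedure": 0.70,
}
needed = invitations_needed(500, rates.values())
print(needed)  # 9921
```

Such a calculation addresses only the number of participants. Whether specific subgroups drop out at particular stages has to be examined separately.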

Limitations

This research has several limitations. First, we use cancer research as our study topic. While cancer research is less affected by current events than other health research topics, such as the COVID-19 pandemic, it remains an open question to what degree our findings will generalize to other health topics. Cancer research might be perceived as more important than research on less severe diseases, so we would expect lower data sharing rates for those topics. Second, our study was carried out during the COVID-19 pandemic, when sharing health data might be generally viewed more positively than during times when personal and public health are less salient topics. Third, one might wonder whether our findings will generalize to other countries. While this is again a question for future investigations, research has shown that privacy concerns and related behavior may differ across countries (e.g., Li, 2022; Trepte et al., 2017). Moreover, the digitalization of the health system in Germany is not considered very advanced (Bertelsmann Stiftung, 2019). Thus, willingness to share health data may be higher in countries with fewer privacy concerns and/or a higher level of digitalization of the health system. Fourth, our vignette experiment only captures people’s intent to share health data. While this approach allows us to experimentally manipulate several factors at once, it negatively influences the external validity of our study. However, previous research has shown that there is a strong association between intended and actual behavior (e.g., Hainmueller et al., 2015; Petzold & Wolbring, 2018; Sheeran, 2002), so we believe that most of our main findings will be directly transferable to “real-world” data sharing situations. An advantage of our hypothetical study is that the results will not be influenced by the specific data sharing method, which can have a large impact on the results (Silber et al., 2021). Perhaps most importantly, researchers should expect substantially lower data sharing rates in studies in which actual data are requested, because the costs for respondents are higher since they have to share their data.Footnote ¹⁵ Another aspect that could reduce the data sharing rates in studies that measure actual sharing behavior is that following the request and providing data appears to be socially desirable. Given the lower costs of the hypothetical situation, more people might tend to answer the request affirmatively. Finally, our study uses a nonprobability sample. While prior research has shown that multivariate relationships obtained from such surveys often generalize to the general population, univariate distributions and bivariate associations should be treated with the appropriate caution (Cornesse et al., 2020). However, our study focuses on uncovering multivariate and causal relationships.

Conclusion

Our vignette study showed that the willingness to share health data is highly dependent on the specific data sharing situation. All three vignette dimensions (data type, recipient, and research purpose) significantly affected respondents’ willingness to share their data. Similarly, the additional variables measuring trust, privacy, age, and device ownership affected the reported willingness to share health data. However, we found no meaningful interaction effects between the vignette dimensions. From a CI perspective, this raises questions on the similarity of social norms of data sharing scenarios within specific health contexts. The results suggest that individual data sharing decisions are affected by a multitude of factors, which include the idiosyncrasies of a data sharing situation as well as individual variables. Thus, since data sharing decisions are embedded in complex social contexts, we need to ensure that study design, research infrastructure, and public communication of science, as well as invitations to participate in studies, create a trustworthy environment and aim to foster public benefits.

Supplementary Materials

To view supplementary material for this article, please visit http://doi.org/10.1017/pls.2022.15.

Data availability statement

This article earned Open Materials, Open Data, and Preregistration badges for open scientific practices. The materials, data, and preregistration that support the findings of this study and the award of these badges are openly available at https://doi.org/10.23668/psycharchives.7058 (data and codebook), https://osf.io/p6h7j/ (analysis code in R), and https://osf.io/kgwe7 (preregistration report).

Appendix: Overview of additional measures

The questionnaire was administered in German.

Cancer exposure

Source: own

Have you, a relative, or a close friend ever been diagnosed with cancer?

• Yes
• No
• I prefer not to say

Device ownership

Source: own

Do you own one or more of the following devices? Please tick all that apply.

▪ A desktop computer / PC
▪ A laptop / notebook
▪ A smartphone
▪ A tablet
▪ A smartwatch
▪ No, none of these devices

Technical affinity

Source: Subscale “General” of the ICT Self-Concept Scale (Schauffel et al., 2021). Licensed under a CC BY 4.0 International License.

In the following, you will be asked questions about the handling of digital systems. Digital systems are all digital applications (e.g., software or apps) and all digital devices (e.g., computers or smartphones).

I can operate digital systems.

• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree

I am good at using digital systems.

• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree

I quickly learn when it comes to using digital systems.

• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree

It is easy for me to get familiar with new digital systems.

• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree

I have always been good at using digital systems.

• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree

Political ideology

Source: ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.

In politics people sometimes talk of “left” and “right”. Where would you place yourself on this scale, where 0 means the left and 10 means the right?

• 0 – Left
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Right

Public duties

Source: ESS Round 1: European Social Survey (2018). Licensed under a CC BY-SA 4.0 International License.

To be a good citizen, how important would you say it is for a person to…

…support people who are worse off than themselves?

• 0 – Extremely unimportant
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Extremely important

…vote in elections?

• 0 – Extremely unimportant
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Extremely important

…always obey laws and regulations?

• 0 – Extremely unimportant
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Extremely important

Social trust

Source: ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.

In general, do you think that most people can be trusted, or that you can’t be careful enough when dealing with other people?

• 0 – You can never be too careful
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Most people can be trusted

Institutional trust

Source (based on): ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.

To what extent do you trust public health agencies in general?

• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust

To what extent do you trust private companies in general?

• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust

To what extent do you trust university researchers in general?

• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust

To what extent do you trust the scientific community in general?

• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust

Altruism

Source: SOEP-IS Group (2021). Licensed under a CC BY-SA 4.0 International License.

Now we would like to know how well the following statement describes you as a person.

I am willing to do something for a good purpose without expecting anything in return.

• 0 – Does not describe me at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Describes me perfectly

Footnotes

This article earned Open Materials, Open Data, and Preregistration badges for open scientific practices. For details, see the Open Scientific Practices Statement.

¹ Preserving privacy and security is critical when digital data are shared because these data are exposed to threats during transmission. Thus, it is best practice to encrypt messages and files while they are being transmitted.

² The Biomarker Working Group of the US Food and Drug Administration and the National Institutes of Health defined a biomarker as a “characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or biological responses to an exposure or intervention, including therapeutic interventions” (FDA-NIH Biomarker Working Group, 2021, p. 45). Biomarkers can be measured using, for example, blood, urine, or soft tissue (Hirsch & Watkins, 2020).

³ We note that research from the United States shows that trust in science is not increasing among all population groups. Some groups even showed increasing levels of anti-intellectualism (e.g., Motta, 2018). This is likely to generalize to other countries as well. We also note that Germany does not deviate significantly from the international scientific trust average, whereas the United States is below the international mean (Huber et al., 2019).

⁴ The preregistered report is available at https://osf.io/kgwe7.

⁵ The effect size was based on the general recommendation from Brysbaert and Stevens (2018) and Kühberger et al. (2014). We reduced the suggested number from 0.3 to 0.1 to account for the nature of our hypotheses, which feature multiple interaction effects.

⁶ The sentence “This public health agency [private company; university research center] uses these data for a research program to fight cancer” represents the public benefit, and the sentence “This public health agency [private company, university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer” represents the personal benefit. Each vignette includes only one of these two research purposes.
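
The structure described in this footnote can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation (their analysis code is in R): it assumes that the three dimensions are fully crossed, takes the data types from the article title, and introduces the names `DATA_TYPES`, `RECIPIENTS`, `PURPOSES`, and `build_vignettes` as hypothetical helpers.

```python
from itertools import product

# Levels of the three vignette dimensions. Data types follow the article title;
# recipients and purpose sentences follow the footnote above.
DATA_TYPES = ["sensor data", "medical records", "biomarkers"]
RECIPIENTS = ["public health agency", "private company", "university research center"]
# "{recipient}" is a placeholder introduced for this sketch.
PURPOSES = {
    "public benefit": (
        "This {recipient} uses these data for a research program to fight cancer."
    ),
    "personal benefit": (
        "This {recipient} uses these data to provide the persons with personal "
        "recommendations on their health behavior with respect to protection "
        "against cancer."
    ),
}


def build_vignettes():
    """Return one dict per combination of data type, recipient, and purpose.

    Assumes a fully crossed design, so each vignette carries exactly one
    of the two purpose sentences.
    """
    vignettes = []
    for data_type, recipient, purpose in product(DATA_TYPES, RECIPIENTS, PURPOSES):
        vignettes.append(
            {
                "data_type": data_type,
                "recipient": recipient,
                "purpose": purpose,
                "purpose_sentence": PURPOSES[purpose].format(recipient=recipient),
            }
        )
    return vignettes


vignettes = build_vignettes()
print(len(vignettes))  # 3 data types x 3 recipients x 2 purposes = 18
```

Under these assumptions, the sketch yields 18 distinct scenarios, each containing only one of the two purpose sentences.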

⁷ The questionnaire was implemented in German. Most questions were taken from German scales and translated into English by the authors. New questions without a German version were translated by the authors.

⁸ The factor analyses for the two multi-item constructs “public duty” and “technical affinity” showed that all items for technical affinity had a factor loading above 0.5, whereas one item measuring public duty (“To be a good citizen, how important would you say it is for a person to support people who are worse off than themselves?”) had a factor loading of 0.40. This item was therefore not included when building the sum score.
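
The item selection described in this footnote could be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes that 0.5 served as the cutoff, only the value 0.40 is taken from the footnote, and the remaining loadings, the item keys, and the respondent's answers are hypothetical.

```python
# Assumed cutoff: the footnote reports loadings "above 0.5" for retained items.
LOADING_CUTOFF = 0.5

# Only the 0.40 loading is reported in the footnote; the other two are hypothetical.
PUBLIC_DUTY_LOADINGS = {
    "support_worse_off": 0.40,
    "vote_in_elections": 0.70,  # hypothetical value
    "obey_laws": 0.70,          # hypothetical value
}


def items_to_keep(loadings, cutoff=LOADING_CUTOFF):
    """Return the items whose factor loading is above the cutoff."""
    return [item for item, loading in loadings.items() if loading > cutoff]


def sum_score(responses, items):
    """Add up one respondent's answers over the retained items only."""
    return sum(responses[item] for item in items)


kept = items_to_keep(PUBLIC_DUTY_LOADINGS)

# Hypothetical answers on the 0-10 importance scale used for these items.
respondent = {"support_worse_off": 6, "vote_in_elections": 9, "obey_laws": 8}
score = sum_score(respondent, kept)
print(kept, score)
```

With these illustrative numbers, the low-loading item is dropped and the sum score is built from the two remaining items.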

⁹ Only one respondent selected “diverse” in the gender category. To ensure the privacy of that person, we removed the respondent from the sample before the analyses and publication of the data set. However, we repeated all analyses, including this respondent, to make sure that the results presented in the manuscript were not affected by this decision.

¹⁰ After the preregistration, the step of excluding respondents who did not complete the full questionnaire was moved from the data quality check section to the data section without changing the procedure of excluding incomplete interviews.

¹¹ A benchmark from a previous study with a similar respondent pool from which we have drawn our sample showed that about 7 percent of the sample was identified as speeders (Roßmann, 2017). However, if we had experienced an unusually high number of speeders (i.e., more than 15 percent), we would have increased the sample size and not have relied on imputation. Similarly, if we had excluded a large number of respondents because of breakoffs or item nonresponse, we would have increased the sample size of the study to achieve the minimum sample size. In addition, we asked respondents at the beginning of the questionnaire to carefully read and answer the questions to mitigate speeding (Conrad et al., 2017).

¹² See https://doi.org/10.23668/psycharchives.7058 (Silber et al., 2022) and https://osf.io/p6h7j/.

¹³ Available at https://osf.io/p6h7j/.

¹⁴ The tests of the experimental assignment included age, gender, and education. As an additional sensitivity check, we recalculated all models without including education to ensure that including it did not affect our substantive conclusions.

¹⁵ At the same time, the benefits are usually also higher since respondents often receive monetary incentives for their data sharing effort. This can help counterbalance the additional data sharing effort.

References

Aitken, M., Jorre, J. D. S., Pagliari, C., Jepson, R., & Cunningham-Burley, S. (2016). Public responses to the sharing and linkage of health data for research purposes: A systematic review and thematic synthesis of qualitative studies. BMC Medical Ethics, 17(1), 1–24.

Aitken, M., McAteer, G., Davidson, S., Frostick, C., & Cunningham-Burley, S. (2018). Public preferences regarding data linkage for health research: A discrete choice experiment. International Journal of Population Data Science, 3(1), 429.

Auspurg, K., & Hinz, T. (2015). Factorial survey experiments. Sage.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48 (R package version 1.1-27).

Bauer, P. C., Keusch, F., & Kreuter, F. (2019). Trust and cooperative behavior: Evidence from the realm of data-sharing. PLOS ONE, 14(8), e0220115.

Bearth, A., & Siegrist, M. (2020). Psychological factors that determine people’s willingness-to-share genetic data for research. Clinical Genetics, 97(3), 483–491.

Bertelsmann Stiftung. (2019, September 4). Smart health systems: International comparison of digital strategies. https://www.bertelsmann-stiftung.de/en/publications/publication/did/summary-smarthealthsystems-focus-europe

Beuthner, C., Keusch, F., Silber, H., Weiß, B., & Schröder, J. (2022). Consent to data linkage for different data domains: The role of question order, question wording, and incentives. SocArXiv. https://doi.org/10.31235/osf.io/qh93g

Beuthner, C., Silber, H., & Stark, T. H. (2022). Effects of smartphone use and recall aids on network name generator questions. Social Networks, 69, 45–54.

Bolker, B., & Robinson, D. (2021). broom.mixed: Tidying methods for mixed models (R package version 0.2.7).

Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1), 9.

Christensen, R. H. B. (2019). ordinal: Regression models for ordinal data (R package version 2019.12-10).

Comtois, D. (2022). summarytools: Tools to quickly and neatly summarize data (R package version 1.0.1).

Conrad, F. G., Couper, M. P., Tourangeau, R., & Zhang, C. (2017). Reducing speeding in web surveys by providing immediate feedback. Survey Research Methods, 11(1), 45–61.

Cornesse, C., Blom, A. G., Dutwin, D., Krosnick, J. A., De Leeuw, E. D., Legleye, S., Pasek, J., Pennay, D., Phillips, B., Sakshaug, J. W., Struminskaya, B., & Wenz, A. (2020). A review of conceptual approaches and empirical evidence on probability and nonprobability sample survey research. Journal of Survey Statistics and Methodology, 8(1), 4–36.

Croissant, Y., & Millo, G. (2018). Panel data econometrics with R: The plm package. Wiley.

Culnan, M. J., & Armstrong, P. K. (1999). Information privacy concerns, procedural fairness, and impersonal trust: An empirical investigation. Organization Science, 10(1), 104–115.

Daikeler, J., Bach, R. L., Silber, H., & Eckman, S. (2022). Motivated misreporting in smartphone surveys. Social Science Computer Review, 40(1), 95–107.

Davis, F. D., Bagozzi, R. P., & Warshaw, P. R. (1989). User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8), 982–1003.

ESS Round 1: European Social Survey. (2018). ESS-1 2002 documentation report (Edition 6.6). European Social Survey Data Archive, Norwegian Centre for Research Data for ESS ERIC. https://doi.org/10.21338/NSD-ESS1-2002

ESS Round 9: European Social Survey. (2021). ESS-9 2018 documentation report (Edition 3.1). European Social Survey Data Archive, Norwegian Centre for Research Data for ESS ERIC. https://doi.org/10.21338/NSD-ESS9-2018

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

FDA-NIH Biomarker Working Group. (2021). BEST (Biomarkers, EndpointS, and other Tools). U.S. Food and Drug Administration. https://www.ncbi.nlm.nih.gov/books/NBK326791

Funk, C., Hefferon, M., Kennedy, B., & Johnson, C. (2019, August 2). Trust and mistrust in Americans’ views of scientific experts. Pew Research Center. https://www.pewresearch.org/science/2019/08/02/trust-and-mistrust-in-americans-views-of-scientific-experts/

Gerdon, F., Nissenbaum, H., Bach, R. L., Kreuter, F., & Zins, S. (2021). Individual acceptance of using health data for private and public benefit: Changes during the COVID-19 pandemic. Harvard Data Science Review (Special issue 1). https://doi.org/10.1162/99608f92.edf2fc97

Gilbert, S., Vitak, J., & Shilton, K. (2021). Measuring Americans’ comfort with research uses of their social media data. Social Media + Society, 7(3), 20563051211033824.

Gohel, D. (2022a). flextable: Functions for Tabular Reporting (R package).

Gohel, D. (2022b). officer: Manipulation of Microsoft Word and PowerPoint documents (R package).

Grund, S., Robitzsch, A., & Luedtke, O. (2021). mitml: Tools for multiple imputation in multilevel modeling (R package version 0.4-1).

Hainmueller, J., Hangartner, D., & Yamamoto, T. (2015). Validating vignette and conjoint survey experiments against real-world behavior. Proceedings of the National Academy of Sciences, 112(8), 2395–2400.

Harrell, F. E. Jr. (2021). Hmisc: Harrell miscellaneous (R package version 4.5-0).

Hirsch, M. S., & Watkins, J. (2020). A comprehensive review of biomarker use in the gynecologic tract including differential diagnoses and diagnostic pitfalls. Advances in Anatomic Pathology, 27(3), 164–192.

Hlavac, M. (2018). stargazer: Well-formatted regression and summary statistics tables (R package version 5.2.1).

Huber, B., Barnidge, M., de Zúñiga, H. G., & Liu, J. (2019). Fostering public trust in science: The role of social media. Public Understanding of Science, 28(7), 759–777.

Husedzinovic, A., Ose, D., Schickhardt, C., Fröhling, S., & Winkler, E. C. (2015). Stakeholders’ perspectives on biobank-based genomic research: Systematic review of the literature. European Journal of Human Genetics, 23(12), 1607–1614.

Hutchings, E., Loomes, M., Butow, P., & Boyle, F. M. (2020). A systematic literature review of health consumer attitudes towards secondary use and sharing of health administrative and clinical trial data: A focus on privacy, trust, and transparency. Systematic Reviews, 9(1), 1–41.

Jones, K. H., Daniels, H., Squires, E., & Ford, D. V. (2019). Public views on models for accessing genomic and health data for research: Mixed methods study. Journal of Medical Internet Research, 21(8), e14384.

Juga, J., Juntunen, J., & Koivumäki, T. (2021). Willingness to share personal health information: Impact of attitudes, trust and control. Records Management Journal, 31(1), 48–59.

Karampela, M., Ouhbi, S., & Isomursu, M. (2019). Connected health user willingness to share personal health data: Questionnaire study. Journal of Medical Internet Research, 21(11), e14537.

Kehr, F., Kowatsch, T., Wentzel, D., & Fleisch, E. (2015). Blissfully ignorant: The effects of general privacy concerns, general institutional trust, and affect in the privacy calculus. Information Systems Journal, 25(6), 607–635.

Kim, Y., & Stanton, J. M. (2016). Institutional and individual factors affecting scientists’ data-sharing behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology, 67(4), 776–799.

Klingwort, J., & Schnell, R. (2020). Critical limitations of digital epidemiology. Survey Research Methods, 14(2), 95–101.

Krause, N. M., Brossard, D., Scheufele, D. A., Xenos, M. A., & Franke, K. (2019). Trends—Americans’ trust in science and scientists. Public Opinion Quarterly, 83(4), 817–836.

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLOS ONE, 9(9), e105825.

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26 (R package version 3.1-3).

Lacasse, A., Gagnon, V., Nguena Nguefack, H. L., Gosselin, M., Pagé, M. G., Blais, L., & Guénette, L. (2021). Chronic pain patients’ willingness to share personal identifiers on the web for the linkage of medico-administrative claims and patient-reported data: The chronic pain treatment cohort. Pharmacoepidemiology and Drug Safety, 30(8), 1012–1026.

Leifeld, P. (2013). texreg: Conversion of statistical model output in R to LaTeX and HTML tables. Journal of Statistical Software, 55(8), 1–24.

Li, Y. (2022). Cross-cultural privacy differences. In Knijnenburg, B. P., Page, X., Wisniewski, P., Lipford, H. Richter, Proferes, N., & Romano, J. (Eds.), Modern socio-technical perspectives on privacy (pp. 267–292). Springer.

Martin, K., & Nissenbaum, H. (2017). Measuring privacy: An empirical test using context to expose confounding variables. The Columbia Science & Technology Law Review, 18, 176–218.

Martin, K., & Shilton, K. (2016). Putting mobile application privacy in context: An empirical study of user privacy expectations for mobile devices. The Information Society, 32(3), 200–216.

Mello, M. M., Lieou, V., & Goodman, S. N. (2018). Clinical trial participants’ views of the risks and benefits of data sharing. New England Journal of Medicine, 378(23), 2202–2211.

Motta, M. (2018). The dynamics and political implications of anti-intellectualism in the United States. American Politics Research, 46(3), 465–498.

Nissenbaum, H. (2010). Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press.

Nissenbaum, H. (2019). Contextual integrity up and down the data food chain. Theoretical Inquiries in Law, 20(1), 221–256.

Oberski, D. L., & Kreuter, F. (2020). Differential privacy and social science: An urgent puzzle. Harvard Data Science Review, 2(1), 1–21.

Petzold, K., & Wolbring, T. (2018). What can we learn from factorial surveys about human behavior? Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 15(1), 19–30.

R Core Team. (2020). R: A language and environment for statistical computing (Version 3.6.3). R Foundation for Statistical Computing. https://www.R-project.org/

Revelle, W. (2021). psych: Procedures for personality and psychological research (R package version 2.1.6). Northwestern University.

Rosman, T., Bosnjak, M., Silber, H., Koßmann, J., & Heycke, T. (2022). Open science and public trust in science: Results from two studies. Public Understanding of Science. Advance online access. https://doi.org/10.1177/09636625221100686

Roßmann, J. (2010). Data quality in web surveys of the German Longitudinal Election Study 2009 [Paper presentation]. 3rd ECPR Graduate Conference, Dublin, Ireland.

Roßmann, J. (2017). Satisficing in Befragungen. Springer Fachmedien.

Schauffel, N., Schmidt, I., Peiffer, H., & Ellwart, T. (2021). Self-concept related to information and communication technology: Scale development and validation. Computers in Human Behavior Reports, 4, 100149.

Schaurer, I., & Weiß, B. (2020). Investigating selection bias of online surveys on coronavirus-related behavioral outcomes. Survey Research Methods, 14(2), 103–108.

Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., Elberg, A., & Crowley, J. (2021). GGally: Extension to “ggplot2” (R package version 2.1.2).

Sheeran, P. (2002). Intention–behavior relations: A conceptual and empirical review. European Review of Social Psychology, 12(1), 1–36.

Silber, H., Breuer, J., Beuthner, C., Gummer, T., Keusch, F., Siegers, P., Stier, S., & Weiss, B. (2021). Linking surveys and digital trace data: Insights from two studies on determinants of data sharing behavior. SocArXiv. https://osf.io/preprints/socarxiv/dz93u/

Silber, H., Gerdon, F., Bach, R., Kern, C., Keusch, F., & Kreuter, F. (2022). Dataset and codebook for “A pre-registered vignette experiment on determinants of health data sharing behavior: Willingness to donate sensor data, medical records, and biomarkers” [Data set]. PsychArchives. https://doi.org/10.23668/psycharchives.7058

Silber, H., Schröder, J., Struminskaya, B., Stocké, V., & Bosnjak, M. (2019). Does panel conditioning affect data quality in ego-centered social network questions? Social Networks, 56, 45–54.

Smith, H. J., Dinev, T., & Xu, H. (2011). Information privacy research: An interdisciplinary review. MIS Quarterly, 35(4), 989–1015.

Stekhoven, D. J. (2022). missForest: Nonparametric missing value imputation using random forest (R package version 1.5).

SOEP-IS Group. (2021). SOEP-IS 2018—Fragebogen für die SOEP-Innovations-Stichprobe (Update release 2019) (SOEP Survey Papers 946: Series A—Survey Instruments). DIW Berlin/SOEP.

Stockdale, J., Cassell, J., & Ford, E. (2018). Giving something back: A systematic review and ethical enquiry into public views on the use of patient data for research in the United Kingdom and the Republic of Ireland. Wellcome Open Research, 3, 6.

Struminskaya, B., Lugtig, P., Toepoel, V., Schouten, B., Giesen, D., & Dolmans, R. (2021). Sharing data collected with smartphone sensors: Willingness, participation, and nonparticipation bias. Public Opinion Quarterly, 85(S1), 423–462.

Trepte, S., Reinecke, L., Ellison, N. B., Quiring, O., Yao, M. Z., & Ziegele, M. (2017). A cross-cultural perspective on the privacy calculus. Social Media + Society, 3(1). https://doi.org/10.1177/2056305116688035

Turow, J., & Hennessy, M. (2007). Internet privacy and institutional trust: Insights from a national survey. New Media & Society, 9(2), 300–318.

Vitak, J., & Zimmer, M. (2020). More than just privacy: Using contextual integrity to evaluate the long-term risks from COVID-19 surveillance technologies. Social Media + Society, 6(3). https://doi.org/10.1177/2056305120948250

Voigt, T. H., Holtz, V., Niemiec, E., Howard, H. C., Middleton, A., & Prainsack, B. (2020). Willingness to donate genomic and other medical data: Results from Germany. European Journal of Human Genetics, 28(8), 1000–1009.

Waind, E. (2020). Trust, security and public interest: Striking the balance: A narrative review of previous literature on public attitudes towards the sharing, linking and use of administrative data for research. International Journal of Population Data Science, 5(3).

Weitzman, E. R., Kelemen, S., Kaci, L., & Mandl, K. D. (2012). Willingness to share personal health record data for care improvement and public health: A survey of experienced personal health record users. BMC Medical Informatics and Decision Making, 12(1), 1–10.

Wickham, H., Averick, M., Bryan, J., Chang, W., D’Agostino McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Lin Pedersen, T., Miller, E., Milton Bache, S., Müller, K., Ooms, J., Robinson, D., Paige Seidel, D., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.

Wiley, J. F. (2020). multilevelTools: Multilevel and mixed effects model diagnostics and effect sizes (R package version 0.1.1).

Xie, Y. (2021). knitr: A general-purpose package for dynamic report generation in R (R package version 1.36).