
A Practical Guide to Dealing with Attrition in Political Science Experiments

Published online by Cambridge University Press:  18 September 2023

Adeline Lo*
Affiliation:
Department of Political Science, University of Wisconsin-Madison, Madison, WI, USA
Jonathan Renshon
Affiliation:
Department of Political Science, University of Wisconsin-Madison, Madison, WI, USA
Lotem Bassan-Nygate
Affiliation:
Department of Political Science, University of Wisconsin-Madison, Madison, WI, USA
*
Corresponding author: Adeline Lo; Email: [email protected]

Abstract

Despite admonitions to address attrition in experiments – missingness on Y – alongside best practices designed to encourage transparency, most political science researchers all but ignore it. A quantitative literature search of this journal – where we would expect to find the most conscientious reporting of attrition – shows low rates of discussion of the issue. We suspect that there is confusion about the link between when attrition occurs and the type of validity it threatens, and limited guidance on which estimands are threatened by different attrition patterns. This is all exacerbated by limited tools to identify, investigate, and report patterns of attrition. We offer the R package – attritevis – to visualize attrition over time and by intervention, and include a step-by-step guide to identifying and addressing attrition that balances post hoc analytical tools with guidance for revising designs to ameliorate problematic attrition.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of American Political Science Association

The use of experimental methods combined with online samples, already popular in political science, has accelerated in recent years. Yet, even while a cottage industry has developed around subjecting our methods to empirical scrutiny (Brutger et al. 2022; Coppock 2019; Dafoe, Zhang, and Caughey 2018; Jerit, Barabas, and Clifford 2013; Kertzer 2020; Mullinix et al. 2015; Mummolo and Peterson 2019), the issue of attrition – missingness on the outcome – has been mostly ignored by practitioners (Gerber et al. 2014; Zhou and Fishbach 2016). Psychology has already faced a reckoning regarding attrition (Zhou and Fishbach 2016, 495), and our reading of the state of the field in experimental political science is that we do not fare much better. This is true for attrition as “missingness on the outcome,” and even more so for the more expansive definition we use here, which includes pretreatment respondent drop-off.

While standards of best practices have been set (Gerber et al. 2014, 97), habits are hard to change: attrition is rarely inspected or discussed, despite the known problems that arise when missingness is correlated with the potential outcomes of those who drop out of the study (Druckman et al. 2011, 19). Helpful advances have been made (e.g., Coppock et al. 2017), but they focus almost exclusively on ex-post solutions such as double sampling or extreme value bounds, which, though valuable, do not help researchers easily identify attrition that threatens inference, or prevent it from occurring in the first place at the design stage.

Combining Fisher’s dictum to “analyze as you randomize” with the advice of Coppock (2021) to “visualize as you randomize,” our contribution is to offer experimentalists a way to do both. Specifically, we provide a “holistic approach” to addressing attrition, beginning with a quantitative literature search to illuminate the scope of the problem. Our search of all articles published in the Journal of Experimental Political Science yields discouraging results: in the journal where we would most expect to see systematic and transparent discussions of attrition, 60% of the empirical articles published contain no mention of it at all.

Our contribution centers on an R package – attritevis – that provides diagnostic visualizations and corresponding tests for investigating and addressing attrition. A central output is an “attrition over time” plot that provides a question-by-question snapshot of the experiment, showing the amount of attrition at each moment across treatment conditions. This is paired with a respondent-level visualization by treatment condition for detailed inspection of patterns of attrition. The package and the guidance in this article are designed around three central questions researchers ought to ask: (1) is there attrition (and if so, where)? (2) what kinds of threats to inference does it pose? and, if there are threats, (3) what adjustments can be made to account for them, or to preempt them in future studies? Our goal is to provide guidelines and descriptive statistics that help researchers pinpoint the nature and scope of attrition, alter experimental designs, and minimize or preempt problems in future studies. In service of that goal, we connect patterns of nonresponse in experiments to more general concerns about internal and external validity, while focusing on which estimands remain recoverable even when there is heavy attrition.

The scope of the problem

Several patterns are evident from a brief review of the literature. First, attrition can (in theory) pose problems for inference, and extant work suggests that (in practice) our fears may be justified, even when considering only reported attrition (presumably lower than actual attrition; Musch and Reips 2000; Zhou and Fishbach 2016). Second, despite the importance of detecting attrition, it appears as though “ignorance is bliss” for most researchers: in a systematic quantitative literature search within political science, Gerber et al. (2014, 88) find that 58% of sampled experimental articles did not report the number of subjects in each treatment group when there was missing outcome data (see also Mutz and Pemantle 2015, 13 and Zhou and Fishbach 2016, 495). Finally, the options typically suggested to address attrition are inadequate on their own because they focus solely on ex-post solutions.

To set the stage and update previous reviews of the problem, we coded every experimental article published in the Journal of Experimental Political Science from its inception in 2014 (Volume 1) to 2021 (Volume 8). The resulting population consisted of 131 articles (our unit of analysis). Footnote 1 As the flagship journal for experimental studies of politics (one that has published reporting standards recommending best practices for discussing attrition, Gerber et al. 2014), JEPS is the most likely place to see evidence of scholars taking the problem seriously. Footnote 2

Results of our quantitative literature search – presented graphically in Figure 1, with each square representing a single article – suggest there is considerable room for improvement (and we suspect the patterns we observe would be significantly worse in other journals). First, we find that the modal experimental paper published in JEPS – 60% (78 papers, in gray) – contains no mention of attrition (and this worsens when setting aside the 8% of papers studying attrition directly). Second, of the 40% that do mention attrition, nearly half – 17% of the total – note no attrition in their studies, suggesting the possibility of an adverse selection problem whereby attrition is mentioned only when there is no evidence of it being a problem. Finally, among papers transparent about attrition occurring (33 papers, or 25% of the total), only three clearly analyze and account for it. Footnote 3

Figure 1. Experimental papers in full JEPS corpus and their discussion of attrition.

Of course, respondents’ “cost to attrite” varies across study type: these costs are high in lab experiments, for example, which typically feature zero attrition (some types of field experiments may pose similar costs). To further analyze the data, we distinguished between survey and non-survey (e.g., field or lab) experiments and found that of the 78 papers that do not mention attrition, 58 were survey experiments, 14 were lab experiments, and 6 were field experiments. Though well-known approaches exist to address the issue, our argument is that attrition is considered less often than it ought to be – even where professional and cultural incentives are strong – and that one significant stumbling block for scholars is the lack of a clear, easy way to determine if and when attrition is occurring in their studies.

Current approaches to addressing attrition

Current solutions cluster around two approaches: reducing attrition in the design stage and addressing it post hoc (i.e., analytically). The first camp asks the practical question of what might reduce attrition, testing and evaluating specific ideas such as using different survey modes (Morrison et al. 1997) or appealing to respondents’ conscience (Zhou and Fishbach 2016). Another popular approach utilizes monetary incentives (Göritz 2014), with some research (Castiglioni, Pforr, and Krieger 2008) suggesting the utility of conditional incentives. Other work has focused on question length and relevance (McCambridge et al. 2011) as well as adding “warm-up” tasks. Warm-up tasks are intended to increase respondents’ “sunk costs” (Horton, Rand, and Zeckhauser 2011), and Reips (2000) finds some evidence that they work in online survey contexts.

While the bulk of the research on reducing attrition comes from panel/longitudinal settings (Lynn 2018), it still points to some generalizable lessons – e.g., the utility of incentives and the importance of debriefing questions. However, the fatal flaw in focusing solely on preempting attrition by design is that one never knows how successful the effort has been in the particular context in which it was used. That is, utilizing design choices – whether cash payments or subtle “nudges” – to minimize attrition does not obviate the necessity of having a way to understand when and why attrition is occurring. After all, using incentives to lower attrition from some counterfactual baseline does not solve one’s inferential problems if it still occurs and is causally related to treatments.

Other approaches address attrition ex-post, such as through the use of extreme value bounds (“Manski bounds,” see Manski 1995, 2009 and Coppock 2021, 333) or inverse probability weighting (“IPW,” Wooldridge 2007), sometimes visualized to highlight the difference between observed data and imputed best/worst-case scenarios (as in Coppock 2021, 333). These bounds assume that all attriters exposed to treatment would have had the highest possible outcome level, while all attriters in control would have had the lowest values on the outcome (a related approach requires a stronger assumption that treatment only affects attrition in one direction, Lee 2009). Another approach involves double sampling (or “refreshment samples” in panel studies, Deng et al. 2013) and is designed to address attrition through randomized follow-ups among attrited respondents. Recent advances suggest combining multiple approaches (Coppock et al. 2017; Gomila and Clark 2020), allowing researchers to partially identify the ATE of an experiment even when assumptions about missingness in follow-up contact attempts are relaxed. Footnote 4
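To make the extreme value logic concrete: for a bounded outcome $Y \in [y_{\min}, y_{\max}]$, with $D$ denoting assignment and $R$ indicating whether the outcome is observed, the upper bound can be written (a sketch; see Manski 1995 and Coppock 2021 for formal treatments) as

    $\mathrm{ATE}_{\text{upper}} = \left\{ \bar{Y}_{1}\Pr(R=1 \mid D=1) + y_{\max}\Pr(R=0 \mid D=1) \right\} - \left\{ \bar{Y}_{0}\Pr(R=1 \mid D=0) + y_{\min}\Pr(R=0 \mid D=0) \right\},$

where $\bar{Y}_{d}$ is the observed mean outcome among reporters in arm $d$; the lower bound swaps $y_{\max}$ and $y_{\min}$.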

In sum, attrition represents a significant threat to inference, and current approaches to addressing it are either ex-post – requiring strong assumptions in order to estimate treatment effects – or involve implementing design choices that rely on confusing conventional wisdom about what “works.” Moreover, Gomila and Clark (2020, 1) highlight a more fundamental problem by noting that one set of solutions (IPW) is suggested for “mild” attrition while another (double-sampling) is suggested for “severe” cases. Our question is: how are experimentalists to know the difference?

Our argument is that whether one wishes to preempt attrition through design choices or address it in the analysis stage, researchers would benefit from a way to understand when and why attrition is occurring in the first place. Our proposed set of solutions, detailed below, aids in the design stage by allowing researchers to pinpoint and design around problematic questions and treatments – heading off attrition before studies are fielded – and in the analysis stage by providing a straightforward method for identifying when attrition is occurring and when it poses threats to inference.

Diagnosing attrition in your experiment

The typical approach to identifying or addressing attrition in an experiment focuses on its levels. Researchers ask if attrition occurred, or in rarer cases, how much attrition is present, calculating the number of respondents who finished the study as a proportion of the total. We argue that a more informative way to think about attrition in experiments focuses attention on when it occurs and what implications it has for the recovery of causal estimands we are interested in and the assumptions upon which they rest. The following sections outline a series of practical questions and steps researchers can take – facilitated by attritevis – to address attrition, broadly summarized in Figure 2.

Figure 2. Organizing schematic for assessing and handling attrition in an experimental study. Functions from attritevis that can be utilized at each query stage are in pink.

Overview

Our temporal approach enables researchers to visualize the attrition that occurs throughout their study, by treatment arm. The first step is to load one’s data, ensuring that columns in the dataframe are ordered by occurrence, and to specify key moments in the study (e.g., treatment, DV, mediators). Footnote 5 Following that, researchers can use plot_attrition to create the “Attrition timeline” plot, which highlights variation, by intervention arm, in the over-time levels of attrition as well as its relationship to critical moments in the study.

The x-axis in the timeline plot represents all items in the experiment in the order in which they occurred, while the y-axis indicates either the attrited count (how many respondents attrite at each question) or the proportion attrited. Footnote 6 The first quantity is helpful for detecting whether large (absolute) numbers of respondents drop out of the survey at certain moments, while the second takes into account the baseline number of respondents who still remain at that point in the study.
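As a rough illustration of this workflow (the exact attritevis interface is documented in the package vignette; column names and the commented-out argument names below are placeholders, not the documented API), the quantities plotted on the y-axis can also be computed by hand from a question-ordered dataframe:

    # Sketch only: 'df' contains one column per survey item, ordered by occurrence,
    # and attrition is assumed to be monotone (once respondents leave, they stay gone).
    library(attritevis)  # available from github.com/lbassan/attritevis

    n_start   <- nrow(df)
    remaining <- sapply(df, function(col) sum(!is.na(col)))   # respondents still answering at each question
    previous  <- c(n_start, head(remaining, -1))              # respondents present just before each question
    attrited  <- previous - remaining                         # attrited count at each question
    prop_attrited <- attrited / previous                      # proportion of those still present who drop

    # Attrition timeline by arm (argument names are illustrative only):
    # plot_attrition(data = df, treatment_q = "Q5", outcome_q = "Q10")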

Is there attrition?

Once plotted, the graphic is “know it when you see it”: any amount of attrition (in any of the experimental arms) is clearly visible. Consider four different phases of a relatively straightforward (but popular) family of experimental designs: pretreatment, treatment (including the immediate aftermath of the treatment), outcome measurement, and post-outcome. Our package adds a vertical line demarcating the moment of treatment delivery to highlight the pretreatment and posttreatment periods. Figure 3 presents toy examples of experiments that experience varying levels of attrition at different stages. For example, Figure 3(a) suffers from mild attrition, with very low-level attrition distributed more or less equally across the experiment and across arms – there does not appear to be any particular “choke point” in the design that is generating attrition.

Figure 3. Attrition timeline visualizations: Four toy examples of attrition are presented: (a) low levels of attrition throughout the survey, with little variation across experimental arms; (b) pretreatment attrition, with little variation across arms; (c) attrition right after treatment, with differential attrition across arms; and (d) prolonged posttreatment attrition, with limited variation across arms. We assume treatment in all toy examples is assigned when respondents enter the study and delivered at Q5 (marked with a dark vertical line). The plot_attrition function in attritevis also allows plotting of attrition for all respondents (across all possible treatment groups in the study). This allows users to consider attrition pretreatment, when treatment assignment occurs mid-study. The function further permits users to plot questions by number of responses, rather than attrition, and defaults to gray scale. Users may plot by as many experimental arms as they would like and may specify plot colors.

More broadly, how do researchers know if they have attrition worth investigating further? Put simply: for any study with either zero attrition or negligible levels, it should be sufficient to include the Attrition Timeline plot or the standard table that attritevis outputs (via table_attrition). This is not as unlikely as it seems, as attrition is an uncommon occurrence in some experimental contexts (e.g., some lab studies). Our hope is that an easy-to-use R package makes it more likely that researchers report this quantity even when attrition does not represent a threat to inference. Moreover, the “over-time” dimension of both the plot and the table represents an improvement over the current standard, in which only total attrition is reported (an improvement that becomes even more meaningful when there is a threat to inference, as discussed below).

Threats to inference and solutions

Experiments are – in an extreme but common stereotype – presumed to deliver clean estimates of causal effects but fall short in allowing scholars to derive lessons applicable “outside the lab.” Indeed, the stated rationale for turning to experiments in the first place is often a concern for internal validity (McDermott 2002, 38). A similar presumed dichotomy is at work in considering the effects of attrition in experiments. Pretreatment missingness implicates the external validity of a study while posttreatment attrition threatens its internal validity: the former because attrition that occurs exclusively pretreatment does not impinge on our ability to randomly assign respondents to treatment arms and achieve probabilistic equivalence, the latter because of the concern that exiting the study is correlated with respondents’ potential outcomes. While this is not a misleading heuristic, it is only a start in thinking through how to address attrition in experiments.

We argue that a complementary lens through which to consider attrition is to focus on patterns of nonresponse and the estimands we can recover – and assumptions we can still plausibly make – in the face of different patterns of attrition. In a simple example, we think it is more helpful to consider what estimands are plausible following pretreatment attrition – the ATE among those that remain, “always reporters” in the typology below – rather than playing down the consequences by simply giving ground on the “generalizability” of the study. Footnote 7

To focus attention on estimands and assumptions – and drawing on the framework in Gerber and Green (2012, chapter 7) – we refer to four (latent) types of respondents, defined by whether they would report the outcome under each treatment status (some of these responses are observed, others are counterfactual). Let R be an indicator for whether a respondent reports the outcome (and therefore does not attrite on Y), with values in parentheses denoting treatment status (treatment (1) or control (0) arms in a simplified two-arm setting); a sketch of how these types enter the observed comparison follows the list:

  • (a) Never-reporters: $R(1) = 0;\ R(0) = 0$, or individuals who, regardless of treatment status, always attrite on the outcome;

  • (b) If-treated reporters: $R(1) = 1;\ R(0) = 0$, or individuals who report the outcome only if given treatment, but otherwise attrite;

  • (c) If-untreated reporters: $R(1) = 0;\ R(0) = 1$, or respondents who report the outcome if given control, but attrite under the treatment regime. These respondents are sometimes ruled out by a monotonicity assumption, $R_i(1) \ge R_i(0)$;

  • (d) Always-reporters: $R(1) = 1;\ R(0) = 1$, or individuals who always report the outcome regardless of treatment status.
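With random assignment, the type shares and conditional potential-outcome means are the same across arms in expectation, so the naive difference-in-means among respondents who report the outcome mixes types (a sketch in the spirit of Gerber and Green 2012, chapter 7):

    $E[Y \mid D=1, R=1] - E[Y \mid D=0, R=1] = \dfrac{\pi_{d}\,E[Y(1)\mid d] + \pi_{b}\,E[Y(1)\mid b]}{\pi_{d} + \pi_{b}} - \dfrac{\pi_{d}\,E[Y(0)\mid d] + \pi_{c}\,E[Y(0)\mid c]}{\pi_{d} + \pi_{c}},$

where $D$ is treatment assignment and $\pi_{b}$, $\pi_{c}$, and $\pi_{d}$ are the shares of if-treated, if-untreated, and always-reporters. The expression reduces to the ATE among always-reporters when types (b) and (c) are absent, and to the sample ATE when reporting is unrelated to potential outcomes (e.g., MCAR); otherwise the comparison mixes different subpopulations across arms.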

Critically, the shares of each type listed above influence the estimands we can recover and the assumptions we must make to do so. The problem that confronts applied researchers is that, because of the fundamental problem of causal inference, we cannot necessarily know whether individuals are one type or another. Below, we provide an overview of how to diagnose and address (as well as preempt when possible) different patterns of nonresponse, focusing on whether the attrition is pre- or posttreatment. A visual check of the experiment’s timeline of attrition can help diagnose whether a study suffers from one or both types of attrition. Figure 3(b), for instance, suggests attrition is occurring primarily pretreatment, while (c) and (d) both present as posttreatment attrition cases and (c) displays evidence of imbalanced attrition across arms.

Pre-treatment attrition

Attrition that occurs exclusively pre-treatment can be addressed in several ways, depending on the availability of resources and the timeline of the research process. The first step is to diagnose who attrited pre-treatment, paying close attention to whether there was nonrandom selection out of the study. Respondents who drop out here can be seen as “never-reporters” and, while recovering the true ATE among those remaining is still possible, scholars typically want to know whether remainers are a substantial majority of the starting sample (which can be checked via visualizations like plot_attrition to find the proportion attrited) and still reflective of the larger sample population. For the latter inquiry, we can verify how similar remainers look to the sample population: if they look similar on a host of measured demographics, we can more persuasively argue that the recovered ATE extends to the sample population.

Researchers can use statistical tests of differences such as t-tests against a null of the population value (an option in the balance_cov function). Footnote 8 Note that this does not reveal which stage of selection – into the study in the first place or attriting out of the study – is responsible for any differences, but it does provide helpful information about the extent to which inferences might generalize outside of the sample (useful to the extent that researchers wish to make those inferences). Researchers can also rely on a host of weighting options, including post-stratification, IPW, or raking (see Mercer, Lau, and Kennedy 2018 for a summary), with their choices depending on contextual constraints such as available population information and the within-correlation of the characteristics respondents are weighted on (see Franco et al. 2017 for a careful take on reweighting in experiments).
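As a minimal base-R illustration of this check (column names and the benchmark value are placeholders, and the commented balance_cov call only gestures at the package’s interface rather than reproducing it):

    # Sketch: do pretreatment remainers still resemble the sample population?
    # 'Q5' is assumed to be the treatment-delivery question; 'age' is a placeholder
    # covariate; 35 is a placeholder population mean, not a real benchmark.
    remainers <- df[!is.na(df$Q5), ]          # respondents who reached the treatment question
    t.test(remainers$age, mu = 35)            # one-sample t-test against the population value

    # Rough attritevis equivalent (argument names illustrative):
    # balance_cov(data = df, covariate = "age", question = "Q5", population_mean = 35)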

For researchers who can field extra studies, or are analyzing pilot data, the full value of the temporal approach becomes apparent. Pinpointing specific moments where attrition is occurring allows researchers to revise their instruments as necessary to reduce troublesome attrition. Common causes might include overly long instruments – researchers can shorten the instrument, prepare respondents better, or increase compensation – or questions that are aversive, either to most respondents or to a particular subgroup. Footnote 9

Post-treatment attrition

Attrition that occurs at or after treatment is, for experimentalists, potentially more worrisome, as it threatens the core assumption that underlies their causal inferences: that treatment assignment is independent of potential outcomes. If that assumption is not satisfied – if, for example, in a study about the effect of sadness on political beliefs there is high attrition in only one treatment arm because it asks respondents to imagine a negative event that is aversive – then treatment status could be correlated with potential outcomes, threatening internal validity in a general sense and, more specifically, our ability to recover the estimands that we desire (e.g., $\mathrm{ATE}_{\text{sample population}}$). Put differently, the concern is that the potential outcomes of attriting respondents might be different from those of the “remainers.” In these situations, the difference-in-means estimator no longer recovers an unbiased estimate of the ATE (and does not give us the ATE for any meaningful subgroup; see Gerber and Green 2012, 219). If treatment is correlated with potential outcomes, there are a number of steps we can take (detailed below) to account for the attrition, reduce it in future studies, bound our estimates, or revise our interpretations.

The first priority is diagnostic and begins with assessing the extent of the damage through visual examination of attrition in the study using our timeline visualization (plot_attrition). In Figure 3(c), there appears to be differential attrition (across treatment arms) around treatment delivery at Q5. If attrition appears around treatment, the next step is to examine whether treatment is correlated with attrition, using both the timeline visualization and t-tests of differences at multiple points in the study (using the balance_attrite function, paired with p_adjust to account for multiple tests).
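A minimal way to approximate this check outside the package (question and column names are placeholders, and the balance_attrite interface itself is not reproduced here) is to test, question by question, whether the attrition rate differs by assignment and then adjust for multiple testing:

    # Sketch: does treatment assignment predict dropping out at particular questions?
    check_qs <- c("Q5", "Q6", "Q7")                       # placeholder post-treatment questions
    pvals <- sapply(check_qs, function(q) {
      attrited <- as.integer(is.na(df[[q]]))              # 1 = missing at this question
      t.test(attrited ~ factor(df$treat))$p.value         # two-sample t-test of attrition rates by arm
    })
    p.adjust(pvals, method = "holm")                      # account for testing several questions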

Here, a distinction can be made between the content of the treatment and its delivery. If treatment status is correlated with attrition, one possibility is that something about the delivery of the treatment has caused attrition. For a certain class of experiments – e.g., GOTV studies comparing modes of communication (Gerber and Green 2000) – it is possible that something has gone awry with the delivery (e.g., postage was applied carelessly on some mailers) that does not implicate potential outcomes, leaving the MCAR (missing completely at random) assumption intact. In example (c) we might explore the missingness around the treatment in more detail, visualizing it at the respondent level, faceted by treatment arm (Figure 4). Emphasis in this plot is placed on individual missingness (rows, ordered here by respondent ID, assigned at study start) throughout the experiment (columns) and across intervention arms (the faceted panels). Treatment delivery is represented by the red vertical line (Q5), and percent missing is calculated within each question and intervention arm. Figure 4 shows that the Treatment group in example (c) suffers from 20.3% missingness, while the Control group features about half that amount (10.2%). On closer inspection, the visualization focuses attention on problem spots, such as respondents 50-125, all of whom attrite in the treatment arm, which may point to ad hoc glitches in the delivery of the intervention that occurred during that time.

Figure 4. Visualizing missingness by treatment and control group. Plot produced using the vis_miss_treat function, which allows users to facet by condition to present a respondent-level visualization of missingness. The red vertical line marks treatment delivery. This figure demonstrates a toy example with immediate posttreatment attrition, where treatment caused attrition.
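If one wants a quick approximation of this respondent-level view without the package, a ggplot2 heatmap of missingness conveys similar information (question and column names are again placeholders):

    # Sketch: respondent-by-question missingness, faceted by treatment arm.
    library(ggplot2)

    question_cols <- paste0("Q", 1:10)                    # placeholder question names, in survey order
    long <- do.call(rbind, lapply(question_cols, function(q) {
      data.frame(id = df$id, arm = df$treat,
                 question = q, missing = is.na(df[[q]]))
    }))
    long$question <- factor(long$question, levels = question_cols)

    ggplot(long, aes(x = question, y = id, fill = missing)) +
      geom_tile() +
      facet_wrap(~ arm) +
      scale_fill_manual(values = c(`FALSE` = "grey85", `TRUE` = "grey25")) +
      labs(x = NULL, y = "Respondent", fill = "Missing")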

For most experiments in which attrition occurs at or following treatment, we must proceed as though the MCAR assumption is in jeopardy. Footnote 10 If missingness is correlated with potential outcomes, core assumptions (MCAR) and estimands ($\mathrm{ATE}_{\text{sample population}}$) are at risk. A number of steps might follow, depending on where you are in the research process, your level of resources, and what information you have collected. Below, we discuss those steps in rough order of preferability.

If research is still in progress – if attrition has occurred in a pilot, or re-fielding is possible – we suggest researchers utilize debriefing information and focus groups to understand why the attrition is occurring and, if possible, reduce it in the future. Eliciting this kind of information can help with revising treatments to be less aversive, ascertaining whether there are glitches or technical issues in the delivery of the treatment, and even figuring out what covariates one might collect to statistically control for the propensity to attrite. Footnote 11

The next practical step is to investigate whether and which covariates predict “selection into attrition” – using balance_cov – to achieve MCAR conditional on X (missingness independent of potential outcomes, conditional on a set of observed covariates X). Footnote 12 As opposed to revising the treatment or the study – a “design-based” approach to reducing attrition – we consider this a modeling approach to accounting for attrition. If researchers can determine which respondent demographics (X’s) can be conditioned on to achieve $MCAR|X$, reweighting the sample can recover the ATE for the sample population (SATE) (Wooldridge 2007). The weights are based on the likelihood of being observed given the conditioned covariates and are required because the difference-in-means estimator alone is no longer sufficient to recover the SATE.
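A hedged sketch of the reweighting logic (not the package’s implementation; the covariates and column names are placeholders, and robust standard errors would typically accompany the weighted regression):

    # Sketch: inverse probability weighting under MCAR|X.
    df$observed <- as.integer(!is.na(df$Q_outcome))            # 1 = reported the outcome (column name hypothetical)
    pmod <- glm(observed ~ age + educ, data = df,              # 'age', 'educ' assumed to render missingness ignorable;
                family = binomial())                           # treatment assignment could also enter this model
    df$ipw <- 1 / predict(pmod, type = "response")             # inverse of estimated probability of being observed

    fit <- lm(Q_outcome ~ treat, data = df, weights = ipw)     # rows with a missing outcome drop out automatically
    coef(fit)["treat"]                                         # reweighted difference-in-means estimate of the SATE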

Following this, a practical step one might take is to utilize the common robustness check of framing the treatment effect in the context of the attrition by estimating bounds around it (available in our package via the bounds function Footnote 13). The rationale behind a bounds exercise is to impute missing values associated with best- and worst-case scenarios for the ATE; this will not estimate the ATE for the sample, but it does tell you the range within which your sample ATE could fall, which can be useful (even though, in practice, the interval will often include zero when attrition is substantial or the treatment effect is small).
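For intuition about what the bounds exercise does, extreme value bounds can also be computed by hand for a bounded outcome (here a hypothetical 1 to 7 scale); in practice the bounds function, which wraps the attrition package, is the better route:

    # Sketch: Manski-style extreme value bounds on the ATE.
    y_min <- 1; y_max <- 7                                     # endpoints of the (hypothetical) outcome scale
    y1 <- df$Q_outcome[df$treat == 1]
    y0 <- df$Q_outcome[df$treat == 0]

    upper <- mean(replace(y1, is.na(y1), y_max)) - mean(replace(y0, is.na(y0), y_min))
    lower <- mean(replace(y1, is.na(y1), y_min)) - mean(replace(y0, is.na(y0), y_max))
    c(lower = lower, upper = upper)                            # range within which the sample ATE must fall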

Other approaches to addressing post-treatment attrition are significantly more resource intensive and involve further sampling. Some (e.g., Coppock et al. 2017) require planning for double sampling and are useful if you anticipate high attrition in advance of fielding. Others do not require planning to double-sample – one simply re-contacts attriters and increases the incentives in an effort to convince them to participate – but may raise concerns about statistical power (Gomila and Clark 2020). Both approaches tend to be more practical in the context of panel studies.

The solutions noted above are not mutually exclusive. For example, one might use balance_cov to find covariates that predict attrition, use inverse probability weighting to get closer to MCAR conditional on X, and then complement that with bounds as a hedge or robustness check. Those two approaches are particularly suitable to use in concert, since one – IPW – relies more on model specifications while the other – bounds – is nonparametric. Another example of combining solutions is to use double sampling along with bounds (Coppock et al. 2017).

A final option is to consider what estimands are still recoverable if you cannot plausibly achieve $MCAR|X$. Even with significant imbalanced attrition (correlated with potential outcomes), one can technically estimate the ATE among always-reporters (never-attriters). This may be useful if we simply want predictions about real-world interventions (and care only about what the forecasted outcome might look like across the sample pool) or because we care only about outcomes among people for whom they are measured. However, a word of caution is in order: if, like the vast majority of experimentalists, researchers are interested in estimating a treatment effect in a survey, lab, or field experiment, the ATE among always-reporters is likely to be extremely misleading in the presence of significant attrition.

Conclusion

Attrition is an “Achilles heel” (Shadish et al. 1998, 3) of randomized experiments, threatening whether and how researchers can generalize from their data and even their ability to identify causal effects in the first place. A number of helpful methods have been developed over the years to address the problem in the analysis stage, notably re-weighting approaches, bounds, and re-contacting attriters. Yet, in a quantitative literature search, we found that researchers typically ignore the issue, failing to account for or even report attrition rates in most published studies.

We argued that at least part of the mismatch between the scope of the problem and the attention paid to it results from a lack of usable tools for researchers to diagnose the types of attrition in their studies. Our contribution was to offer a suite of tools in an open-source R package (attritevis) that allows experimentalists to visualize when attrition is occurring. The primary outputs of the package are study-level and unit-level “attrition timelines” of the experiment, enabling researchers to easily answer the critical questions of whether and where attrition is occurring across treatment arms.

To supplement the tools in attritevis, we provided a set of guidelines for applied researchers, focusing on where and among whom attrition occurs, the implications of the missingness for the estimands we typically desire, and the assumptions required to recover them. Our guide balances post-hoc analytical tools with advice for revising designs to ameliorate problematic attrition, along with a battery of visualization techniques to transparently report attrition. We emphasized the utility of pilot testing and debriefing – used in tandem with the tools provided in our package – to diagnose and then preempt attrition: as is always the case with experiments, the cheapest and most efficient way to fix problems is to do so at the pilot stage. Regardless of whether researchers are able to lower attrition rates or simply account for them statistically ex-post, we hope and expect that the tools we provide will increase the level of transparency surrounding attrition in experiments.

Supplementary material

To view supplementary material for this article, please visit https://doi.org/10.1017/XPS.2023.22

Data availability statement

The data, code, and any additional materials required to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse within the Harvard Dataverse Network, at: https://doi.org/10.7910/DVN/O2IAWW. Lo, Adeline Y.; Renshon, Jonathan; Bassan-Nygate, Lotem, 2023, “Replication Data for: A Practical Guide to Dealing with Attrition in Political Science Experiments,” Harvard Dataverse.

Competing interests

None.

Ethics statement

The research was approved by the University of Wisconsin Madison Institutional Review Board Office of Human Research Ethics, study 2020-0843-CP002.

The research adheres to APSA’s Principles and Guidance for Human Subjects Research.

Footnotes

This article has earned badges for transparent research practices: Open Data and Open Materials. For details see the Data Availability Statement.

Data and methods described in this paper can be accessed at https://github.com/lbassan/attritevis, along with a paired vignette. Michael Hatfield provided excellent research assistance, and we are grateful for feedback from participants at MPSA and APSA and from anonymous JEPS reviewers. All errors remain our own.

1 Many articles contained multiple studies, so using article – not study – in practice translates to relative leniency in our coding: if any studies in a paper discussed attrition, that counts.

2 In fact, current practices at JEPS ask authors to be explicit about attrition.

3 Callen et al. (2016), Boas (2016), and Green and Zelizer (2017) are notable examples that explicitly discuss, quantify, and adjust for attrition.

4 Early approaches to double-sampling relied on an assumption of “no missingness among follow-up subjects” to point-identify the ATE. See Hansen and Hurwitz (1946), Kaufman and King (1973), and the discussion in Coppock et al. (2017, 189).

5 Some survey vendors allow researchers to use their own Qualtrics instrumentation, in which case all the advice below applies. From others, e.g., YouGov, researchers typically receive only “completes.” In these cases, we recommend researchers negotiate ex ante with the company to provide at least basic information regarding subjects who started – but did not complete – the study. In a perfect world, researchers would receive the complete dataset including attriters, but any information provided would go a long way toward addressing the inferential questions we discuss below.

6 Both are options in plot_attrition, as is missingness by question.

7 Note that this would also correct what we see as a common mistake: equating external validity with generalizing to different pools of respondents, when it also concerns generalizing to any respondents other than those who remain in the study (even those in the same “pool”). E.g., scholars fielding MTurk studies with significant pre-treatment attrition are limited not only in generalizing to nationally representative samples, but also to the broader MTurk population.

8 A similar exercise, depending on available information on drop-outs, can be conducted to compare remainers and drop-outs.

9 Our purpose is to attend to loss of respondents that is unintentional and/or based on choices of the respondents. In some cases, researchers might utilize sampling quotas or attention filters that effectively screen out respondents; these are scenarios where the researcher has chosen to focus on specific types of respondents and – as a result – selectively removes or keeps observations based on whatever criteria they’ve decided upon.

10 High attrition that is convincingly unrelated to potential outcomes is possible, but rare in our experience. It may occur, for instance, if a bomb threat or smoke alarm during a lab experiment requires the quick evacuation of many study participants (as experienced in Woon 2014).

11 Sometimes researchers may not have to field again to collect the extra covariates they find themselves needing, if, for example, the information is attainable from a survey company or, as in the case of some elite studies, is publicly available (Kertzer and Renshon 2022, 539).

12 Practically speaking, whether $MCAR|X$ is satisfied is ultimately an assumption; researchers will have to rely on expertise and context to determine whether they have found a set of covariates sufficient to achieve it.

13 The bounds function draws from the attrition package (Coppock 2022).

References

Boas, Taylor C. 2016. “Pastors for Pinochet: Authoritarian Stereotypes and Voting for Evangelicals in Chile.” Journal of Experimental Political Science 3(2): 197–205. doi:10.1017/XPS.2015.17.
Brutger, Ryan, Kertzer, Josh, Renshon, Jonathan, Tingley, Dustin and Weiss, Chagai. 2022. “Abstraction and Detail in Experimental Design.” American Journal of Political Science. doi:10.1111/ajps.12710.
Callen, Michael, Gibson, Clark C., Jung, Danielle F. and Long, James D. 2016. “Improving Electoral Integrity with Information and Communications Technology.” Journal of Experimental Political Science 3(1): 4–17. doi:10.1017/XPS.2015.14.
Castiglioni, Laura, Pforr, Klaus and Krieger, Ulrich. 2008. “The Effect of Incentives on Response Rates and Panel Attrition: Results of a Controlled Experiment.” Survey Research Methods 2(3): 151–8. https://ojs.ub.uni-konstanz.de/srm/article/view/599
Coppock, Alexander. 2019. “Generalizing from Survey Experiments Conducted on Mechanical Turk: A Replication Approach.” Political Science Research and Methods 7(3): 613–28. doi:10.1017/psrm.2018.10.
Coppock, Alexander. 2021. “Visualize as You Randomize.” In Advances in Experimental Political Science, eds. Druckman, James N. and Green, Donald P. New York, NY: Cambridge University Press, 320–337.
Coppock, Alexander. 2022. attrition: What the Package Does (one line, title case). R package version 0.0.0.9000.
Coppock, Alexander, Gerber, Alan S., Green, Donald P. and Kern, Holger L. 2017. “Combining Double Sampling and Bounds to Address Nonignorable Missing Outcomes in Randomized Experiments.” Political Analysis 25(2): 188–206. doi:10.1017/pan.2016.6.
Dafoe, Allan, Zhang, Baobao and Caughey, Devin. 2018. “Information Equivalence in Survey Experiments.” Political Analysis 26(4): 399–416. doi:10.1017/pan.2018.9.
Deng, Yiting, Hillygus, D. Sunshine, Reiter, Jerome P., Si, Yajuan and Zheng, Siyu. 2013. “Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples.” Statistical Science 28(2): 238–56. doi:10.1214/13-STS414.
Druckman, James N., Green, Donald P., Kuklinski, James H. and Lupia, Arthur. 2011. “An Introduction to Core Concepts.” In The Cambridge Handbook of Experimental Political Science, eds. Druckman, James N., Green, Donald P., Kuklinski, James H. and Lupia, Arthur. New York, NY: Cambridge University Press, 15–26. doi:10.1017/CBO9780511921452.002.
Franco, Annie, Malhotra, Neil, Simonovits, Gabor and Zigerell, L. J. 2017. “Developing Standards for Post-Hoc Weighting in Population-Based Survey Experiments.” Journal of Experimental Political Science 4(2): 161–72. doi:10.1017/XPS.2017.2.
Gerber, Alan S. and Green, Donald P. 2000. “The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment.” American Political Science Review 94(3): 653–63. doi:10.2307/2585837.
Gerber, Alan S. and Green, Donald P. 2012. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton.
Gerber, Alan S., Arceneaux, Kevin, Boudreau, Cheryl, Dowling, Conor, Hillygus, Sunshine D., Palfrey, Thomas, Biggers, Daniel R. and Hendry, David J. 2014. “Reporting Guidelines for Experimental Research: A Report from the Experimental Research Section Standards Committee.” Journal of Experimental Political Science 1(1): 81–98. doi:10.1017/xps.2014.11.
Gomila, Robin and Clark, Chelsey S. 2020. “Missing Data in Experiments: Challenges and Solutions.” Psychological Methods. doi:10.31234/osf.io/mxenv.
Göritz, Anja S. 2014. “Determinants of the Starting Rate and the Completion Rate in Online Panel Studies.” In Online Panel Research: A Data Quality Perspective, eds. Callegaro, Mario, Baker, Reg, Bethlehem, Jelke, Göritz, Anja S., Krosnick, Jon A. and Lavrakas, Paul J. Wiley, 154–70. doi:10.1002/9781118763520.ch7.
Green, Donald P. and Zelizer, Adam. 2017. “How Much GOTV Mail is Too Much? Results from a Large-Scale Field Experiment.” Journal of Experimental Political Science 4(2): 107–18. doi:10.1017/XPS.2017.5.
Hansen, Morris H. and Hurwitz, William N. 1946. “The Problem of Non-Response in Sample Surveys.” Journal of the American Statistical Association 41(236): 517–29. doi:10.1080/01621459.1946.10501894.
Horton, John J., Rand, David G. and Zeckhauser, Richard J. 2011. “The Online Laboratory: Conducting Experiments in a Real Labor Market.” Experimental Economics 14(3): 399–425. doi:10.1007/s10683-011-9273-9.
Jerit, Jennifer, Barabas, Jason and Clifford, Scott. 2013. “Comparing Contemporaneous Laboratory and Field Experiments on Media Effects.” Public Opinion Quarterly 77(1): 256–82. doi:10.1093/poq/nft005.
Kaufman, Gordon M. and King, Benjamin. 1973. “A Bayesian Analysis of Nonresponse in Dichotomous Processes.” Journal of the American Statistical Association 68(343): 670–78. doi:10.1080/01621459.1973.10481403.
Kertzer, Joshua D. 2020. “Re-assessing Elite-Public Gaps in Political Behavior.” American Journal of Political Science, Forthcoming. doi:10.1111/ajps.12583.
Kertzer, Joshua D. and Renshon, Jonathan. 2022. “Experiments and Surveys on Political Elites.” Annual Review of Political Science 25: 529–50. doi:10.1146/annurev-polisci-051120-013649.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76(3): 1071–102. doi:10.1111/j.1467-937X.2009.00536.x.
Lynn, Peter. 2018. “Tackling Panel Attrition.” In The Palgrave Handbook of Survey Research. Springer, 143–53.
Manski, Charles F. 1995. Identification Problems in the Social Sciences. Cambridge, MA: Harvard University Press.
Manski, Charles F. 2009. Identification for Prediction and Decision. Cambridge, MA: Harvard University Press. doi:10.2307/j.ctv219kxm0.
McCambridge, Jim, Kalaitzaki, Eleftheria, White, Ian R., Khadjesari, Zarnie, Murray, Elizabeth, Linke, Stuart, Thompson, Simon G., Godfrey, Christine and Wallace, Paul. 2011. “Impact of Length or Relevance of Questionnaires on Attrition in Online Trials: Randomized Controlled Trial.” Journal of Medical Internet Research 13(4): e1733. doi:10.2196/jmir.1733.
McDermott, Rose. 2002. “Experimental Methods in Political Science.” Annual Review of Political Science 5(1): 31–61. doi:10.1146/annurev.polisci.5.091001.170657.
Mercer, Andrew, Lau, Arnold and Kennedy, Courtney. 2018. “How Different Weighting Methods Work.” Pew Research Center. https://www.pewresearch.org/methods/2018/01/26/how-different-weighting-methods-work/
Morrison, Theodore C., Wahlgren, Dennis R., Hovell, Melbourne F., Zakarian, Joy, Burkham-Kreitner, Susan, Hofstetter, C. Richard, Slymen, Donald J., Keating, Kristen, Russos, Stergios and Jones, Jennifer A. 1997. “Tracking and Follow-Up of 16,915 Adolescents: Minimizing Attrition Bias.” Controlled Clinical Trials 18(5): 383–96. doi:10.1016/S0197-2456(97)00025-1.
Mullinix, Kevin J., Leeper, Thomas J., Druckman, James N. and Freese, Jeremy. 2015. “The Generalizability of Survey Experiments.” Journal of Experimental Political Science 2(2): 109–38. doi:10.1017/XPS.2015.19.
Mummolo, Jonathan and Peterson, Erik. 2019. “Demand Effects in Survey Experiments: An Empirical Assessment.” American Political Science Review 113(2): 517–29. doi:10.1017/S0003055418000837.
Musch, Jochen and Reips, Ulf-Dietrich. 2000. “A Brief History of Web Experimenting.” In Psychological Experiments on the Internet, ed. Birnbaum, Michael H. Elsevier, 61–87.
Mutz, Diana C. and Pemantle, Robin. 2015. “Standards for Experimental Research: Encouraging a Better Understanding of Experimental Methods.” Journal of Experimental Political Science 2(2): 192–215. doi:10.1017/XPS.2015.4.
Reips, Ulf-Dietrich. 2000. “The Web Experiment Method: Advantages, Disadvantages, and Solutions.” In Psychological Experiments on the Internet. Elsevier, 89–117.
Shadish, William R., Hu, Xiangen, Glaser, Renita R., Kownacki, Richard and Wong, Seok. 1998. “A Method for Exploring the Effects of Attrition in Randomized Experiments with Dichotomous Outcomes.” Psychological Methods 3(1): 3. doi:10.1037/1082-989X.3.1.3.
Wooldridge, Jeffrey M. 2007. “Inverse Probability Weighted Estimation for General Missing Data Problems.” Journal of Econometrics 141(2): 1281–301. doi:10.1016/j.jeconom.2007.02.002.
Woon, Jonathan. 2014. “An Experimental Study of Electoral Incentives and Institutional Choice.” Journal of Experimental Political Science 1(2): 181–200. doi:10.1017/xps.2014.19.
Zhou, Haotian and Fishbach, Ayelet. 2016. “The Pitfall of Experimenting on the Web: How Unattended Selective Attrition Leads to Surprising (Yet False) Research Conclusions.” Journal of Personality and Social Psychology 111(4): 493–504. doi:10.1037/pspa0000056.
