1 Introduction
Regression discontinuity designs (RDDs) are widely used in political science and neighboring disciplines. The popularity of RDD is not surprising given that it is often heralded as one of the few observational study designs that are successful in approximating experimental benchmarks (Green et al. 2009). The sharp RDD features a continuous forcing variable and a treatment assigned to units whose value of the forcing variable exceeds a known cutoff. Under the assumption that the conditional expectations of the potential outcomes are continuous in the forcing variable at the cutoff, RDD will, if correctly implemented, identify the average treatment effect at the cutoff. The success of RDD in recovering causal effects hinges on the precise approximation of the regression function above and below the cutoff, as the statistical properties of estimation and inference are closely tied to the accuracy of this approximation (Cattaneo, Idrobo, and Titiunik 2020). Point estimation of the treatment effect focuses on minimizing the mean squared error (MSE), a standard metric that evaluates the estimator’s accuracy. In contrast, statistical inference aims to ensure that the empirical coverage of confidence intervals matches their nominal levels, which can be assessed using the coverage error rate (CER).
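Stated compactly, and using notation introduced here purely for illustration (forcing variable $X_i$, cutoff $c$, potential outcomes $Y_i(1)$ and $Y_i(0)$, observed outcome $Y_i$), the continuity assumption implies that the sharp RDD estimand can be written as

$$\tau = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = c] = \lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x].$$

Each limit has to be approximated using observations from one side of the cutoff only, which is why the quality of the approximation of the regression function matters for both estimation and inference.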
Researchers are required to make several critical decisions regarding the implementation of the RDD estimator. These decisions include selecting a bandwidth, determining a weighting scheme for observations near and far from the threshold, choosing the polynomial order for the locally weighted least squares regression, and deciding on a method for statistical inference (see, e.g., Cattaneo et al. 2020; Lee and Lemieux 2010). Approximating the conditional expectation function (CEF) of the outcomes in the vicinity of the discontinuity becomes more challenging with stronger curvature. Since different implementation methods vary in their capacity to capture this curvature with local polynomial functions, the choice of the implementation method becomes more relevant if the CEF is non-linear.
The dominant approach has been to select the bandwidth by minimizing the MSE, use a rectangular or a triangular kernel to weight the sample, fit a linear polynomial on both sides of the threshold, and conduct inference using OLS approximations (Cattaneo et al. 2020; Lee and Lemieux 2010). We refer to this implementation as the “conventional” local linear estimation. While this point estimator is consistent and MSE-optimal, this approach leads to biased confidence intervals due to the approximation (smoothing) error of the local polynomial estimator (Calonico, Cattaneo, and Titiunik 2014; Cattaneo et al. 2020). Such inference is neither valid nor optimal in minimizing the CER. Frequently, researchers seek to correct this bias by using smaller than optimal bandwidths, combined with OLS approximation for inference (Lee and Lemieux 2010). However, such ad hoc under-smoothing leads to a loss of statistical power (Calonico et al. 2014; Calonico, Cattaneo, and Farrell 2020).
The method proposed by Calonico et al. (2014) offers an alternative approach to inference by introducing bias-corrected and robust confidence intervals, circumventing the need for under-smoothing. This estimator first estimates the degree of bias using higher-order polynomials and then subtracts the estimated bias from the conventional point estimate. Robust inference is achieved by incorporating the contribution of the bias-correction step to the variability of the bias-corrected point estimator, thereby accounting for the estimation uncertainty of both the main RD estimate and the bias estimate. We refer to this implementation as the “robust” approach. Calonico et al. (2014) and Calonico et al. (2020) provide theoretical results and Monte Carlo simulations that suggest that the robust approach has a lower coverage error rate than the conventional approach and ad hoc under-smoothing. Furthermore, Calonico, Cattaneo, and Farrell (2018) and Calonico et al. (2020) propose CER-optimal bandwidths that further improve the performance of the robust bias-corrected confidence intervals.
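The local polynomial point estimate that underlies these approaches can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the replication code and not any software package's implementation. The function name `local_poly_rd`, its arguments, and the convention that units at or above the cutoff count as treated are assumptions made for this sketch. It takes the bandwidth `h` as given and covers only the point estimate, not bandwidth selection, bias correction, or inference.

```python
import numpy as np

def local_poly_rd(x, y, cutoff=0.0, h=1.0, p=1, kernel="triangular"):
    """Sharp RD point estimate: difference of the two one-sided
    kernel-weighted polynomial fits evaluated at the cutoff."""
    x = np.asarray(x, dtype=float) - cutoff   # center the forcing variable
    y = np.asarray(y, dtype=float)
    u = x / h
    if kernel == "triangular":
        w = np.clip(1.0 - np.abs(u), 0.0, None)   # weights decline linearly to zero at |x| = h
    else:
        w = (np.abs(u) <= 1.0).astype(float)      # rectangular: equal weights within h

    def intercept(side):
        keep = side & (w > 0)
        # Design matrix with columns 1, x, ..., x^p
        X = np.vander(x[keep], N=p + 1, increasing=True)
        sw = np.sqrt(w[keep])
        # Weighted least squares via rescaling rows by sqrt(weight)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
        return beta[0]                             # fitted value at the cutoff

    # Assumption of this sketch: x >= cutoff is the treated side
    return intercept(x >= 0) - intercept(x < 0)
```

With `p=1` this corresponds to the point estimate of the conventional local linear approach for a given bandwidth; setting `p=2` gives the local quadratic fit.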
We reviewed $68$ papers employing RDD that were published in the American Political Science Review, the American Journal of Political Science, and the Journal of Politics in 2016–2022 (see the Supplementary Material for an overview of the articles and implementations used). We found a rich tapestry of approaches to estimation and inference. The authors report local linear estimates in about nine out of ten articles, but MSE-optimal bandwidths are used in only about two-thirds of the published work. Conventional inference is most prevalent (in three out of four articles), and only $31\%$ of the publications report the robust inference of Calonico et al. (2014). Perhaps surprisingly, the use of robust inference has not increased over time (see Supplementary Figure OA1). The methodological variety of the surveyed literature further underscores the need for understanding the reliability and (relative) performance of different RDD approaches. While several studies examine the properties of different estimators and RDD implementations theoretically and through Monte Carlo simulations (Calonico et al. 2014, 2018, 2020), there has been little effort in evaluating their performance against experimental benchmarks. Our paper takes steps towards filling this evidence gap by documenting which RDD implementation can best replicate the gold standard of randomized experiments.
Our validation analysis focuses on the electoral context, where numerous RDD applications have used close elections to estimate the effects of holding office on various outcomes, including incumbency advantage (for a review, see De la Cuesta and Imai 2016). Building on Hyytinen et al. (2018), we leverage electoral ties that are resolved by a lottery to estimate the effect of being the incumbent (versus being the runner-up) on re-running and getting elected in the next election in Colombia and Finland. Because candidates in tied elections have precisely the same number of votes, the average treatment effect estimated from the lottery sample is a local estimate at the cutoff that determines whether or not a candidate gets elected. This implies that in addition to focusing on the same institutional context and population, the lottery and RDD also target the same estimand. This makes lotteries an ideal benchmark to evaluate the performance of the RDD estimator. Both countries (mostly) use open-list proportional representation systems and provide us with a large number of observations to work with, even when we focus on close elections, making statistical power less of an issue (cf. Stommes et al. 2023). We extend Hyytinen et al. (2018) by analyzing more countries and outcomes, which allows us to illuminate the role of the curvature of the outcome’s CEF for the performance of various implementation approaches, and to compare the results across different institutional contexts. Moreover, we include in our analysis the recent approach proposed by Calonico et al. (2020), who use for inference a bandwidth optimized for the CER (which is achieved by re-scaling the MSE-optimal bandwidth by a shrinkage factor that depends on the sample size).
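For intuition on this re-scaling, the sketch below multiplies an MSE-optimal bandwidth by a factor that shrinks with the sample size $n$ and depends on the polynomial order $p$. The specific factor, $n^{-p/((3+p)(3+2p))}$, is our reading of the rule-of-thumb adjustment in Calonico et al. (2020) and should be checked against that paper. The function name `cer_bandwidth` and its arguments are hypothetical.

```python
def cer_bandwidth(h_mse, n, p=1):
    """Rescale an MSE-optimal bandwidth towards the CER-optimal rate.

    Assumed shrinkage factor: n ** (-p / ((3 + p) * (3 + 2 * p))).
    h_mse: MSE-optimal bandwidth, n: sample size, p: polynomial order.
    """
    shrink = n ** (-p / ((3 + p) * (3 + 2 * p)))
    return h_mse * shrink
```

For a local linear fit ($p=1$) the exponent is $-1/20$, so the factor is below one for any $n>1$ and the bandwidth shrinks only slowly as the sample grows.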
2 Data
Our main analysis examines local government elections in Colombia (2003–2015) and Finland (1996–2012). Finland features a pure open-list electoral system where each voter gives exactly one vote to one candidate. Parties are assigned seats based on the sum of their candidates’ personal votes, and the seats within the party are assigned purely on the basis of personal votes. Moreover, candidates are almost always presented in alphabetical order in the ballot lists. Council size depends on the municipal population and varies between 13 and 85. Councils are the main political decision-maker and are responsible for key public services such as education and healthcare. In Colombia, parties can choose between open or closed lists. However, in the 2015 local elections, about 92% of parties opted for open lists (Hangartner, Ruiz, and Tukiainen 2019), which are the focus of our analysis. Voters can still decide to vote just for the party, but personal votes determine the within-party allocation of seats. Council size varies between 7 and 45 and is determined by the number of registered voters. The main role of the council is to approve the budget and projects proposed by the mayor. In both countries, a sizable number of parties compete in local elections.
For the RDD analysis, we leverage party lists that nominate at least two candidates and elect at least one and fewer than all listed candidates. The resulting data consist of $147,558$ candidate-election year observations for Colombia and $154,543$ for Finland. The data reveal a substantial number of tied elections: $463$ and $1,351$ for Colombia and Finland, respectively. These samples are sufficiently large to provide reliable experimental benchmarks for comparing the RDD estimates. In the Supplementary Material, we show that there is no evidence of manipulation of the lottery outcomes.
3 Comparing Lottery and RDD Estimates
We first focus on the sample of tied candidates and regress, using OLS, an indicator variable for running or getting elected in the next election ($t+1$), the two outcomes, on a binary indicator for getting elected in the current election ($t$), the treatment. We do not condition the analysis of getting elected in $t+1$ on running in $t+1$ because this decision might be endogenous to getting elected in $t$. We cluster our inference (and later optimal bandwidth selection) at the local government level.
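Because the treatment is a single binary indicator, the OLS coefficient in this regression equals the difference in mean outcomes between lottery winners and lottery losers. A minimal sketch of the estimator follows, assuming NumPy arrays and hypothetical names (`lottery_ols`, `elected`, `outcome`, `cluster`). The variance is a basic cluster-robust sandwich without the finite-sample corrections that statistical packages typically apply, so standard errors from this sketch would differ slightly from reported ones.

```python
import numpy as np

def lottery_ols(elected, outcome, cluster):
    """OLS of outcome on a constant and a treatment dummy,
    with a cluster-robust (sandwich) standard error for the dummy."""
    d = np.asarray(elected, dtype=float)
    y = np.asarray(outcome, dtype=float)
    g = np.asarray(cluster)
    X = np.column_stack([np.ones_like(d), d])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((2, 2))
    for c in np.unique(g):
        in_c = g == c
        score = X[in_c].T @ resid[in_c]   # scores summed within the cluster
        meat += np.outer(score, score)
    cov = bread @ meat @ bread            # no small-sample adjustment (assumption)
    return beta[1], np.sqrt(cov[1, 1])
```

Here `cluster` would hold the local government identifier, in line with the clustering described above.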
Panel A of Table 1 shows the experimental estimates for running in the next election. Column (1) reports the effect in Colombia, and column (6) reports the effect in Finland. We find that in Colombia, getting elected boosts the probability of re-running by about $14$ percentage points ($p=0.002$). In the Finnish case, the point estimate is close to zero, about $0.011$, and not statistically significant ($p=0.671$). Columns (1) and (6) in Panel B of Table 1 show the estimation results for getting elected in the next election. The estimates for both Colombia and Finland are close to zero in magnitude, $-0.030$ and $0.004$, and not statistically significant ($p=0.371$ and $p=0.860$, respectively). Thus, there is little evidence that being the winner in election $t$ (versus being the runner-up) increases the probability of getting elected in $t+1$. Moreover, with $95\%$ confidence intervals of $[-0.097,0.037]$ for Colombia and $[-0.044,0.053]$ for Finland, we can rule out all but relatively small incumbency effects.
Note (Table 1): The dependent variable equals one if a candidate re-runs or gets elected in the next election and zero otherwise, in Panels A and B, respectively. Estimates in columns (1) and (6) are based on the election lottery samples. Columns (2)–(5) and (7)–(10) present results from different RDD specifications. “Conventional” refers to local linear estimation and OLS for inference. “Robust” refers to robust and bias-corrected inference and uses the main bandwidth for the bias-correction. All RDD estimations use a rectangular kernel. The $95\%$ confidence intervals are based on standard errors clustered by municipality and reported in brackets. We also account for clustering when computing the optimal bandwidths. The number of observations refers to the effective sample size used for the estimation. The total number of observations is $147,558$ for Colombia and $154,543$ for Finland.
We next turn to the RDD analysis. We construct the running variable from the winning margin for candidates on the same party list. For elected candidates, this equals their within-party vote share minus the within-party vote share of the first non-elected candidate. For the non-elected, this equals their within-party vote share minus the within-party vote share of the last elected candidate. This allows a comparison of candidates who barely won a seat to those who ran on the same list but barely lost. Columns (2)–(5) in Table 1 report the RDD estimates for Colombia, and columns (7)–(10) show the corresponding estimates for Finland. We provide eight specifications: conventional and robust approaches to inference, alternating between local linear and local quadratic polynomials, and using either MSE- or CER-optimal bandwidths. We use the same main and bias bandwidth for the robust bias-corrected estimation (Calonico et al. 2014, 2020). This means we effectively fit a polynomial of order $p+1$ within the bandwidth optimized for polynomial order $p$. The first implication arising from Panel A of Table 1 is that the lottery estimates for re-running are broadly in line with the RDD estimates for both countries. Although the lottery estimate is slightly smaller than the RDD estimates in the case of Colombia, these differences are not statistically significant. In the Finnish data, all differences between lottery and RDD estimates for re-running are minuscule.
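The construction of the running variable described above can be sketched for a single party list as follows. This is an illustration only: the function name `winning_margin` and its inputs are hypothetical, and the replication code may handle details such as data layout differently.

```python
import numpy as np

def winning_margin(vote_share, elected):
    """Within-list winning margin for every candidate on one party list.

    vote_share: within-party vote shares of the candidates on the list
    elected:    booleans marking which candidates won a seat
    """
    s = np.asarray(vote_share, dtype=float)
    e = np.asarray(elected, dtype=bool)
    if e.all() or not e.any():
        # Mirrors the sample restriction: at least one, but not all, candidates elected
        raise ValueError("list must elect at least one and fewer than all candidates")
    first_non_elected = s[~e].max()   # strongest candidate who missed out
    last_elected = s[e].min()         # weakest candidate who won a seat
    # Elected: distance to the first non-elected; non-elected: distance to the last elected
    return np.where(e, s - first_non_elected, s - last_elected)
```

For a list with shares `[0.4, 0.3, 0.2, 0.1]` of which the first two candidates are elected, the margins are approximately `[0.2, 0.1, -0.1, -0.2]`, so elected candidates have positive values and non-elected candidates negative ones.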
When focusing on the incumbency advantage displayed in Panel B, we find larger deviations between the lottery and conventional RDD estimates. In both countries, we would draw qualitatively different conclusions regarding the incumbency effect estimated using the lotteries vis-à-vis RDD. While the lottery estimates provide little support for an incumbency advantage, the RDD estimates would imply a small positive and significant effect of getting elected in $t$ on getting elected in $t+1$. However, when employing the robust RDD approach of Calonico et al. (2014), these discrepancies become more muted, especially when considering the uncertainty of the lottery and RDD estimates. Moreover, the CER-optimal bandwidths yield confidence intervals that are closer to the experimental benchmark compared to the MSE-optimal bandwidths, even with the robust bias-corrected estimator. Lastly, we see that the CER-optimal bandwidth used in combination with the conventional estimator does not solve the coverage issue as the bandwidth choice is optimized for the bias-corrected estimator.
The left graphs in Figure 1 visualize the RDDs. The plots show binned averages and local linear and quadratic fits within different (optimal) bandwidths. The right graphs plot the corresponding lottery estimates and RDD estimates using a range of bandwidths and local linear and local quadratic polynomials. Panel A shows the RDD for running in $t+1$ and Panel B for getting elected in $t+1$.
Panel A shows a positive RDD estimate of getting elected in $t$ on the likelihood of running in $t+1$ in Colombia, with the size of the jump similar to the lottery estimate. For Finland, the RDD estimate is close to zero, aligning with the experimental benchmark. In Panel B, the RDD estimates for the propensity to get elected in $t+1$ are positive for both Colombia and Finland, contrasting with the null findings from the lotteries. However, in Finland, the graph suggests that the fitted polynomial models may inadequately capture the curvature of the CEF near the cutoff. The right panel of Figure 1 demonstrates that discrepancies between the lottery estimates and the RDD estimates can be mitigated by adjusting the approach to statistical inference; higher-order polynomials perform better in capturing curvature within narrower bandwidths, as seen in Panel B. Together, these results suggest an important finding: if the CEF is (approximately) linear close to the cutoff as in Panel A, both “conventional” and “robust” approaches can recover the experimental benchmark. If, however, the CEF is non-linear close to the cutoff as in Panel B, then the “robust” approach with CER-optimal bandwidths outperforms other implementations. In the next section, we discuss how this pattern extends to other data.
4 Discussion
Despite the popularity of RDD for drawing causal inferences from observational data, there is a paucity of research that evaluates if and when different RDD estimators are able to recover experimental estimates. Leveraging tied elections resolved by a lottery in Colombia and Finland as experimental benchmarks, we find that the type of RDD implementation makes little difference when the CEF around the cutoff is approximately linear. However, with curvature, the robust approach to inference proposed by Calonico et al. (2014) performs better than conventional local linear estimation. The linearity or non-linearity of the CEF close to the cutoff informs the choice of the bandwidth, and in our applications, the CER methods suggest smaller bandwidths. This feature may explain why CER methods outperform MSE-optimal methods in this setting.
To understand whether the upward bias in the incumbency advantage documented for the conventional RDD estimate could be a symptom of a more widespread pattern, we extend our analysis to two neighboring countries with similar open-list PR systems: Brazil and Denmark. The Supplementary Material discusses the data and results in detail. In Brazil, we find relatively small differences between the different implementations when looking at running at $t+1$ as the dependent variable. In contrast, we again find larger estimates for the “conventional” compared to the “robust” approach in Denmark. Furthermore, the personal incumbency advantage estimates are smaller for both countries when we use the robust approach rather than conventional local linear estimation. A graphical analysis suggests that these differences are, again, partly due to the presence of curvature near the cutoff.
Our findings have both substantive and methodological implications. Substantively, our results suggest that the personal incumbency effect varies considerably across countries with similar electoral systems. Alternative explanations such as the weakness of the party system (Klašnja and Titiunik 2017) or the level of development and corruption (Klašnja 2015) are insufficient to explain the differences in incumbency advantage we observe across the four studied countries. Future research should explore other factors, such as differences in career objectives among politicians (De Magalhães and Hirvonen 2023), that may help explain the variation in incumbency advantage across these countries.
Methodologically, our study highlights the sensitivity of RDD estimates to specific implementation choices. The robust bias-corrected approach of Calonico et al. (2014) coupled with CER-optimal bandwidths proposed in Calonico et al. (2020) appears to (weakly) dominate other approaches, which is something we recommend practitioners keep in mind when using RDD. These implementation choices become more important when the outcome’s CEF close to the cutoff is non-linear. We conclude by encouraging further research to better understand how the curvature of the outcome’s CEF near the cutoff interacts with various RDD implementation choices, as this interaction plays a crucial role in the reliability of the RDD estimates.
Acknowledgements
We thank the editor, five anonymous referees, and conference participants at EPSA 2018 and the Econometric Society World Congress 2020 for their useful comments. We also thank Valtteri Kimmo for competent research assistance.
Funding
This research is funded by the European Union (ERC, INTRAPOL, 101045239). Views and opinions expressed are, however, only those of the authors and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
Data Availability Statement
Replication code and data for this article are available online at the Political Analysis Harvard Dataverse in De Magalhães et al. (2024), https://doi.org/10.7910/DVN/XDVIBG.
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2024.28.