Election forecasting in modern democracies, including the United States, faces significant challenges. The increasing trend of survey refusals (Plewes and Tourangeau 2013), particularly among segments of the population that are more likely to support specific parties or candidates (Kennedy et al. 2018), exacerbates these difficulties. Additionally, factors such as the spiral of silence and cross-cutting pressures can prevent some respondents from disclosing their true political preferences. This often results in biased self-reported data (Blair, Coppock, and Moor 2020) and skewed election forecasts.
However, these issues are not the only hurdles faced by electoral researchers and analysts. There also are significant concerns about the effectiveness of the methods used to capture trends and infer the likely aggregate outcomes of electoral processes. Most studies rely primarily either on inductive aggregation procedures (weighted or unweighted) of individual self-reported voting intentions or on theory-based models that examine structural relationships between macro-level variables, such as economic conditions and electoral outcomes (for an overview, see Lewis-Beck and Dassonneville 2015). Although both approaches have contributed significantly to the field of election forecasting, they often overlook the substantive mechanisms underlying the outcomes that they aim to predict, namely voter-level decision-making dynamics.
Building on this premise, this article contends that citizens’ reasoning is a crucial lens for understanding decision-making processes and predicting their aggregate effects. It demonstrates that voter-based regression models can enhance election predictions while complementing existing aggregation and structural strategies. Notably, this analysis draws on high-quality data from the American National Election Studies (ANES), a source that is used rarely in forecasting exercises. Using pre-electoral datasets from the past three presidential elections (i.e., 2012, 2016, and 2020), I show that the resulting predictions of support for Republican candidates, who often are underrepresented in US election polls, are as accurate as or more accurate than those generated by other methods. Working with high-quality individual-level data therefore represents a valuable step forward in addressing prediction challenges.
BACKGROUND: LEVERAGING ANES DATA TO PREDICT POPULAR-VOTE SHARES IN US PRESIDENTIAL ELECTIONS
Election forecasting in established democracies, including the United States, is a practice with a long history. Lewis-Beck and Dassonneville (2015) provided a comprehensive overview of the evolution of various prediction approaches. They distinguished between (1) a theory-based approach known as structural modeling, which predicts election outcomes using multivariate equations that account for a range of macro-level factors (Lewis-Beck and Stegmaier 2013); and (2) a more inductive approach known as aggregation, in which analysts estimate vote shares by applying combination rules to multiple election polls (Traugott 2014). As a hybrid of these two methods, “synthesizers” proposed model-based predictions that also incorporate polls as additional data (Mongrain, Nadeau, and Jérôme 2021).
All of these methods share a common characteristic: their units of analysis and, therefore, the level at which they perform inferences tend to be the state or the nation. In the case of structuralist approaches, this strategy is used deliberately to avoid relying on surveys, which can introduce estimation errors into the predictions. However, macro-level approaches are consistently affected by the issue of ecological fallacy because they attempt to predict outcomes of a process that inherently involves a strong micro-level component (i.e., voters’ decision making) using only macro-level factors (e.g., the state of the economy) (Kramer 1983). This limitation highlights the analytical advantages of survey approaches, which have made significant progress in recent years by addressing issues such as nonresponse bias and systematic measurement error through various means, including controlling for reported past voting behavior and sociodemographics. Nevertheless, mounting polarization combined with spiral-of-silence mechanisms, in which citizens may be reluctant to express their pre-electoral preferences due to fear of social isolation for holding minority opinions in their environment, makes these problems particularly difficult to address. Based on an analysis of survey data and county-level election results from the prior presidential election, Camatarri, Luartz, and Gallina (2023) provide clear evidence of the spiral-of-silence effect among Republican voters in 2020. This aligns with findings from extant research (Dinas, Martínez, and Valentim 2024; Urquizu-Sancho 2006), which indicates that right-wing respondents are less likely to disclose their political preferences in environments that are perceived as hostile to their views. In contrast, Democrats do not exhibit the same pattern.
This difference likely is attributable to Democrats’ greater psychological openness, a trait that enables them to express their views more freely and engage in political discussions even in contexts in which their opinions may be in the minority (Gerber et al. 2010; Mutz 2002). However, it is important to acknowledge that not all Republican supporters share the same approach to expressing their preferences. Whereas some may fear social isolation and underreport their support, more nonconformist Republicans may feel less constrained by holding minority opinions within their social circles (Kushin, Yamamoto, and Dalisay 2019). This diversity among Republican supporters likely contributes to the fact that despite noticeable polling misses, the discrepancies between election forecasts and actual outcomes generally are not massive.
Moreover, a potential cluster of Republican supporters (such as those with lower educational levels, anti-elite views, relatively weak partisan affiliations, and political disaffection) generally is less likely to participate in surveys. This broader survey nonresponse bias, in addition to the sensitivity of the voting intention question especially for Trump supporters, can increase prediction errors in estimating Trump’s aggregate support (Kennedy et al. 2018).
To address these challenges and improve the quality of survey estimates, a comprehensive approach is essential. First, using individual-level rather than aggregate-level data enables more accurate inferences about the micro-level factors that influence election outcomes. Second, applying post-estimation weighting helps to correct for selection bias and ensures more representative estimates across different population segments.
DATA AND ESTIMATION PROCEDURE
Against this background, I propose a straightforward approach to predicting Republican candidate popular support. This method addresses the need to leverage individual-level data to avoid ecological fallacies, and it adjusts for selection bias through weighting. The process begins with estimating an individual-level vote function grounded in electoral behavior theory. The predicted values from this model then are used to infer individual support for the Republican candidate (i.e., predicted votes), which subsequently are aggregated to estimate the candidate’s overall vote share (footnote 1).
ANES data are particularly well suited for this approach because longitudinal analyses (1952–2020) have demonstrated consistently that their voting intention measures reliably reflect aggregate popular vote outcomes (Ko et al. 2024). For this analysis, I used the pre-electoral waves of the 2012, 2016, and 2020 ANES Time Series studies, which included 5,914, 4,270, and 8,280 overall respondents, respectively. The interviews were assigned randomly to different modes, including in-person and online in 2012 and 2016 and telephone and video calls in 2020. In addition to capturing voting intentions, the surveys collect extensive information on respondents’ sociodemographic and attitudinal backgrounds. These comprehensive data allow for the construction of a robust vote function that accounts for the main factors influencing individual voting behavior.
Starting from a retrospective framework, the model incorporates variables including (dis)approval of presidential economic performance during the past year and respondents’ feelings about their financial situation compared to the previous year. It also includes general (dis)trust toward the federal government. From a positional perspective, the model considers respondents’ self-placement on a seven-point ideological scale ranging from liberal to conservative. Finally, the estimations also account for several sociodemographic controls, including gender, age, highest education level, and ethnic background. This latter control is simplified dichotomously for comparison as follows: white versus all other categories, including Black, Hispanic, Asian/Hawaiian, Native American/Alaska Native, and others such as multiple non-Hispanic backgrounds (footnote 2). The model also incorporates state-level fixed effects and normalizes all variables between 0 and 1 to ensure comparability of the coefficient sizes.
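The rescaling of predictors to the 0 to 1 range can be illustrated with a minimal sketch. It assumes simple min-max scaling; the text does not state which normalization formula was used, and the function name and the ideology scores below are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] interval (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical self-placements on a seven-point scale
# (1 = most liberal, 7 = most conservative).
ideology = [1, 4, 7, 2, 6]
ideology_scaled = min_max_scale(ideology)
```

After rescaling, a coefficient describes the effect of moving from the lowest to the highest observed value of a predictor, which is what makes coefficient sizes comparable across variables measured on different original scales.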
The models used were logistic regressions predicting voting intentions for the Republican candidate versus all other presidential candidates in 2012, 2016, and 2020. A key advantage of logistic regression in this context is its simplicity and interpretability, as well as its suitability for predicting binary outcomes such as the one of interest in this analysis. Odds ratios can be interpreted directly as multiplicative changes in the odds of the outcome associated with a one-unit change in a predictor variable, making the model’s output relatively intuitive.
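A short numerical sketch may help in reading odds ratios. It shows how one odds ratio translates into different probability shifts depending on the baseline probability. The coefficient value and the function name are hypothetical and are not taken from the estimated models.

```python
import math


def prob_after_odds_ratio(p_base, odds_ratio):
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    odds = p_base / (1.0 - p_base)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical logit coefficient for a predictor scaled to [0, 1].
beta = 0.8
odds_ratio = math.exp(beta)

# The same odds ratio produces different changes in probability
# at different starting points.
shift_from_20 = prob_after_odds_ratio(0.20, odds_ratio) - 0.20
shift_from_50 = prob_after_odds_ratio(0.50, odds_ratio) - 0.50
```

In this sketch, the probability shift is larger when starting from 0.50 than from 0.20, even though the odds are multiplied by the same factor in both cases.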
To further validate the results and address the issue of statistical uncertainty, this study used Bayesian logistic regression models with Markov Chain Monte Carlo (MCMC) methods (footnote 3). By complementing standard logistic regression models with Bayesian estimations, the study enhanced the robustness of the findings regarding factors that influenced the binary outcome of interest and the resulting predictions (footnote 4).
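As an illustration of the general idea behind MCMC estimation of a logistic regression, the sketch below runs a random-walk Metropolis sampler on a toy dataset with one predictor. It is not the estimation routine used in this study: the sampler, the Normal(0, 10) priors, the step size, the burn-in length, and the data are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(b0, b1, xs, ys, prior_sd=10.0):
    """Unnormalized log posterior: Bernoulli-logit likelihood plus Normal priors."""
    log_lik = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        # log p(y | z) = y*z - log(1 + exp(z)), computed in a stable way.
        log_lik += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    log_prior = -(b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)
    return log_lik + log_prior


def metropolis(xs, ys, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the intercept b0 and the slope b1."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    current = log_posterior(b0, b1, xs, ys)
    draws = []
    for _ in range(n_iter):
        c0 = b0 + rng.gauss(0.0, step)
        c1 = b1 + rng.gauss(0.0, step)
        candidate = log_posterior(c0, c1, xs, ys)
        diff = candidate - current
        # Accept with probability min(1, exp(diff)).
        if diff >= 0 or rng.random() < math.exp(diff):
            b0, b1, current = c0, c1, candidate
        draws.append((b0, b1))
    return draws


# Hypothetical data: a predictor scaled to [0, 1] and a binary vote intention.
xs = [i / 19 for i in range(20)]
ys = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]

draws = metropolis(xs, ys)
kept = draws[1000:]  # discard the first 1,000 draws as burn-in
slope_mean = sum(b1 for _, b1 in kept) / len(kept)
slopes = sorted(b1 for _, b1 in kept)
interval = (slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))])
```

The retained draws approximate the posterior distribution of the coefficients. Their mean plays the role of a point estimate, and the 2.5th and 97.5th percentiles give a 95% credible interval that expresses statistical uncertainty.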
The predicted probabilities from both methods were used to categorize respondents as either Republican or non-Republican supporters, using alternately the standard 0.5 probability threshold and the weighted mean predicted probability as cutoff points. The predicted votes for the Republican candidate versus other candidates then were aggregated (and weighted) to derive a federal-level estimate of vote shares in each scenario. The aim was for the estimate to reflect the popular vote in the reference population as closely as possible. For both steps, the full sample preelection weight of each survey was applied.
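The classification and aggregation steps described above can be sketched as follows. The predicted probabilities and survey weights are hypothetical, the function names are illustrative, and treating a probability strictly above the cutoff as a predicted Republican vote is an assumption because the text does not say how ties are handled.

```python
def weighted_mean(values, weights):
    """Weighted average of values using survey weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_vote_share(probs, weights, cutoff):
    """Weighted share of respondents classified as Republican voters."""
    predicted = [1 if p > cutoff else 0 for p in probs]
    return weighted_mean(predicted, weights)


# Hypothetical predicted probabilities and full-sample weights.
probs = [0.10, 0.35, 0.45, 0.48, 0.62, 0.90]
weights = [1.2, 0.8, 1.0, 1.5, 0.9, 0.6]

# Scenario 1: standard 0.5 threshold.
share_fixed = weighted_vote_share(probs, weights, 0.5)

# Scenario 2: weighted mean predicted probability as the cutoff.
mean_cutoff = weighted_mean(probs, weights)
share_mean = weighted_vote_share(probs, weights, mean_cutoff)
```

In this toy example the weighted mean probability is below 0.5, so the second scenario classifies more respondents as Republican voters than the fixed threshold does.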
RESULTS
As previously discussed, the initial step of the analysis involved establishing a baseline individual-level model to generate aggregate predictions of electoral support. The results of the standard logistic estimations are presented in online appendix table A1. As shown in the table, most predictors had a significant effect on voting intentions for the Republican candidate, thereby aligning with expectations based on existing theories. For example, holding a more conservative position on the ideological scale consistently correlated with a significant increase in the probability of intending to vote for the Republican candidate. Similarly, economic disapproval was identified as one of the strongest factors in predicting Republican support, particularly when the presidential candidate was not an incumbent. In the case of Trump’s incumbency, this coefficient reflected a significantly negative effect. A similar although less pronounced trend was observed regarding perceptions of a worsening personal economic situation. Regarding sociodemographic factors, it is noteworthy that higher education levels were negatively associated with support for the Republican candidate only during Trump’s previous presidential runs (i.e., 2016 and 2020). In contrast, a white-ethnic background consistently emerged as a strong predictor of Republican support across all data points (footnote 5). The predictors exhibited highly comparable effects in the Bayesian analysis as well, as indicated by their posterior means for the outcome variable (see online appendix table A2).
The next step was converting the predicted probabilities from each estimation into aggregate estimates of support for the presidential candidate, focusing specifically on popular vote shares (footnote 6). To achieve this, both the default 0.5 threshold and the average predicted probability, weighted according to the full sample pre-electoral weight, were used as cutoffs for predicting whether respondents would vote for the Republican candidate (footnote 7). The results, alongside estimates derived from simple aggregation of voting intentions (both weighted and unweighted) on the full sample, are presented in Table 1. The table also includes the actual percentage of popular support for each Republican candidate based on election results as certified by the Federal Election Commission (2012; 2016; 2020), thereby facilitating a comparison and an assessment of the different strategies.
Table 1 Republican Presidential Candidate Support: Estimates vs. Actual Results (%)

Note: Parentheses indicate 95% confidence intervals.
First, it should be noted that the estimate derived from straightforward aggregation of actual voting intentions in the data significantly underestimated support for the Republican presidential candidate throughout the entire period. This underscored the pressing challenge of mitigating selection bias in electoral surveys, even with high-quality data such as from the ANES. Weighting the estimates using the full sample weight slightly improved accuracy; however, the confidence intervals remained significantly below the actual electoral support received by each Republican candidate in the corresponding elections.
Second, for the logistic-based approaches (both standard and Bayesian), a clear improvement in the estimates of each candidate was evident, particularly when the average predicted probability was used as the cutoff. This method successfully identified confidence intervals compatible with actual results for the Republican candidates. Notably, in 10 of 12 overall models, the forecasting error (i.e., the absolute difference between the predicted estimate and the actual result) was well below 3 percentage points. This performance was significantly better than the mean absolute error of major commercial polls and the entire 1952–2020 ANES Time Series, which averages approximately 3 percentage points for predicting the incumbent candidate’s vote share (Ko et al. 2024). Moreover, this result aligns with alternative methods tested for the same election (Erikson and Wlezien 2021), thereby further supporting the efficacy of this approach.
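The forecasting error defined above, and the mean absolute error used as a benchmark, can be computed as in the following sketch. The predicted and actual percentages are hypothetical placeholders and do not reproduce the values in Table 1.

```python
def absolute_error(predicted, actual):
    """Forecasting error: absolute difference in percentage points."""
    return abs(predicted - actual)


def mean_absolute_error(pairs):
    """Average forecasting error over (predicted, actual) pairs."""
    return sum(absolute_error(p, a) for p, a in pairs) / len(pairs)


# Hypothetical (predicted %, actual %) pairs, one per model.
pairs = [(45.0, 47.0), (46.5, 46.0), (48.0, 47.0)]

errors = [absolute_error(p, a) for p, a in pairs]
mae = mean_absolute_error(pairs)

# Number of models whose error is below the 3-point benchmark.
within_3 = sum(1 for e in errors if e < 3.0)
```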
CONCLUSIONS
This study addresses the unique challenges of forecasting electoral support in the current US political climate, marked by increasing nonresponse and selection bias, which compromise the quality of estimates (Enns, Lagodny, and Schuldt 2017). To address these challenges, I demonstrated that a model-based estimation approach relying on individual-level data and theoretically relevant variables can yield results closely aligned with actual election outcomes. The rationale for using a model-based approach lies in its analytical appropriateness because it targets the decisive level at which electoral decisions are made (i.e., individual voters). This provides an important layer of validation to studies that rely solely on aggregation or macro-level modeling. Overall, standard logistic approaches provided highly accurate predictions across all three elections, with minimal forecasting errors (e.g., as low as 0.18 percentage points in 2016). In contrast, unweighted and weighted simple aggregation methods consistently produced higher forecasting errors, underestimating support for Republican candidates. The Bayesian logistic models followed patterns similar to the standard logistic models, although they occasionally exhibited slightly higher forecasting errors.
Overall, these results highlight that regression-based voter-level forecasting models have the potential to provide strong predictive performance in estimating popular vote shares. In doing so, they position the ANES 2020 Time Series alongside other forecasting approaches that maintain an absolute error of less than 3 percentage points (Graefe 2021).
Despite these encouraging results, several important limitations warrant consideration. First, the analysis focuses primarily on predicting popular support. Although this emphasis is vital for understanding general opinion trends and their macro-consequences, it overlooks critical components of the electoral system, particularly the role of the Electoral College in US presidential elections. This omission is significant and suggests that future efforts should expand the methodology to encompass both federal and state levels, enabling a more nuanced prediction not only of election outcomes but also of actual winners. Second, future research should explore alternative model specifications and evaluate the potential benefits of incorporating election-specific predictors, including for the forthcoming 2024 elections, beyond the standard model used in this study. To maximize comparability, these estimates did not account for factors such as the impact of the COVID-19 pandemic and immigration attitudes. The former proved central to voters’ decisions in 2020 (Luartz, Camatarri, and Gallina 2024), whereas border security and immigration were pivotal themes in Trump’s successful 2016 campaign. Reports indicated that these issues persisted, at least in terms of candidate rhetoric, as the 2024 election approached (Tourangbam 2024). Additionally, the significance of women’s rights and civil rights has become increasingly pronounced, particularly following the overturning of Roe v. Wade in 2022. In contrast, the 2012 election was characterized by a greater emphasis on economic and healthcare issues, reflecting the electorate’s concerns during recovery from the Great Recession (Kiousis et al. 2015).
Looking ahead to the 2024 election, a combination of domestic and foreign concerns is likely to play a crucial role in shaping voter decisions. In addition to the enduring importance of economic concerns, issues including women’s rights, immigration security, the war in Ukraine, and climate change emerged as significant focal points in the 2024 presidential debates, indicating potential shifts in voter priorities. A systematic approach to incorporating the salient issues for each election cycle will be essential in informing future models, striking a balance between maintaining comparability and adequately capturing the complexities of voters’ decision-making processes.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit http://doi.org/10.1017/S1049096524000933.
ACKNOWLEDGMENTS
I sincerely thank the editors of the Special Issue on Forecasting the 2024 Presidential Elections, Philippe Mongrain and Mary Stegmaier, for the opportunity to contribute this article, as well as for their valuable and constructive feedback on the draft version. I also extend my gratitude to the two anonymous reviewers for their thoughtful comments and engagement, which helped me to strengthen this article both conceptually and analytically.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study are openly available at the PS: Political Science & Politics Harvard Dataverse at https://doi.org/10.7910/DVN/RTTI71.
CONFLICTS OF INTEREST
The author declares that there are no ethical issues or conflicts of interest in this research.