1 Introduction
Looking at election-eve forecasts for the three high-profile elections in the UK (Brexit) and US (Trump) in 2016, and France (Le Pen) in 2017, FiveThirtyEight’s Nate Silver observed that, in each case, polling errors occurred in the opposite direction of what experts or betting markets had expected. This led him to conclude: “When the conventional wisdom tries to outguess the polls, it almost always guesses in the wrong direction” (Silver, 2017). It is difficult to find evidence for or against this claim. Although the use of expert judgment in forecasting elections goes back long before the emergence of scientific polling (Kernell, 2000), we know surprisingly little about the relative accuracy of experts and polls.
Research on expert forecasting in different fields shows that the value of expertise is indeed limited when forecasting complex problems. In such situations, expert forecasts are little more, and sometimes even less, accurate than those from novices and naïve statistical models, such as a random walk (Armstrong, 1980; Tetlock, 2005). One can, however, expect experts to make useful predictions for problems for which they get good feedback about the accuracy of their forecasts, and if they know the situation well (Green, Graefe & Armstrong, 2011).
Election forecasting appears to meet these conditions. First, elections have clear outcomes upon which forecast accuracy can be judged. Such feedback can help forecasters to learn about judgment errors and biases. Second, political experts can draw on a vast amount of theory and empirical evidence about electoral behavior, particularly for U.S. presidential elections, which should help them read and interpret polls. For example, research has shown that polls tend to tighten (Erikson & Wlezien, 2012), and the shares of both third-party support and undecideds decrease, as the election nears (Riker, 1982). We also know that certain campaign events such as party conventions (Campbell, Cherry & Wink, 1992) and candidate debates (Benoit, Hansen & Verser, 2003) can yield predictable shifts in the candidates’ polling numbers, not necessarily by affecting people’s vote preference but rather their willingness to participate in a poll (Gelman, Goel, Rivers & Rothschild, 2016). Furthermore, structural factors, the so-called ‘fundamentals’ (e.g., the state of the economy or the time the incumbent party has been in the White House), quite accurately predict election outcomes, even months in advance. In sum, when forecasting elections, experts receive immediate and accurate feedback about their forecasts and have access to domain knowledge, some of which may not be accounted for in the polls.
Polls, on the other hand, are far from being perfect predictors of election outcomes themselves, and are subject to various types of error (Biemer, 2010; Groves & Lyberg, 2010). Prior research found that the empirical error of polls is about twice as large as the estimated sampling error (Buchanan, 1986; Shirani-Mehr, Rothschild, Goel & Gelman, 2018). Furthermore, polls were found to be among the least accurate methods available to forecast U.S. presidential elections, especially if conducted weeks or even months before an election (Graefe, Armstrong, Jones & Cuzán, 2017).
It thus seems reasonable to expect that experts are able to tell the direction in which the polls err. The present study provides empirical evidence on whether they can, by analyzing the relative accuracy of expert judgment, a simple polling average, and fundamentals-based forecasts for predicting the popular vote in the four U.S. presidential elections from 2004 to 2016.
2 Materials and methods
2.1 Forecast data
Expert forecasts.
Expert forecasts for the four U.S. presidential elections from 2004 to 2016 were collected over a 12-year period within the PollyVote.com research project. Since 2004, the PollyVote team has periodically asked experts to predict the national vote in U.S. presidential elections, starting many months before the election. The total number of surveys conducted across the four elections is 36 (Appendix I). The survey results were published prior to each election at pollyvote.com.
The expert panel consisted of American political scientists and, in 2004 and 2008, some practitioners. The panel composition changed across elections, with the number of panelists varying from 15 to 17. Some experts participated in only one election, others participated in all four elections. The total number of experts who participated in at least one survey round was 36, with the number of available forecasts per individual expert ranging from 1 to 36. The average number of experts for a single survey round was 13, and ranged from 8 to 17 (Appendix II).
From 2004 to 2012, the experts predicted the two-party vote for the candidate of the incumbent party (“Considering only the major party candidates, what share of the 2-party vote do you expect the nominee of the incumbent [Democratic/Republican] party to receive in the [YEAR] general election?”). In 2016, the experts predicted the vote for each party, including third parties and others (“What share of the national vote (in %) do you expect the nominees to receive in the 2016 presidential election?”). [Footnote 1]
Polling average.
The RealClearPolitics poll average (RCP) was used as the benchmark for the performance of polls. The RCP is an established and widely known polling average, and one of the few that was active as early as 2004. The RCP average does not weight by sample size or recency, and it does not correct for house effects (partisan lean of pollsters). The RCP is thus a very raw representation of publicly available opinion polls.
Fundamentals-based forecast.
For nearly four decades, political scientists and economists have developed quantitative models for forecasting U.S. presidential elections. Most of these models are based on the theory of retrospective voting, which assumes that voters reward or punish the incumbent (party) based on its performance. Different models measure performance in different ways. While most models include at least one measure of economic performance, some models include military fatalities (e.g., Hibbs, 2012), the incumbent president’s job approval rating (e.g., Abramowitz, 2016), or the candidates’ performance in primary elections (e.g., Norpoth, 2016). In addition, several models measure the time that the incumbent party (or president) has been in office to account for the electorate’s periodic desire for change (e.g., Abramowitz, 2009; Cuzán, 2012; Fair, 2009). [Footnote 2]
For the present study, I created a fundamentals-based forecast by calculating rolling averages of forecasts from five established models (equally weighted) that were published prior to each of the four elections from 2004 to 2016. [Footnote 3] The five models were those by Abramowitz (2016), Cuzán (2012), Fair (2009), Hibbs (2012) and Norpoth (2016). I selected these models and combined their forecasts for three reasons. First and foremost, these models are ‘pure’ in that they rely only on fundamental data. That is, they ignore trial-heat polls that measure support for the party nominees. [Footnote 4] Hence, these models provide a base rate prediction (or reference class forecast) of what one would expect to happen under ‘normal’ circumstances (i.e., with generic candidates). Second, ex ante forecasts published prior to each election were available for all five models. Third, each model uses different variables and thus includes different information. A combined forecast based on those models thus captures more information than any single model and minimizes the danger that the results are due to cherry-picking a single model.
2.2 Comparison of forecasts
The individual expert forecasts were compared to the respective forecasts from polls and fundamentals from the day an expert forecast was made. If the exact date of the expert forecast was unavailable, the last day of the expert survey was used as a reference. For the two expert surveys conducted in May and July of 2011, more than a year before the 2012 election, no RCP data were available. These surveys were thus excluded from the analysis.
2.3 Forecast combination
I also formed two combined measures. A combined forecast of polls and experts was calculated as the equally weighted average of an individual expert forecast and the polling average that day. A combined forecast of polls, experts, and fundamentals was calculated as the equally weighted average of an individual expert forecast, the polling average that day, and the fundamentals-based forecast.
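The two combinations described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the function name `combine` and all input values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean

def combine(*forecasts: float) -> float:
    """Equally weighted average of component forecasts (percentage points)."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    return mean(forecasts)

# Hypothetical Democratic two-party vote shares (%); NOT taken from the study's data.
expert, polls, fundamentals = 52.0, 51.0, 48.5

polls_experts = combine(expert, polls)            # experts + polls
all_three = combine(expert, polls, fundamentals)  # experts + polls + fundamentals
```

Because every component receives the same weight, no judgment about which method is likely to be most accurate enters the combination.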
2.4 Error measure
Forecast errors were calculated as the difference between the predicted and actual Democratic vote share. The analysis is based on the two-party vote. Where necessary (as for the RCP and the 2016 expert forecasts), two party vote shares were calculated using the following formula: (Democratic vote)/(Democratic vote + Republican vote).
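As a minimal sketch of the two-party conversion and the error definition, the following Python snippet may help. The function names and the example vote shares are hypothetical; scaling the ratio by 100 to express it in percent is an assumption made here for readability.

```python
def two_party_share(dem: float, rep: float) -> float:
    """Democratic share of the two-party vote, in percent."""
    if dem + rep <= 0:
        raise ValueError("vote shares must sum to a positive number")
    return 100.0 * dem / (dem + rep)

def forecast_error(predicted_dem: float, actual_dem: float) -> float:
    """Signed error: positive values mean the Democratic vote was overpredicted."""
    return predicted_dem - actual_dem

# Hypothetical multi-party shares (%); NOT taken from the study's data.
predicted = two_party_share(dem=46.0, rep=44.0)  # forecast with third parties removed
actual = two_party_share(dem=48.0, rep=47.0)     # outcome with third parties removed
error = forecast_error(predicted, actual)        # signed error in percentage points
```

The absolute value of this signed error is what enters a mean absolute error, while its average over forecasts indicates bias.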
3 Results
3.1 Directional error
If experts are able to identify the directional error of polls, we would expect their own forecasts to be in the direction of the actual election outcome. Figure 1 suggests that this was in fact the case. When comparing experts’ forecasts with the polling average of the same day across the four elections from 2004 to 2016, 277 (62%) of 450 expert forecasts were in the direction of the actual election outcome. [Footnote 5]
However, the simple fact that the majority of expert forecasts pointed in the right direction does not imply that these forecasts are necessarily more accurate than the polls. The reason is that experts may adjust the polling numbers too far in the right direction and overshoot the actual outcome. This would result in an error larger than that of the polls. This happened for 81 (18%) of the expert forecasts. Together with the 173 (38%) cases in which experts moved the forecast in the wrong direction, the majority (56%) of expert forecasts were thus in fact less accurate (farther from the actual outcome) than the polls.
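The three cases distinguished above (wrong direction, right direction with overshooting, and right direction without overshooting) can be made explicit in a short sketch. The function `classify`, its labels, and the treatment of ties are assumptions made for illustration; only the counts at the end come from the text.

```python
def classify(expert: float, poll: float, actual: float) -> str:
    """Classify an expert forecast relative to the same-day polling average."""
    poll_error = actual - poll   # direction in which the polls missed
    adjustment = expert - poll   # direction in which the expert deviated from the polls
    if adjustment == 0 or poll_error == 0:
        return "no directional call"  # assumption: ties are set aside
    if adjustment * poll_error < 0:
        return "wrong direction"
    # Right direction: did the expert end up farther from the outcome than the polls?
    if abs(expert - actual) > abs(poll - actual):
        return "right direction, overshoot"
    return "right direction, at least as accurate"

# Counts reported in the text above.
right, wrong, overshoot, total = 277, 173, 81, 450
share_less_accurate = (wrong + overshoot) / total  # about 0.56
```

For example, with a hypothetical polling average of 51.0 and an outcome of 51.5, an expert forecast of 53.0 moves in the right direction but overshoots, so its error (1.5) exceeds that of the polls (0.5).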
3.2 Vote share forecast error
Expert inaccuracy is also apparent from comparing the errors of experts’ vote share forecasts with those from polls. Figure 2 shows the results as the mean absolute error (MAE) for each election, and across the four elections. The results were mixed. In 2008, experts outperformed the polls, whereas in 2004 the polls were more accurate than the experts. In 2012 and 2016, differences in accuracy were small. Across the four elections, the weighted (by the number of available forecasts in each election) MAE of a typical expert forecast (1.6 percentage points) was 7% higher than the respective error of the polling average (1.5 percentage points).
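The weighting used for the cross-election figure can be sketched as follows. The function name and the per-election values are hypothetical and serve only to show the calculation; the final line uses the rounded MAEs reported above.

```python
def weighted_mae(mae_by_election: dict[int, float],
                 n_by_election: dict[int, int]) -> float:
    """Average of per-election MAEs, weighted by the number of forecasts."""
    total = sum(n_by_election[year] for year in mae_by_election)
    weighted_sum = sum(mae_by_election[year] * n_by_election[year]
                       for year in mae_by_election)
    return weighted_sum / total

# Hypothetical per-election MAEs and forecast counts; NOT the study's data.
example = weighted_mae({2004: 2.0, 2008: 1.0}, {2004: 10, 2008: 30})

# Relative difference implied by the rounded MAEs in the text (1.6 vs. 1.5).
relative_difference = (1.6 - 1.5) / 1.5  # roughly 0.07
```

Weighting by the number of forecasts makes the result equal to the MAE pooled over all individual forecasts, so elections with more survey rounds count proportionally more.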
3.3 Bias
Figure 3 addresses the question of potential biases by depicting the mean difference between the predicted and actual Democratic two-party vote for each method. Hence, values above the horizontal axis indicate that a method overpredicted the Democratic vote, while values below the horizontal axis indicate that the method overpredicted the Republican vote. For example, in 2016, the typical expert forecast overpredicted the Democratic vote by 1.7 percentage points, while the polling average overpredicted Democrats by 1.5 percentage points.
Experts overpredicted the Democratic vote twice (in 2004 and 2016) and the Republican vote twice (in 2008 and 2012). Interestingly, experts and polls erred in the same direction in each election. This may indicate that experts draw heavily on polls when making their forecasts. In three of the four elections (all except 2012), experts were more favorable to the Democrats than the polls. Across all forecasts, the polls showed virtually no bias, while the typical expert slightly overpredicted the Democratic vote by 0.2 percentage points.
4 Discussion
The MAE across all 452 expert vote share forecasts was 7% higher than that of a simple polling average from the same day. This is a small difference in accuracy, which certainly does not imply that one should ignore expert judgment when forecasting elections.
Trying to find the one best forecasting method is generally not a useful strategy for forecasting. Rather, the literature suggests combining forecasts from different methods. The reason is that a combined forecast includes more information than any single forecast, and the systematic and random errors associated with single forecasts tend to cancel out in the aggregate. This improves accuracy. If one uses the simple average to combine, the error of the combined forecast will be no larger than the average error of the individual component forecasts, and it is often much smaller (Armstrong, 2001).
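The guarantee for the simple average follows from the triangle inequality. As a short sketch, with notation introduced here only for illustration, let $f_1, \dots, f_n$ denote the component forecasts and $y$ the actual outcome:

```latex
\left| \frac{1}{n}\sum_{i=1}^{n} f_i - y \right|
  = \left| \frac{1}{n}\sum_{i=1}^{n} (f_i - y) \right|
  \le \frac{1}{n}\sum_{i=1}^{n} \left| f_i - y \right|
```

The left-hand side is the absolute error of the combined forecast, the right-hand side the average absolute error of the components. Equality holds only when no two component errors have opposite signs; whenever some forecasts fall above and others below the outcome, the inequality is strict and the combination beats the typical component.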
Experimental studies have shown that many people do not understand, and thus do not appreciate, the power of combining forecasts (Larrick & Soll, 2006). One reason is that people commonly think that they know which forecast is the best one and decide to go with it. But this is not a good approach to forecasting for several reasons. First, in picking a particular forecast, one may select a forecast that suits one’s biases (Soll & Larrick, 2009). Second, it is extremely difficult, if not impossible, in most practical situations to know in advance which forecast will turn out to be most accurate. Past accuracy, for example, is not a good indicator of future accuracy. Two studies have found a negative relationship between the historical accuracy of a method (Graefe et al., 2017) or model (Graefe, Küchenhoff, Stierle & Riedl, 2015) and its accuracy in predicting future elections. Third, even if one knew in advance which forecast would be most accurate, combining that forecast with less accurate forecasts can be useful. Herzog and Hertwig (2009) illustrate this perhaps counterintuitive finding for the simple case of combining two forecasts. The authors showed that the simple average of two forecasts is more accurate than the best single forecast if the two component forecasts bracket the outcome (i.e., the outcome lies between the two forecasts) and if the error of the less accurate forecast does not exceed three times the error of the more accurate one. In other words, as long as adding a new forecast to the combination is likely to increase the chance that the range of forecasts brackets the true value, that forecast’s error can be quite large.
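The bracketing condition can be checked numerically. The sketch below uses hypothetical forecasts around a hypothetical outcome of 50; the function names are introduced here for illustration. If the better forecast misses by e and the other misses by k·e on the opposite side, the average misses by |k − 1|·e/2, which is smaller than e exactly when k is below 3.

```python
def abs_error(forecast: float, actual: float) -> float:
    """Absolute forecast error in percentage points."""
    return abs(forecast - actual)

def average_beats_best(f1: float, f2: float, actual: float) -> bool:
    """True if the simple average of f1 and f2 is closer to the outcome than either."""
    combined = (f1 + f2) / 2
    best_single = min(abs_error(f1, actual), abs_error(f2, actual))
    return abs_error(combined, actual) < best_single

actual = 50.0  # hypothetical outcome; NOT taken from the study's data

# Bracketing, worse error is 2.5 times the better error: the average wins.
case_within_limit = average_beats_best(51.0, 47.5, actual)
# Bracketing, worse error is 4 times the better error: the average loses.
case_beyond_limit = average_beats_best(51.0, 46.0, actual)
# No bracketing (both forecasts too high): the average cannot beat the better forecast.
case_no_bracket = average_beats_best(51.0, 52.0, actual)
```

Even when the average does not beat the best single forecast, it still does at least as well as the average of the two component errors.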
4.1 Combining forecasts from experts and polls
In the present study, the majority of expert forecasts (62%) were in the direction of the final election result. That is, there is a high chance that the expert forecasts and the polls bracket the true value. In such a situation, combining forecasts is likely to be useful. Figure 2 shows the MAE of a combined forecast of polls and individual expert forecasts. Across the four elections, the error of the combined forecast was 1.4 percentage points, which is 5% lower than the corresponding error of the polling average (1.5 percentage points), the better of the two methods. Compared to the expert forecasts (1.6 percentage points), the combined forecast reduced error by 12%. Also, note that even when the combined forecast did not provide the most accurate predictions, it helped avoid large errors, such as the relatively large polling error in 2008.
As pointed out above, the results on the relative accuracy of polls and experts suggested that experts closely follow the polls (Figure 3). In other words, the two methods likely incorporate similar information. Combining, however, works particularly well if one combines forecasts that incorporate different information (Graefe, Armstrong, Jones & Cuzán, 2014). Hence, in order to improve upon the accuracy of forecasts from polls and experts, one should look for information that these methods may overlook, and incorporate that into the forecast.
4.2 Ignorance of election fundamentals
Figure 3 shows the mean errors of the fundamentals-based forecasts. The results reveal an interesting pattern. The fundamentals consistently overpredicted the Republican vote by substantial margins. In other words, in each of the past four elections, the Republicans underperformed relative to the fundamentals, and received fewer votes than would be expected from historical data.
The performance of fundamentals relative to polls and experts was mixed. While in two of the four elections (2008 and 2012), all three methods erred in the same direction, the fundamentals pointed in the opposite direction from experts and polls in both 2004 and 2016. What is more, in both 2004 and 2016, the experts thought that the polls would underestimate the Democratic vote, even though the fundamentals pointed the other way. The results thus suggest that the experts did not sufficiently account for their fellow political scientists’ work on election fundamentals when making forecasts.
I can only speculate on the reasons for this behavior. For example, it may be that experts have little trust in these models, since they often incur large errors. Figure 2 shows the MAE of the fundamentals-based forecast in each election. In three of the four elections, the fundamentals were by far the least accurate method. Only in 2008 did the polls perform even worse. Across the four elections, the fundamentals-based forecast missed by 3.2 percentage points. This error is more than double the corresponding errors of polls and experts. The large error may lead experts to think that fundamental models are generally of limited value and thus to ignore them altogether. Such neglect would be unfortunate, however, if these models provide valuable information regarding the direction of the forecast error.
4.3 Combining polls, experts, and fundamentals
The last columns in Figure 2 show the results of a combined forecast of polls, individual experts, and the fundamentals-based forecast. The combined forecast was more accurate than both the typical expert forecast and the polling average in two of the four elections (2004 and 2016). Across the four elections, the combined forecast (MAE: 1.2 percentage points) reduced the errors of the polling average (1.5) and the typical expert (1.6) forecast by 19% and 24%, respectively.
Some readers may be puzzled by the fact that these large accuracy gains occurred despite adding a forecast to the ensemble that incurred an error that was more than twice the corresponding errors of polls and experts. The results thus provide further evidence that combining can be useful even in situations when one has strong evidence that a particular method will be most accurate. The key here is that the fundamentals-based forecast provided different information than both experts and polls, thus increasing the likelihood that the combined forecast would bracket the true value (Graefe et al., 2014).
4.4 Limitations
The analysis presented in this paper is based on a rather large sample of expert forecasts (N=452) collected over a 12-year period. That said, the generalizations that can be drawn are limited, since the data cover only the four U.S. presidential elections from 2004 to 2016. Further studies for different election types and in other countries are necessary to learn more about the relative predictive accuracy and potential biases of expert judgment in forecasting elections.
4.5 Conclusions
The present study provides evidence on the accuracy of expert judgment in forecasting elections. Although the majority of expert forecasts correctly predicted the directional error of polls, the error of a typical expert’s vote share forecast was on average 7% higher than that of a polling average. The results further suggest that experts ignored information captured by structural fundamental data available months before election day, which prior research found to be useful for election forecasting. Combining expert forecasts and polls with such a fundamentals-based reference class forecast reduced the error of polls and experts by 19% and 24%, respectively.
These large gains in accuracy are in line with prior research, which showed that reference class forecasts and base rates are among the most effective tools for debiasing judgmental forecasts (Chang, Chen, Mellers & Tetlock, 2016). Experts in any field should refrain from focusing too much on the specifics of a situation (“this time is different”) and should also take the outside view (Lovallo & Kahneman, 2003). In addition, they should be conservative about large changes and take into account all cumulative knowledge about a situation (Armstrong, Green & Graefe, 2015). A structured approach of combining forecasts from different methods that use different information provides a valuable and simple strategy to achieve that goal.