Since the publication of Enns and Koch’s (2013) paper introducing a measure of state policy mood constructed using multilevel regression and poststratification (MRP) and survey aggregation, we have debated with Enns and Koch (hereafter E&K) the validity of their measure of policy mood and of our own measure (Berry et al. 1998 [hereafter BRFH]). The debate has unfolded in two articles: Berry et al. (2015), followed by Enns and Koch (2015). Our first purpose is to continue the debate. However, since much has already been written by both sets of authors, we limit our response to Enns and Koch (2015) to assessing two arguments they raise that they did not make in their 2013 piece, and that we believe are flawed. Our second purpose is to report a replication analysis that sheds light on the extent to which the substantive findings of research vary depending on one’s choice of state policy mood measure; we demonstrate that this measurement choice can be highly consequential, with substantive conclusions changing dramatically with a shift in the measure used.
Contrary to E&K’s Claim, Public Opinion Data Indicate that Policy Mood has been Liberalizing in the South
Enns and Koch (2015) consider the longitudinal characteristics of the two state policy mood measures in the South. Their figure 2a shows that the BRFH measure indicates that policy mood has become more liberal in southern states in recent decades, and that the E&K measure shows mood in the South has become more conservative. E&K claim that their measure has greater face validity than the BRFH measure because of two trends that they believe indicate that policy mood in the South has become more conservative: in this region, (i) partisan affiliation has become “increasingly Republican” (p. 441) and (ii) welfare benefits have declined (their figure 2b). The fact that political scientists lack a measure of state policy mood known with certainty to be valid means that we cannot know for sure how true policy mood has changed in the South, and thus, that we cannot definitively evaluate E&K’s claim. However, there are two important reasons why we should question E&K’s claim.
First, the methodology employed by E&K to construct their measure—MRP—is not well-suited for estimating change in public opinion over time. MRP was introduced by Gelman and Little (1997) and has become increasingly popular among state politics scholars due to the dearth of reliable state-level public opinion data. The method was originally designed to estimate public opinion for a single period, and its validity as a cross-sectional technique has been supported by several studies (Buttice and Highton 2013; Lax and Phillips 2009; Park, Gelman, and Bafumi 2004; Warshaw and Rodden 2012). Although many scholars (in addition to E&K) have used the MRP approach to create longitudinal estimates of state public opinion, Gelman et al. (2018, 2) question the validity of such measures because the original MRP approach “fails to make use of all the available data and employs arbitrary assumptions as to how much change occurs over time.”
Second, E&K’s claim that policy mood in the South has become more conservative rests on the assumption that partisanship and welfare benefits are strong proxies for policy mood. At best, this assumption is dubious. Although partisanship and ideological self-identification (i.e., symbolic ideology) have followed similar trends over the last several decades, this is not the case for partisanship and policy mood. Indeed, upon comparing trends in macropartisanship and policy mood, Erikson (2012, 42) concluded that “the two time series are virtually uncorrelated.” And although welfare benefits certainly have steadily declined in the South since the early 1970s, several studies have shown that this has been the case in all states for reasons that have little to do with policy mood (Berry, Fording, and Hanson 2003; Peterson and Rom 1990; Soss, Fording, and Schram 2011). A much better way to assess the plausibility of the finding, derived from the BRFH measure, of a liberalizing trend in policy mood in the South is to observe how public opinion on ideologically relevant issues has changed over time in the South, thereby relying directly on public opinion data to assess the public’s policy mood.
We obtained General Social Survey (GSS) data from 1973 to 2010 on a set of eight items asking respondents if the government is “spending too much money,” “too little money,” or “about the right amount” across a diverse set of program areas: welfare, healthcare, education, improving the conditions of Blacks, environmental protection, crime, defense, and foreign aid.Footnote 1 We also secured 10 GSS items concerning issues generally thought to be related to ideology: abortion, gay rights, gun control, aid for Blacks, the treatment of criminals, government redistribution, tax policy, and healthcare for the poor.Footnote 2 For each of the 18 items, the scores for responses were linearly transformed to range between 1 and 3, with higher scores indicating greater liberalism. In a set of figures, we analyze trends in the South in each item, as well as in four policy mood indexes we construct primarily from these items.Footnote 3
For each of the eight items based on government spending preferences, Figure 1 presents a plot of the average score for the item among respondents from the South against the year of observation—overlaid with the ordinary least squares (OLS) bivariate regression line, and the correlation between the average item score and a year-count variable. A positive correlation indicates that the South is liberalizing over the period of analysis, and a negative correlation implies that the South is becoming more conservative. Figure 2 presents similar plots based on the 10 additional GSS items that are not about spending.
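The computation behind each plot is straightforward. The sketch below is our own illustration using numpy on hypothetical respondent-level data (not the actual GSS extract or the authors’ replication code): it collapses respondent scores to year-level means for one item, computes the correlation between the mean score and the year of observation, and fits the OLS line used as the plot overlay.

```python
import numpy as np

def yearly_trend(years, scores):
    """Given one row per respondent (survey year, item score on the 1-3
    liberalism scale), return year-level means, the Pearson correlation
    between mean score and year, and the OLS slope/intercept for the
    bivariate regression line overlaid on the plot."""
    years = np.asarray(years, dtype=float)
    scores = np.asarray(scores, dtype=float)
    yrs = np.unique(years)
    means = np.array([scores[years == y].mean() for y in yrs])
    r = np.corrcoef(yrs, means)[0, 1]             # positive r => liberalizing trend
    slope, intercept = np.polyfit(yrs, means, 1)  # OLS line for the overlay
    return yrs, means, r, slope, intercept

# Toy data: average liberalism rising over time
years  = [1973, 1973, 1980, 1980, 1990, 1990, 2000, 2000]
scores = [1.8,  2.0,  2.0,  2.2,  2.2,  2.4,  2.5,  2.7]
yrs, means, r, slope, intercept = yearly_trend(years, scores)
```

A positive `r` and positive `slope` correspond to the liberalizing pattern described in the text; a plotting call (e.g., matplotlib) would then draw `means` against `yrs` with the fitted line.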
The trends displayed in Figures 1 and 2 vary substantially in strength across the 18 GSS items, but for the great majority (14) of the 18 plots, the overall trend is toward an increasingly liberal policy mood over time, as reflected in a positive correlation between the opinion item and the year of observation. Moreover, the positive correlation is statistically significant (0.05 level, two-tailed test) in 10 of the 14 plots. In contrast, for only one item—support for “government helping the poor”—is the correlation negative and statistically significant.
We also constructed four indexes of policy mood based on alternative conceptualizations of ideology. These indexes were based on (i) all 18 items in Figures 1 and 2 (“Policy Liberalism Index”), (ii) the 8 spending items presented in Figure 1 (“Support for Spending Index”), (iii) the 10 nonspending items in Figure 2 (“NonSpending Items Index”), and (iv) 11 items measuring attitudes about the scope of government used by Stimson (1991) to construct his measure of policy mood (“Stimson Items Index”).Footnote 4 Each of the indexes was constructed as the unweighted average of its component GSS items, each of which was standardized (linearly transformed to have a mean of 0 and a variance of 1) prior to creating the index.Footnote 5
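One plausible reading of this standardize-then-average construction can be sketched in a few lines of numpy (toy data and function names are ours, not from the original replication materials):

```python
import numpy as np

def mood_index(item_matrix):
    """Build an unweighted policy mood index from a respondents-by-items
    matrix of GSS scores: z-score each item (mean 0, variance 1) so no
    item dominates by scale, then average across items for each row."""
    X = np.asarray(item_matrix, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each item column
    return Z.mean(axis=1)                     # unweighted average across items

# Toy matrix: 4 respondents x 3 items on the 1-3 liberalism scale
X = [[1, 2, 3],
     [2, 2, 2],
     [3, 1, 1],
     [2, 3, 2]]
idx = mood_index(X)
```

Because every column is standardized to mean zero, the resulting index also has mean zero across respondents; only relative liberalism is meaningful.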
Plots of the four policy mood indexes over time are presented in Figure 3. In every case, the policy mood index is positively correlated with time, indicating increasing southern liberalism over time. Across the four plots, the year-mood correlations range from a low of 0.32 (Stimson Items Index) to a high of 0.70 (Support for Spending Index), and are statistically significant at the 0.05 level in all but one case (Stimson Items Index).Footnote 6 These results are consistent with the southern trend in BRFH’s policy mood measure, but inconsistent with the trend in the South in E&K’s measure of mood.Footnote 7 Thus, our empirical evidence leads us to reject E&K’s assertion that BRFH scores for southern states lack face validity; to the contrary, BRFH scores conform nicely to available evidence about changes in public opinion in the South, while E&K scores do not.
Contrary to E&K’s Claim, Cross-Sectional Performance of the State Policy Mood Measures is Relevant
In our 2015 SPPQ paper, we argue that Enns and Koch’s (2013) state policy mood scores lack face validity in cross-sectional comparisons. In their reply, Enns and Koch (2015) do not challenge this claim. Rather, they question the relevance of the cross-sectional performance of their measure based on an argument that the “standard approach of including state fixed effects in cross-sectional time-series models…means that most analyses focus explicitly on over-time (within state) relationships” [emphasis added] (Enns and Koch 2015, 440). We believe this argument misses the point: it is inappropriate to dismiss cross-sectional characteristics of the E&K measure that lack face validity as irrelevant even if one uses the E&K measure solely for pooled cross-sectional time-series analyses specifying state fixed effects (so that the only relevant variation is longitudinal). Our contention is premised on the fact that we see no reason to believe that E&K’s methodology could yield measurement error sufficient to invalidate cross-sectional comparisons without simultaneously invalidating longitudinal comparisons.
Consider the following thought experiment. Assume that we know the true value of policy mood in each state in each year over a long period. Denote this true value in state s in year t (t = 1, 2, …, T) by TrueMood s,t. We create an imperfect measure of policy mood, ObservedMood s,t, by introducing systematic (i.e., nonrandom) measurement error. Specifically, for each state-year, we adjust the true score (some up, some down) by an amount sufficient to substantially distort cross-sectional comparisons of policy mood in each year. Denote the amount of the adjustment to the value of true mood in state s in year t—that is, the amount of error in observing TrueMood s,t—by Error s,t. Given this notation,

ObservedMood s,t = TrueMood s,t + Error s,t.
Consider the case in which, for each state, the amount of error is stable across years; that is, for each state s, Error s,1 = Error s,2 = … = Error s,T. In this special case, the measurement error introduced would not distort longitudinal comparisons in any state. This is because for any state s and any two years, t1 and t2,

ObservedMood s,t1 − ObservedMood s,t2 = (TrueMood s,t1 + Error s,t1) − (TrueMood s,t2 + Error s,t2) = TrueMood s,t1 − TrueMood s,t2.
However, with any other pattern of measurement error—that is, with any departure from error that is stable across years—the measurement error would distort not only within-year cross-sectional comparisons of mood, but also within-state longitudinal comparisons.
Without knowing the exact nature of the measurement error in E&K mood scores that produces what we have contended are distorted cross-sectional comparisons (a contention E&K have not disputed), one cannot know with certainty whether this measurement error would also distort longitudinal comparisons. However, it seems implausible to us that the error in E&K’s measure would be stable across the more than 50 years E&K have observed—which we have shown is the only condition under which longitudinal comparisons of E&K scores would be shielded from distortion.Footnote 8 This would imply that even if one cares only about longitudinal variation in state policy mood, one cannot dismiss evidence of poor cross-sectional performance of the E&K measure because its poor cross-sectional performance signals the presence of measurement error that is likely to undermine longitudinal comparisons as well.Footnote 9
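The logic of the thought experiment can be checked with a small simulation (entirely synthetic data of our own devising, not the E&K scores): additive error that is constant over time shifts a state’s cross-sectional level but leaves every within-state year-to-year difference intact, while error that varies over time distorts the longitudinal comparisons as well.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 51  # roughly the span of the E&K series
true_mood = np.cumsum(rng.normal(0.0, 0.1, T))  # one state's true mood path

# Case 1: error is stable across years (Error s,1 = ... = Error s,T).
# Cross-sectional levels are shifted, but longitudinal differences survive.
obs_stable = true_mood + 2.0
diffs_match = np.allclose(np.diff(obs_stable), np.diff(true_mood))

# Case 2: error varies across years.
# Longitudinal comparisons are now distorted too.
obs_varying = true_mood + rng.normal(0.0, 0.5, T)
diffs_distorted = not np.allclose(np.diff(obs_varying), np.diff(true_mood))
```

Only the degenerate constant-error pattern of Case 1 protects within-state comparisons; any departure from it behaves like Case 2, which is the point of the argument above.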
Conclusion about Enns and Koch’s and Berry et al.’s Measures of Policy Mood
Nothing in the response by Enns and Koch (2015) to our SPPQ paper (Berry et al. 2015) leads us to retract any of our arguments; we continue to stand behind the claims in our paper. We remain doubtful that E&K’s measure is valid, largely because its characterization of mood in the states departs substantially from conventional wisdom and current scholarship; and we continue to believe that the Berry et al. (1998) indicator is a reasonable proxy for policy mood that fares well on a variety of reliability, face validity, and construct validity tests described in this article and in previous papers (Berry et al. 1998; 2007; 2015). As a consequence, we think the BRFH measure can serve the needs of state politics scholars until a superior measure based on public opinion surveys is developed.Footnote 10
The Implications of the Choice of State Policy Mood Measure for Research Results
Across all state-years in which both the BRFH and E&K measures of state policy mood are available (observations for each year during the period 1960–2010), the correlation between the two measures is just 0.10 (Berry et al. 2015, 2).Footnote 11 This suggests a strong possibility that a researcher estimating a model including one of the two measures as an independent variable would often derive substantially different results about the effect of policy mood if she used the other measure instead. Also, to the extent that other independent variables in a model are correlated with at least one of the measures of policy mood, the estimated effects of these other variables may also be sensitive to the choice about which measure of policy mood to use.
In this section, we report the results of replications of several published studies to empirically assess the extent to which results from models including state policy mood as an independent variable vary depending on the measure of mood employed.Footnote 12 To identify a sample of studies to replicate, we used the ISI Web of Knowledge search mechanism to identify each article (i) published between 2013Footnote 13 and 2019 in a “political science” journal with a 2017 JCR Impact Factor of at least 1.0Footnote 14 and (ii) that cites one of the papers introducing the two policy mood measures: Berry et al. (1998) or Enns and Koch (2013). This search yielded 99 articles. One of us visually scanned each of these articles to identify the subset that report empirical analysis in which one of the two measures of state policy mood is used as an independent variable in an econometric model.Footnote 15 On practicality grounds, we restricted our analysis to articles for which replication data were publicly available, and for which executing author-provided code allowed us to reproduce published results.Footnote 16 This winnowed our sample to seven articles. To avoid artificially inflating the number of distinct models we replicate by including models that are minor “tweaks” of one another, when an article estimated multiple models including state policy mood, we randomly chose one model for replication.Footnote 17
For each of the seven models to be replicated, using the authors’ data—and minimally changing their Stata or R code—we re-estimated the model twice, once using the BRFH measure of policy mood and once using the E&K measure of mood.Footnote 18 For five of the seven models, the finding about the effect of policy mood varies substantially with the measure of mood employed. As can be seen in Table 1, for one of these five models (Hayes and Dennis), the coefficient for policy mood is statistically significant at the 0.05 level in both the BRFH version of the model and the E&K version, but positive in one and negative in the other. In three other models (Boehmke and Shipan 2015; Hawes and McCrea 2018; Ojeda et al. 2019), the coefficient for mood is statistically significant at the 0.05 level in one version of the model, and far from significant in the other (with a p-value greater than 0.80).Footnote 19 In a fifth model (Boehmke, Osborn, and Schilling 2015), the coefficient for mood is positive and significant at the 0.10 level in one version, and weakly negative in the other. In the remaining two models, the difference in results across versions is less stark. In the Hanna and Mallinson model, there is a positive coefficient for mood in both versions, but the p-values are nontrivially different (0.07 and 0.36); in the Taylor model, the coefficient for mood is positive for one measure of ideology and negative for the other, but neither is close to statistical significance at the 0.05 level (with p-values of 0.52 and 0.59).
Abbreviations: BRFH, Berry et al.; E&K, Enns and Koch.
a If an article’s author(s) offer an explicit hypothesis about the direction of the effect of policy mood on the dependent variable, the predicted direction (+ or –) is enclosed in parentheses after the dependent variable listed below.
** Statistically significant at the conventional threshold (p < 0.05) in political science research.
* Not statistically significant at the conventional level, but would be significant using a slightly higher threshold (p < 0.10).
We can also consider whether the choice of the measure for policy mood affects the coefficient estimates for other independent variables in a model. There are 70 non-mood independent variables across the seven replicated models.Footnote 20 As Table S1 in the Supplementary Material shows, in the vast majority of cases (i.e., the 61 rows that are not shaded in gray), the coefficient for a variable is either (i) statistically significant at the 0.05 level with the same sign in both versions of the model or (ii) statistically insignificant at the 0.05 level in both versions. However, in two of these 61 cases (see rows 3 and 30 of Table S1), one coefficient is statistically significant at the 0.10 level and the other is far from statistically significant (with a p-value of 0.45 or 0.90). On the other hand, there are also five cases (see rows 27, 31, 64, 66, and 68) among the nine shaded rows in which a variable’s coefficient is statistically significant at the 0.05 level in one version of the model, and nearly significant in the other version. In three other cases (see rows 49, 52, and 53), a variable’s coefficient is statistically significant at the 0.05 level in one version and not even close to significant in the other (with a p-value of 0.98, 0.99, or 0.49, respectively). The most striking difference in results is a case (see row 46) in which the coefficient for a variable is statistically significant at the 0.05 level in both versions of the model, but is positive in one and negative in the other.Footnote 21
Since we replicate just seven studies, our sample cannot be assumed to be representative of the universe of research in which state policy mood has been used as an independent variable. On the other hand, we chose the studies using a procedure that guarantees that they were not “cherry picked” to produce results of one kind or another. We believe that it is evident from our replications that the decision about which measure of state policy mood to use when doing research should not be made casually. There is clearly a substantial risk that one’s choice about which measure of policy mood to use will have a large impact on one’s finding about the effect of policy mood. There is also at least a small risk that one’s choice of mood measure will affect one’s findings about the effects of other variables included in one’s model. Thus, even if policy mood is being employed solely as a control variable that allows one to derive an unbiased estimate of the effect of some other variable, one cannot safely assume that the choice about how to measure mood is inconsequential.Footnote 22
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/spq.2021.23.
Data Availability Statement
Replication materials are available on SPPQ Dataverse at https://doi.org/10.15139/S3/CQTBQU (Berry et al. 2021).
Funding Statement
The authors received no financial support for the research, authorship, and/or publication of this article.
Conflict of Interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.