In democratic politics, politician gender may affect citizens’ prospective evaluations of candidates for elected office. Women politicians are often perceived to be “better types” than men politicians: voters believe them to be more likely to act in the public interest and less likely to behave in a self-serving fashion (see, for example, Alexander and Andersen, 1993; Barnes and Beaulieu, 2019).Footnote 1 As a result, voters may turn to women candidates in moments of crisis (for example, Piazza and Diaz, 2020; Funk et al., 2021). A meta-analysis of candidate choice experiments suggests that women generally hold a slight competitive advantage over otherwise comparable men (Schwarz and Coppock, 2022).
We know less about how electorates retrospectively assess women politicians’ performance once they are in office. The predominant view is that positive beliefs about women’s competence and probity imply that voters expect more of women in office and are, therefore, more likely to punish women when they perform poorly. Recent literature provides evidence of higher performance standards for women candidates (Courtemanche and Connor Green, 2020) and of differential punishment for poor performance, including corruption and other scandals (Esarey and Schwindt-Bayer, 2018; Barnes et al., 2020). An alternative argument suggests that electorates view men as having more agency and, therefore, greater responsibility for performance outcomes in general. This view implies that voters are more responsive to the performance of men politicians, both punishing them more for bad performance and rewarding them more for good performance (Costa, 2021; de Geus et al., 2021). Beliefs about politician agency might be especially relevant for the evaluation of politicians holding executive office and have implications for understanding the likelihood of differential re-election rates across genders (de Geus et al., 2021).Footnote 2
In this article, we contribute to understanding how gender may affect citizen responses to politician performance by examining this question for politicians who hold executive office at the subnational level.Footnote 3 We run a survey experiment among an online sample of Argentine residents about a hypothetical local mayor. We manipulate the gender of the mayor and randomly assign respondents to receive information about that mayor’s performance in the distribution of a government food programme, signalling good or bad performance by describing the selection of beneficiaries as unbiased or biased, respectively. Follow-up questions elicit respondents’ voting intentions for the hypothetical mayor as well as evaluations of the mayor and the food programme.
We find evidence that men politicians, relative to women politicians, are punished more for poor performance and rewarded more for good performance in office. An analysis of further questions shows limited differences in the baseline expectations of politician performance or of interpretations of programme implementation by gender.Footnote 4 These findings are more consistent with the agentic perspective (de Geus et al., 2021) than with the view that women politicians are held to higher standards compared to men. In addition, we show that respondents were highly attentive to the information about gender in the vignettes with women politicians. Studying subnational executive office holders, our results provide new evidence on how voters respond to politicians’ gender in a context where women politicians are rare.
Research Design
Context
We carry out empirical research in Argentina, which we consider a mixed case concerning the representation of women politicians. On the one hand, Argentina was the first country in the world to introduce gender quotas for legislative candidates in national elections (1991),Footnote 5 women’s representation in the national legislature is high, and the country had a woman as president from 2007 to 2015. At the same time, women’s representation in executive offices at other levels is limited. Argentina is a federal country with over 2,000 municipalities where only about 13 per cent of mayors are women, suggesting that women still face significant barriers to political success.Footnote 6
Vignette Experiment
Our experiment focuses on a hypothetical man or woman incumbent mayor’s role in implementing a food distribution programme, a widely recognized form of welfare distribution in Argentina.Footnote 7 We described that programme as either distributed fairly to those who really need it, or else as distributed in a biased fashion wherein individuals with connections to the municipality are favoured. We also included a control condition in which we provide no information about programme implementation. We consider performance on this dimension to be a valence issue: evidence shows that most Argentines prefer unbiased distribution of social welfare benefits (Weitz-Shapiro, 2014).
We used simple randomization, such that each respondent had the same probability of being assigned to any of the treatment conditions.Footnote 8 The vignette was shown to respondents on a series of screens with follow-up questions. Respondents were thus repeatedly exposed to the treatment to which they were assigned. Although not the focus of this study, the vignette also manipulated the mayor’s political party and whether the mayor’s name was included on a photo of a box of food from the programme referenced in the text.Footnote 9 The full text of the prompt is below.
Imagine a Peronist/PRO/[omit] mayor who is running for re-election this year. During his/her time in office, the mayor [man/woman, as indicated by Spanish language pronoun] carried out a programme to help poor people, which consisted of the distribution of boxes of food, as shown in the photo. Programme beneficiaries are strictly selected based on need/theoretically selected based on need. In practice, those with contacts inside the municipality receive priority /[omit].Footnote 10
We recruited N=2,040 respondents from Netquest’s online panel in Argentina in March 2021. The sample was designed to closely mirror the composition of the national population in terms of gender, age, region, and socioeconomic status.Footnote 11 We implemented the survey in Qualtrics; respondents could complete it on a computer, tablet, or smartphone. Table 1 summarizes the assignment to the relevant manipulations.
Table 1. Vignette Experiment Research Design

Note: Each cell shows the number of respondents assigned to that combination of treatments.
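The assignment procedure described in this section can be illustrated with a minimal sketch. It assumes independent, equal-probability assignment across the six cells formed by mayor gender and programme implementation, and it ignores the party and name-on-box manipulations. The function name, variable names, and seed are hypothetical, and the randomization in the study itself was presumably handled by the survey platform, so the simulated cell counts will not reproduce Table 1.

```python
import random
from collections import Counter

# Factor levels taken from the vignette description; names are illustrative.
GENDERS = ["man", "woman"]
IMPLEMENTATION = ["control", "unbiased", "biased"]


def assign(n_respondents, seed=0):
    """Simple randomization: every respondent is independently assigned to
    one gender x implementation cell, each cell being equally likely."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    cells = [(g, i) for g in GENDERS for i in IMPLEMENTATION]
    return [rng.choice(cells) for _ in range(n_respondents)]


# Simulate an assignment for a sample of the size reported in the text.
assignments = assign(2040)
counts = Counter(assignments)
for cell in sorted(counts):
    print(cell, counts[cell])
```

Because assignment is independent across respondents, cell sizes under simple randomization are roughly but not exactly equal, which is why a table of realized cell counts is informative.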
Respondents Take Note of Women Mayors
Before turning to the results, we provide evidence of respondent attentiveness to the mayor’s gender. All respondents had the opportunity to learn about the mayor’s gender in the text of the vignette. The first sentence of the vignette asks respondents to imagine a mayor: “Imagine un intendente” if they are assigned to a vignette about a man mayor or “Imagine una intendenta” if they are assigned to a vignette about a woman mayor.Footnote 12
As a manipulation check, immediately after measuring the outcome variables, we asked respondents whether they remembered the gender of the mayor who was mentioned, with possible responses of: “man”, “woman”, “that information was not provided”, or “I don’t recall”. Information about a woman mayor was substantially more noteworthy to respondents. Table 2 shows the distribution of responses to this question by treatment combination. Among respondents assigned to a vignette with a man mayor, 51 per cent replied that they recalled a man mayor, 39 per cent said that they were not given this information, and 9 per cent of respondents said that they could not recall. Only 1 per cent of respondents incorrectly recalled a woman mayor. For respondents assigned to a vignette with a woman mayor, 91 per cent correctly reported learning about a woman mayor. Four per cent incorrectly recalled a man mayor, 2.4 per cent reported not receiving this information, and 3 per cent said that they did not know.Footnote 13
Table 2. Recall of Mayor’s Gender in Vignette

These results establish that respondents were highly attentive to the mayor’s gender when the vignette described a woman mayor. This gives us confidence that respondents assigned to the woman-mayor condition were thinking about a woman mayor when they answered the outcome questions. It is also a striking descriptive finding about how notable Argentines find women mayors.Footnote 14
Results
We asked respondents two questions to assess the electoral impact of the information in the vignette.Footnote 15 The first asked the respondents their likelihood of voting for the hypothetical mayor in the next election. The second asked whether the respondents believed that the food programme would help the mayor secure re-election.Footnote 16 Figure 1 shows the results for these two outcomes. The left panels show the mean response for each combination of mayor gender and programme implementation; the right panels show the differences in means across conditions.Footnote 17

Figure 1. Means by treatment condition and differences in means for electoral performance outcomes.
We first compare vote intention among respondents who learned about biased implementation to vote intention among respondents in the control group who did not receive any information about implementation. Among respondents who learned about a man mayor, information about biased implementation reduced the vote intention by 0.12 points (p = 0.08), compared to a reduction of only 0.04 points for women mayors (p = 0.58). The punishment for men mayors is three times as great, although the difference between these point estimates is not statistically significant (p = 0.43).
Next, we examine the effects of information about unbiased implementation compared to the control condition. Men mayors described as implementing the programme in an unbiased fashion receive a 0.21-point increase in vote intention on the four-point scale (p < 0.01). The difference for women mayors is a 0.06-point increase (p = 0.45). The difference between these two estimates is substantively large (men receive a benefit almost four times as large as women do), but not statistically significant at conventional levels (p = 0.13).
These effect sizes are of meaningful magnitudes. As can be seen in the Online Appendix Table B10, the existence of a match between the mayor’s party in the vignette and the respondent’s self-reported partisan preference is associated with a 0.50-point increase in expressed vote intention (p < 0.01). The effect size of describing unbiased implementation in vignettes with men mayors is therefore about 42 per cent as large as this central determinant of voting behaviour, and the effect size of describing biased implementation in vignettes with men mayors is about 24 per cent as large.
Taken together, these results are contrary to findings that voters punish women more harshly for poor performance (Esarey and Schwindt-Bayer, 2018; Barnes et al., 2020). They are instead consistent with the hypotheses and results in de Geus et al. (2021), wherein voters perceive greater agency among men politicians and are more likely to both punish them for bad performance and reward them for good performance compared to women politicians.
We also examine the differences between the two treatment conditions – unbiased versus biased programme implementation. Relative to the biased implementation condition, unbiased implementation is associated with a 0.34-point increase in the likelihood that the respondent will vote for the man mayor (p < 0.01) but with only a 0.10-point increase in the likelihood that the respondent will vote for the woman mayor (p = 0.19). This 0.24-point difference-in-differences is statistically significant at the 95 per cent confidence level, again indicating that survey respondents were more responsive to performance information provided about men mayors.
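The difference-in-differences reported above is the gap between two within-gender treatment effects. The short sketch below reproduces that arithmetic from the rounded estimates given in the text; the function and variable names are hypothetical, and the sketch says nothing about standard errors or significance, which require the respondent-level data.

```python
def diff_in_diff(effect_group_a, effect_group_b):
    """Difference between two within-group treatment effects."""
    return effect_group_a - effect_group_b


# Rounded estimates from the text: change in vote intention when moving
# from the biased to the unbiased implementation condition.
men_unbiased_vs_biased = 0.34
women_unbiased_vs_biased = 0.10

did = diff_in_diff(men_unbiased_vs_biased, women_unbiased_vs_biased)
print(f"difference-in-differences: {did:.2f}")  # prints 0.24
```

A positive value indicates that the move from biased to unbiased implementation shifts vote intention more for men mayors than for women mayors.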
Turning to the second question, which asks respondents whether the food programme will help the mayor win re-election, the different treatment conditions elicit more limited differences in patterns of responses. For both men and women mayors, information about either biased or unbiased implementation increases respondents’ perceptions that the programme will be electorally valuable compared to the control. For men mayors, the 0.11-point increase in the biased-implementation condition is marginally significant (p = 0.08), whereas the 0.07-point increase in the unbiased-implementation condition is not (p = 0.23). For women mayors, the treatment effects are small in both conditions; the differences are not statistically significant, nor are they statistically distinguishable from the differences observed among men mayors. The pattern of responses suggests that voters think biased implementation may be electorally valuable, even if they themselves react negatively to it.
It is possible that the existence of differential punishment and/or rewards will vary by the gender of the citizen assessing performance (see, for instance, Costa and Schaffner, 2018; Schwarz and Coppock, 2022). For example, if women have even higher expectations for women politicians than men do, women could be harsher when punishing women politicians for not meeting expectations and less likely to reward them for good performance. We find no evidence of this in our data.Footnote 18
Do Differential Reactions Originate in Different Baseline Expectations?
The section above establishes that respondents react more strongly to performance information for men mayors than for women mayors. We find evidence both of differential punishment (men mayors are punished more for biased implementation) and of differential rewards (men mayors are rewarded more for unbiased implementation). Is this a result of different baseline preferences for mayors of different genders? Our evidence suggests not. In the control condition, respondents are equally likely to say that they would vote to re-elect a man or woman mayor (1.93 versus 1.95; p = 0.73) and equally likely to say that the programme will help the mayor win re-election (2.97 versus 3.00; p = 0.66). Furthermore, if there were different baseline expectations across genders, this should lead to either greater rewards or greater punishment for men mayors, but not both.
Other questions from the survey also suggest that Argentine respondents view men and women mayors similarly. After reading the vignette and after the measurement of the voting outcomes, respondents assessed whether the hypothetical mayor was likely to have engaged in corruption, patronage, or vote buying. Figure 2 shows the distribution of responses across politician gender for respondents in the control group. Although respondents are somewhat more likely to say that it is “very likely” that men mayors are corrupt, use patronage, or engage in vote buying, both difference-in-means tests and chi-squared tests of differences across the whole distributions return insignificant results. When they do not receive information about politician performance, respondents believe that men and women mayors are equally likely to engage in illicit behaviours.

Figure 2. Perceptions of corruption, patronage, and vote-buying by mayor gender in the control group.
Note: p-values from χ2 tests: p = 0.27 for the corruption outcome; p = 0.40 for the patronage outcome; p = 0.60 for the vote buying outcome.
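For readers unfamiliar with the test named in the note, the following sketch shows how a Pearson χ2 test of independence compares two response distributions. It is not the authors' replication code: the counts are invented for illustration, the number of response categories is an assumption, and the helper names are hypothetical. The tail probability uses a standard closed form that holds for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-z) * sum(z**j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite correction sum.
    m = df // 2
    tail = sum(z ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * tail


def pearson_chi2(table):
    """Return (statistic, df, p-value) for a contingency table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Invented counts: rows are man and woman mayor (control group only),
# columns are four assumed response categories for one outcome.
hypothetical_counts = [
    [60, 120, 100, 60],
    [50, 125, 105, 60],
]
stat, df, p_value = pearson_chi2(hypothetical_counts)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")
```

A large p-value from such a test means the data do not allow the two distributions to be distinguished; it does not by itself demonstrate that they are identical.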
Does Performance Information Differentially Change Programme Perceptions?
Another possible explanation for our findings is that the information about programme implementation might lead respondents to update their perceptions of the described social welfare programme in different ways for men and women mayors. To examine this possibility, we explore questions that ask respondents if they would be satisfied with the programme if it were run in their municipality and whether they believe the programme was distributed to those most in need.
Figure 3 shows the results for these outcomes. The top panels provide some evidence of differences across genders.Footnote 19 As compared to the control condition, biased implementation decreases respondent satisfaction by about 0.13 points for both men and women mayors (p = 0.09 for men; p = 0.07 for women). By contrast, information about unbiased implementation is somewhat more meaningful for men mayors. Compared to the control condition, unbiased implementation by men mayors leads, on average, to a 0.27-point increase in programme satisfaction (p < 0.01). This difference is more than twice the size of the difference observed for women mayors (0.13, p = 0.11 for the test of the null hypothesis of no difference from the control condition), although the two differences are not statistically distinguishable from one another (p = 0.21). We also see that the comparisons across the unbiased and biased implementation conditions are significant for both men and women mayors; the difference is larger for men mayors but not statistically distinguishable from the difference estimated for women mayors (p = 0.25).

Figure 3. Means by treatment condition and differences in means for programme satisfaction outcomes.
While this pattern might provide some insight into why men mayors benefit more from information about unbiased implementation, it does not help us understand why men mayors also appear to be punished more for biased implementation. Instead, our results seem most consistent with the argument that voters are more likely to view men politicians – especially those holding executive office – as responsible for outcomes under their watch, whether positive or negative (de Geus et al., 2021).
In the bottom panels, we show that information about biased implementation reduces perceptions that the programme will benefit those most in need, for both men and women mayors. The effect for women mayors is slightly larger and is significant at the 90 per cent confidence level. The effects of this treatment across genders, however, are not distinguishable from one another (p = 0.83). Conversely, information about unbiased implementation appears to have no effect on beliefs about whether the programme benefits those in need, for mayors of either gender.
Finally, we explore whether implementation information changes perceptions of other mayor characteristics. Online Appendix Table B2 shows no evidence of an effect of performance information on whether the respondent thinks the mayor is likely to have engaged in corruption, patronage, or vote buying.
Discussion
Existing research presents varying findings about how voters evaluate candidates and office-holders of different genders. The predominant view in the literature suggests that the public tends to hold women politicians to higher standards, such that they face more punishment for failing to deliver in office or becoming embroiled in scandal. An alternative argument suggests that voters credit men with more agency and provides evidence that voters are more responsive to men’s performance in office – whether good or bad.
We contribute to this literature through a study of whether Argentine respondents’ reactions to information about social welfare programme implementation at the local level vary with the gender of the local executive. Consistent with the agentic perspective, we find evidence that men mayors are rewarded more strongly for good programme implementation, and punished more strongly for poor implementation. Although the cross-gender differences are not significant for either individual treatment, the total difference between the two treatments is significantly greater for men mayors than for women mayors. We also show, contrary to much existing literature, that our respondents do not have higher baseline expectations for women mayors, so differing baseline expectations cannot explain the pattern of results we document.
Finally, our analysis also shows that respondents have a very high recall of the mayor’s gender when they read about a woman mayor. This suggests that, in a context where women mayors are rare, respondents’ focus on a woman mayor’s gender identity might diminish their attention to performance information compared to when the mayor is a man. Future research designs might explore this possibility by comparing responsiveness to performance information across genders in contexts with different shares of women politicians.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S0007123424000668
Data availability statement
Replication data for this article can be found in Harvard Dataverse at: https://doi.org/10.7910/DVN/SWIPBZ.
Acknowledgements
Thanks to Jair Moreira and Anik Willig for excellent research assistance and to Juan Manuel del Mármol for his help editing the photo used in the survey experiment. A previous version of this article was presented at the 2022 Midwest Political Science Association Meeting in Chicago and the 2022 Annual Meeting of the American Political Science Association in Montreal. We thank Cesi Cruz, Natalia Bueno, and the audiences at those meetings for their excellent comments.
Financial support
This research was funded by Brown University.
Competing interests
None.
Ethical standards
This research was determined exempt by Institutional Review Boards (IRB) at Brown University (Protocol ID 2010002827), Tulane University (Protocol ID 2020-1783), and the University of Illinois (Protocol Number 21332).