Research on the numerical or descriptive representation of social groups in elected political institutions has experienced a renaissance in political science in recent decades (for example, Carnes and Lupu 2023; Dal Bó et al. 2017; Gulzar 2021; Krcmaric, Nelson, and Roberts 2020; Wängnerud 2009). Yet work on the topic is often hampered by a fundamental problem: data availability. For many times, places, and institutions, scholars have not collected data on the characteristics of politicians in formats usable for academic research.
In this article, we describe a new cross-national dataset that provides the most detailed and comprehensive data ever made available on the characteristics of national legislators in the world’s democracies (Carnes et al. 2025). The Global Legislators Database (GLD) covers members of the lower (or unicameral) chamber in 97 national legislatures, representing almost all of the world’s 103 electoral democracies with more than 300,000 residents. It includes information on 19,704 lawmakers who held office during one legislative session in 2015, 2016, or 2017 in each country. For each officeholder, we have compiled information on characteristics that include party affiliation, gender, education, age, and previous occupation.Footnote 1
The GLD can be used to answer a wide range of important research questions. It can be used to study whether a legislature’s social class composition reflects the makeup of the country, or whether the composition of parliaments varies with national characteristics such as the level of economic development or regime type. Because the dataset provides information on individual legislators, it can be used to study differences across representatives (for example, do lawmakers with more formal education behave differently than those with less formal education?), political parties (for example, do rightwing parties elect fewer women than leftwing parties?), countries, or regions (for example, are legislators older in Europe than in Latin America?).
This article summarizes the key features of the GLD, presents three validation exercises, and reports the results of three applications of the dataset that address important research questions: (1) whether re-election rates vary by gender, education, and social class; (2) whether campaign finance regulations are associated with the number of legislators who come from working-class occupational backgrounds; and (3) whether countries with stronger rule of law also elect larger shares of lawyers to national legislatures. These analyses reveal previously unknown patterns that raise interesting questions for further research.
The Global Legislators Database
We began building the GLD by identifying the 103 countries with populations over 300,000 that Freedom House defined as electoral democracies in 2016. We used the 300,000 threshold since it was difficult to collect data on smaller countries.Footnote 2 During the data collection, it became clear that we could not obtain reliable and complete lists of legislators for three countries (Indonesia, Nepal, and Niger) and we could not locate education and/or occupation data for at least 90 per cent of legislators in another three (Comoros, Malawi, and Sri Lanka).Footnote 3 The final GLD includes the remaining 97 democracies. If a country had multiple legislative sessions between 2015 and 2017, we selected one at random.
We include legislators who were elected in the general election to their country’s national parliament or, in countries with bicameral legislatures, to the lower chamber. We focus on the lower chamber because upper chambers often include hereditary, appointed, or indirectly elected members and we wanted to compile data on individuals whose elections reflected the choices of voters. Lower chambers are also more comparable across countries given the enormous variation in the policymaking powers of upper chambers. Because we focus on legislators elected in general election races, the dataset does not include legislators who were appointed or who replaced other lawmakers mid-cycle and it does not include substitute, alternate, or deputy legislators.
The variables included in the GLD are the legislator’s name, date of birth, gender, political party affiliation at the time of the election, last occupation prior to being first elected to public office, and level of education attained prior to their current term in office. The dataset also includes relevant country-level variables, such as the year of the legislative election, the range of years the legislature was in session, the legislature number (for countries that number their legislatures), the total number of legislators in the chamber, the number of legislators from that country included in the GLD, and the date that our team performed a final verification of the country data. Finally, the GLD includes extensive sourcing information for each country and many individual legislators, making the dataset as transparent and reproducible as possible.Footnote 4
Our goal was to eliminate missingness and to create the most exhaustive and accurate cross-national dataset possible. The names of legislators were checked against official parliamentary lists. In the few countries for which we could not locate a canonical list of elected legislators for the selected term, we triangulated against other sources, including domestic election authorities. As a result, there are only 10 countries with discrepancies between the numbers of legislators recorded in the GLD and the number of seats in parliament, totalling 26 missing legislators out of 19,730 – a successful inclusion rate of 99.9 per cent.Footnote 5
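The inclusion rate follows directly from the two counts reported in the text. A minimal arithmetic check, in which the variable names are ours and chosen only for illustration:

```python
# Counts reported in the text.
legislators_in_gld = 19_704   # lawmakers recorded in the GLD
missing_legislators = 26      # legislators the team could not include

# Total seats is the sum of included and missing legislators.
total_seats = legislators_in_gld + missing_legislators

inclusion_rate = legislators_in_gld / total_seats
print(f"{total_seats:,} seats, inclusion rate {inclusion_rate:.1%}")
# prints "19,730 seats, inclusion rate 99.9%"
```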
In addition to conducting online searches, our research assistants painstakingly contacted parliamentary offices, individual legislators, and country experts to collect data. Thanks to these efforts, we have information on the date of birth for 90.6 per cent of included legislators, gender for 99.5 per cent, occupational data for 93.6 per cent, and educational data for 90.1 per cent.
Two variables were especially challenging to collect in a usable fashion. The first was education; for each legislator, we determined the highest degree they completed before being elected to the parliamentary term we selected. To find accurate and precise information, we often had to consult multiple sources, particularly to determine whether a legislator had completed or only begun a degree. To reconcile degrees across countries, we used the widely accepted International Standard Classification of Education (ISCED).Footnote 6
A second variable that required extensive work was occupation, a notoriously thorny variable in the study of descriptive representation. We set out to record each legislator’s primary paid unelected job prior to their first elected office (not including elected positions in government and, to the extent possible, also ignoring elected positions in political party or trade union leadership, political patronage positions, and political appointments). That is, we aimed to record the last main occupation each legislator held before they got elected or appointed to political office. After we collected raw occupational descriptions, we then coded them into three-digit occupational codes based on the International Labor Organization’s International Standard Classification of Occupations (ISCO-08), the most widely accepted occupational classification system. To do so, we mapped raw occupational information (for example, ‘industrial engineer’, ‘accountant’, ‘solicitor’, and ‘sales manager of construction materials business’) onto ISCO codes using the University of Warwick’s Computer Assisted Structured Coding Tool (CASCOT). We then manually reviewed the output and made corrections as needed. The final dataset includes both our coded occupational data (to allow users to carry out off-the-shelf analyses of the economic backgrounds of legislators) and the original open-ended occupational descriptions we collected (to allow validation of our team’s coding and easily permit alternative coding).Footnote 7
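The two-step workflow described above (automated matching followed by manual review) can be sketched as follows. This is only an illustration under stated assumptions: the lookup table stands in for an automated coder such as CASCOT, whose actual interface is not reproduced here; the function and field names are hypothetical; and the three-digit codes reflect our reading of ISCO-08 minor groups rather than values taken from the GLD.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for an automated coder: a small lookup from normalized
# job descriptions to ISCO-08 three-digit (minor group) codes.
# The codes are illustrative assumptions, not values copied from the GLD.
ISCO_LOOKUP = {
    "industrial engineer": "214",  # engineering professionals
    "accountant": "241",           # finance professionals
    "solicitor": "261",            # legal professionals
}


@dataclass
class CodedOccupation:
    raw: str              # original open-ended description, kept for validation
    isco3: Optional[str]  # three-digit code, or None if not yet coded
    needs_review: bool    # True when a human coder must resolve the entry


def code_occupation(raw: str) -> CodedOccupation:
    """Assign a code automatically when possible; otherwise flag for manual review."""
    key = raw.strip().lower()
    code = ISCO_LOOKUP.get(key)
    return CodedOccupation(raw=raw, isco3=code, needs_review=code is None)


descriptions = [
    "Industrial engineer",
    "accountant",
    "solicitor",
    "sales manager of construction materials business",
]
coded = [code_occupation(d) for d in descriptions]
for entry in coded:
    print(entry)
```

Keeping the raw description next to the assigned code mirrors the choice to release both, so that an entry the automated step cannot resolve (here, the sales manager) can be coded by hand or recoded under an alternative scheme.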
The GLD represents a significant contribution to existing cross-national data on the personal characteristics of politicians. Most datasets that include biographical information about politicians focus on heads of state (Baturo 2016; Brambor, Lindvall, and Stjernquist 2014; Ellis, Horowitz, and Stam 2015; Goemans, Gleditsch, and Chiozza 2009) or cabinet members (Alexiadou 2016; Best and Edinger 2005; Braun and Raddatz 2010; Ennser-Jedenastik et al. 2022). Of the few that collect data on legislators, some include only a selection of OECD countries (Best and Edinger 2005; Dowding and Dumont 2009; Faccio 2006; Faccio 2010; Göbel and Munzert 2022) while others include more expansive lists of countries but only subsets of lawmakers (Nelson 2014). Other efforts, such as the Global Data on National Parliaments (PARLINE), available through the Inter-Parliamentary Union, provide aggregated data on some demographic characteristics of legislators but not individual-level data (see also Ruedin 2009). We know of no other dataset that provides virtually complete individual-level biographical data on lawmakers for such a large sample of democracies.
The main drawback of the GLD is that it represents a snapshot at a single point in time. Unfortunately, it would not have been possible to collect historical data for many countries in the dataset. Even if we could have assembled accurate lists of the names of legislators serving in earlier legislatures – not a given for many countries – there would have been particularly significant missingness for occupational and educational characteristics. For this reason, the GLD should be thought of as a baseline dataset. The codebook includes extremely detailed data collection information to allow researchers to replicate the data collection process and expand the GLD for future years.
Validity Checks and Comparisons to Other Datasets
To assess the quality of the GLD, we began by conducting validity checks. Unfortunately, for many of the traits recorded in the GLD, there are no other large-scale cross-national datasets that we can use as benchmarks for validation. (This is a principal contribution of the GLD.) However, there was one trait in the GLD for which other sources provide comparable data: gender. The Varieties of Democracy (V-Dem) project, for instance, is a widely-used country-level dataset that compiles information from experts and other sources on 202 nations (Coppedge et al. 2022). V-Dem includes information on the proportion of national legislators in the lower (or unicameral) chamber who are women, which allows us to assess whether V-Dem’s estimates of women’s representation match the estimates produced by our dataset.
Figure 1 plots women’s representation from V-Dem against country-level proportions from the GLD. The 45-degree line represents a perfect correspondence between the two datasets. As the figure illustrates, the data from the two sources are nearly identical.Footnote 8

Figure 1. Shares of women legislators in the GLD and V-Dem.
Note: Bahamas, Belize, Fiji, and Kosovo are omitted because of missing data in the V-Dem.
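A comparison of this kind requires aggregating individual-level records to country-level shares and then measuring each country's distance from the 45-degree line. The sketch below illustrates that logic with toy data; the field names (`country`, `gender`) are hypothetical and may not match the GLD's variable names, and the benchmark values are made up rather than taken from V-Dem.

```python
from collections import defaultdict

# Toy individual-level records; field names are hypothetical.
legislators = [
    {"country": "A", "gender": "female"},
    {"country": "A", "gender": "male"},
    {"country": "A", "gender": "male"},
    {"country": "A", "gender": "male"},
    {"country": "B", "gender": "female"},
    {"country": "B", "gender": "male"},
    {"country": "B", "gender": None},  # missing gender is excluded from the share
]

# Made-up benchmark shares standing in for an external country-level source.
benchmark_share = {"A": 0.25, "B": 0.48}


def share_women(records):
    """Return the share of women among legislators with known gender, by country."""
    women = defaultdict(int)
    known = defaultdict(int)
    for rec in records:
        if rec["gender"] is None:
            continue
        known[rec["country"]] += 1
        if rec["gender"] == "female":
            women[rec["country"]] += 1
    return {c: women[c] / known[c] for c in known}


gld_share = share_women(legislators)
# Deviation from the 45-degree line: zero means the two sources agree exactly.
deviation = {
    c: gld_share[c] - benchmark_share[c] for c in gld_share if c in benchmark_share
}
for country, diff in sorted(deviation.items()):
    print(country, f"{gld_share[country]:.2f}", f"{diff:+.2f}")
```

Countries absent from the benchmark are simply skipped, which corresponds to omitting countries with missing benchmark data from the comparison.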
For other legislator characteristics available in the GLD, we could only find benchmarks for validating our data in datasets that covered subsets of countries included in the GLD. However, the results consistently validated the data we collected. For example, we compared our data on legislators’ ages to data from the Comparative Legislators Database (CLD) (Göbel and Munzert 2022, 1398), an impressive recent dataset that uses open sources such as Wikipedia and Wikidata to collect legislator-level data for multiple legislative terms in fifteen affluent democracies. In Fig. 2, we compare the average age of legislators calculated using our GLD dataset and the CLD. As with our gender data, our age data are well validated by this simple comparison.

Figure 2. Legislator age in the GLD and CLD.
We cannot validate our age data for the vast majority of the countries we include in the GLD because there are no other datasets available that allow this. Moreover, there are no reliable benchmarks that allow us to carry out similar validation exercises for other important variables, such as occupation and education. In the absence of direct comparisons against existing benchmarks, we opted to carry out a few face validity tests. In most countries, it seems reasonable to expect national legislators to be relatively old and to have fairly high levels of formal education. We would also expect, based on studies of a subset of democracies (see, for example, Carnes 2013; Carnes and Lupu 2015), that few legislators will come from working-class economic backgrounds. Figure 3 shows that the distributions of these traits in the GLD are consistent with these reasonable priors. Although we lack concrete benchmarks, the data in the GLD seem valid on their face.

Figure 3. Distributions of legislator traits in the GLD.
Note: Age is calculated at the time of election. Higher education includes levels beyond primary and secondary education (Bachelors, Masters, PhD, LLB, LLM, JD, MD, and short-cycle tertiary). Data on legislators’ educational attainment are unavailable for Côte d’Ivoire.
Finally, to assess the contribution of the GLD, we compare it to the most comprehensive individual-level legislator dataset previously assembled, the Global Leadership Project (GLP) (Gerring et al. 2019; Gerring and Oncel 2020). The GLP includes legislator-level dataFootnote 9 gathered from expert surveys fielded in two waves (2010–2013 and 2017–2018). Because our dataset draws on authoritative sources like parliamentary websites, we expect the GLD to offer a more accurate count of legislators for the ninety-seven electoral democracies it includes.
Figure 4 plots the number of lower-chamber legislators in the GLP and the GLD for forty-two countries included in both. Countries for which the two datasets have identical numbers of legislators appear on the 45-degree line; those for which the GLD includes more legislators are above the line. As the figure illustrates, the GLD includes more legislators than the GLP in all but two countries, and the differences are substantial in many cases.Footnote 10

Figure 4. Numbers of legislators in the GLD and GLP.
Together, these validity checks and comparisons underscore the accuracy and value of the GLD’s data. The GLD is the most comprehensive dataset of its kind, offering the most reliable data available to date with the broadest cross-national coverage.
Applications: Re-election, Campaign Finance, and the Rule of Law
The GLD offers comprehensive and reliable data that can be used to answer numerous important research questions about representation and policymaking. The dataset can be aggregated to the party or country level, depending on the appropriate unit of analysis. The GLD can be used to answer questions about the causes or consequences of the descriptive representation of men and women, more- or less-educated representatives, older and younger, and individuals from different occupational backgrounds. It can be used to provide control variables in studies for which the representation of social groups might be potential confounds, and it can be used to answer simple descriptive questions about which scholars have previously only been able to speculate.
To illustrate how the dataset might be used, we provide three examples. These illustrate how the GLD can be applied to research questions that could not previously be studied on electoral democracies globally.
Re-election Rates
We first ask whether re-election rates are higher in countries where lawmakers have different personal characteristics. Many scholars equate educational attainment with skill or ability (for example, Besley and Reynal-Querol 2011; Besley, Montalvo, and Reynal-Querol 2011; Bovens and Wille 2017; Hallerberg and Wehner 2013), an argument that suggests that countries with more educated lawmakers should experience less legislative turnover (but see Carnes and Lupu 2016). The obstacles women face as legislators may also make it harder for them to run for re-election than it is for male legislators (for example, Brollo and Troiano 2016). Similarly, one reason there are so few working-class members of national legislatures may be that they find it more difficult to get re-elected.
These are all hypotheses that can be tested by combining the GLD with information about re-election. To do so, we used the Re-election in Democracies Around the World dataset (REDRAW) (Golden and Nazrullaeva 2024) to determine whether each legislator in the GLD held office in the immediately preceding term in sixty-seven countries.Footnote 11 Using these data on incumbency, we can ask whether education, gender, and occupational background are associated with re-election rates for the largest sample of democracies ever studied on these questions.
The top panel in Fig. 5 shows the country-level relationship between the average educational attainment of legislators and average re-election rates. We find effectively no relationship, offering little corroboration for the idea that legislators with more formal education are more skilled at getting re-elected.

Figure 5. Re-election rates by years of education, gender, and occupational background.
Note: The share of working-class legislators is zero for six countries that are dropped from the figure: Albania, Botswana, Cyprus, Estonia, Guatemala, and Mongolia.
By contrast, the middle panel in Fig. 5 shows clear evidence of differences between men and women. That panel plots the average re-election rate for men (vertical axis) against the average re-election rate for women (horizontal axis); in countries above the 45-degree line, men are re-elected at higher rates than women. It is easy to see that in most countries, men are re-elected at higher rates than women. But the phenomenon is not universal: in twenty-two countries, women are re-elected more often than men.
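The comparison in the middle panel amounts to computing a re-election rate for men and for women within each country and then checking on which side of the 45-degree line each country falls. A minimal sketch with toy data follows; the record layout (country, gender, re-elected flag) is a hypothetical simplification and not the structure of the GLD or REDRAW files.

```python
from collections import defaultdict

# Toy records: (country, gender, reelected). "reelected" is True if the
# legislator also served in the immediately preceding term.
records = [
    ("A", "male", True), ("A", "male", False),
    ("A", "female", False), ("A", "female", False),
    ("B", "male", False), ("B", "male", False),
    ("B", "female", True), ("B", "female", False),
    ("C", "male", True), ("C", "male", True),
    ("C", "female", True), ("C", "female", False),
]


def reelection_rates(rows):
    """Return {country: {gender: share of legislators who were re-elected}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [re-elected, total]
    for country, gender, reelected in rows:
        counts[country][gender][0] += int(reelected)
        counts[country][gender][1] += 1
    return {
        c: {g: won / total for g, (won, total) in by_gender.items()}
        for c, by_gender in counts.items()
    }


rates = reelection_rates(records)
# With men on the vertical axis and women on the horizontal axis, a country lies
# above the 45-degree line when the men's rate exceeds the women's rate.
above = sorted(c for c, r in rates.items() if r["male"] > r["female"])
below = sorted(c for c, r in rates.items() if r["male"] < r["female"])
print("men re-elected more often:", above)
print("women re-elected more often:", below)
```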
Finally, the bottom panel in Fig. 5 plots re-election rates among legislators from working-class occupations against re-election rates among legislators who did not hold a working-class job before being elected to public office.Footnote 12 The data offer no support for the hypothesis that lower re-election rates explain the shortage of working-class politicians.Footnote 13 Although re-election rates for working-class incumbents are far more varied than for their non-working-class counterparts, countries are about as likely to be above the 45-degree line (non-working-class re-elected more often than working-class) as they are to be below it (working-class re-elected more often than non-working-class).
These three comparisons suggest that, among underrepresented groups, women may face unique hurdles in securing re-election. We find no consistent cross-national evidence that lawmakers from working-class jobs or lawmakers with less formal education fare worse in future races once they gain initial entry into a national parliament. But in most countries, women who make it into the national legislature still face disadvantages when they seek re-election.Footnote 14
These are of course only correlations that cannot account for potential confounding variables, but they offer important descriptive evidence and they open up new causal questions for further investigation. Why do women experience barriers to re-election that do not appear to confront working-class members of legislatures? What distinguishes countries where women achieve higher re-election rates than men? The GLD allows scholars to study important research questions on representation, and its findings open new avenues for future research.
Campaign Finance and Working-Class Representation
Another potential reason so few working-class people hold national public office could be that, in many countries, they have to raise their own campaign funds. Following Carnes and Lupu (2024), which limits its analysis to OECD countries, we might expect that the share of working-class legislators would be higher in countries where public financing is available for political campaigns since this reduces barriers to entry for financially less well-off candidates.
Figure 6 compares the share of working-class representatives in the GLD to the V-Dem measure of public financing regulations, using the same election year. V-Dem measures whether there is ‘significant public financing available for parties’ and/or candidates’ campaigns for national office’ (Coppedge et al. 2022, 63). The measure varies from low to high, where low values mean no public financing and high values mean ‘public financing funds a significant share of expenditures by all, or nearly all parties’ [ibid.]. We find only a weak positive relationship between campaign finance regulations and working-class representation. Although this analysis is simply correlational and does not take into consideration party nomination practices, it suggests that campaign financing may not be much help in explaining why so few working-class people run for elected office.

Figure 6. Campaign finance and working-class representation.
Note: Bahamas, Belize, and Kosovo are omitted because of missing data in the V-Dem.
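One simple way to summarize the strength of a bivariate relationship like this one is the Pearson correlation coefficient. The sketch below computes it from scratch on made-up country-level values; the numbers are not drawn from the GLD or V-Dem, and we do not claim this is the statistic underlying Fig. 6.

```python
from math import sqrt

# Made-up country-level values for illustration only.
# Each pair is (public financing score, share of working-class legislators).
countries = {
    "A": (0.0, 0.02),
    "B": (1.0, 0.05),
    "C": (2.0, 0.03),
    "D": (3.0, 0.06),
    "E": (4.0, 0.04),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


financing = [v[0] for v in countries.values()]
working_class = [v[1] for v in countries.values()]
r = pearson(financing, working_class)
print(f"Pearson r = {r:.2f}")
```

As with the figure itself, a coefficient of this kind describes association only and does not adjust for confounders such as party nomination practices.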
Lawyer Legislators and the Rule of Law
Scholars also regularly study the role of lawyers in legislatures around the world. Early scholarship suggested that lawyers may have advantages in getting into politics in places with more robust legal systems and rule of law (for example, Hain and Piereson 1975). With the GLD, we can test this with more comprehensive data than ever before.
Figure 7 shows the relationship between the share of legal professionals in each national legislature and the V-Dem measure of the rule of law. V-Dem measures the rule of law as the ‘extent to which laws [are] transparently, independently, predictably, impartially, and equally enforced, and to what extent […] the actions of government officials comply with the law’ (Coppedge et al. 2022, 303), where higher values signify a stronger rule of law in a country.

Figure 7. Lawyers in the legislature and rule of law.
Note: Bahamas, Belize, and Kosovo are omitted because of missing data in the V-Dem.
The data in Fig. 7 offer only modest evidence of a relationship. In countries around the world, legal professionals comprise anywhere from zero to 30 per cent of national legislators. Even countries with a very weak rule of law have many lawyers in their legislature. This descriptive exercise lends support to more recent scholarship that questions earlier ideas about the political advantages of legal professionals (for example, Bonica 2020). And it raises interesting questions about the strength of the professional identities of lawyers around the world.
Conclusions
As these simple applications illustrate, the GLD offers comprehensive, reliable data that can facilitate new cross-national research on the personal backgrounds of politicians. The dataset has numerous potential applications in the study of legislators, political parties, countries, and political representation. With this dataset, researchers can investigate questions about the causes and consequences of the numerical or descriptive representation of social groups defined by gender identity, education level, age, or past occupation. Because the GLD provides individual-level data, these questions can be examined at the individual, party, or country level. We also hope that the documentation provided with the dataset will make it easy for researchers to replicate our methods and collect future waves of biographical data on national legislators in the world’s democracies.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S000712342400053X.
Data availability statement
Replication data for this article are available at https://doi.org/10.7910/DVN/KGYJFJ.
The GLD dataset that is presented in this article is available at https://doi.org/10.7910/DVN/U1ZNVT.
Acknowledgements
Preliminary versions of material included in this article were presented at the 2022 annual meetings of the American Political Science Association, Montreal, 15–18 September; Collegio Carlo Alberto, Turin, 28 October 2021; and the University of Konstanz, 25 April 2022.
Author contributions
Author ordering is alphabetical.
Financial support
This article was written with support from the Academic Senate of the University of California at Los Angeles. The dataset presented in this article received funding from the Academic Senate of the University of California at Los Angeles, the European University Institute, the Sanford School of Public Policy at Duke University, Vanderbilt University, and the Vanderbilt University Office of Equity, Diversity, and Inclusion Seed Grant. The article authors also thank the Department of Government at the London School of Economics and Political Science for financial support secured by Stephane Wolton, who was involved at an early stage of the project.
Competing interests
None.
Ethical standards
The dataset presented in this article was assembled using information already in the public domain. Institutional Review Board clearance was therefore not sought.