Published online by Cambridge University Press: 23 May 2011
Legislative speech records from the 101st to 108th Congresses of the US Senate are analysed to study political ideologies. A widely used text classification algorithm – Support Vector Machines (SVM) – allows the extraction of terms that are most indicative of conservative and liberal positions in legislative speeches and the prediction of senators’ ideological positions with a 92 per cent level of accuracy. Feature analysis identifies the terms associated with conservative and liberal ideologies. The results demonstrate that cultural references appear more important than economic references in distinguishing conservative from liberal congressional speeches, calling into question the common economic interpretation of ideological differences in the US Congress.
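As a rough illustration of the pipeline described above, the following is a minimal sketch of SVM-based speech classification and term extraction. It uses scikit-learn rather than the SVMlight package the article reports using, and the speech texts, labels and feature weighting shown here are placeholders, not the article’s actual data or settings.

```python
# Hypothetical sketch: classify per-senator speech documents as conservative (+1)
# or liberal (-1) with a linear SVM, then read off the most indicative terms
# from the signed weights of the trained model.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

speeches = ["... concatenated speeches of senator A ...",
            "... concatenated speeches of senator B ..."]   # placeholder documents
labels = [1, -1]                                            # +1 conservative, -1 liberal

# Represent each document as a normalised word-frequency vector.
vectorizer = TfidfVectorizer(lowercase=True)
X = vectorizer.fit_transform(speeches)

# Train a linear SVM; each term receives a signed weight in the separating hyperplane.
clf = LinearSVC().fit(X, labels)

terms = vectorizer.get_feature_names_out()
order = np.argsort(clf.coef_.ravel())
print("most liberal terms:", terms[order[:20]])          # largest negative weights
print("most conservative terms:", terms[order[-20:]])    # largest positive weights
```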
1 Poole, Keith T., ‘Changing Minds? Not in Congress’, Public Choice, 131 (2007), 435–451.
2 Converse, Philip E., ‘The Nature of Belief Systems in Mass Publics’, in David E. Apter, ed., Ideology and Discontent (New York: The Free Press, 1964), pp. 206–261.
3 Converse, ‘The Nature of Belief Systems in Mass Publics’, p. 207.
4 Poole, Keith T. and Rosenthal, Howard, ‘Patterns of Congressional Voting’, American Journal of Political Science, 35 (1991), 228–278; Poole, Keith T. and Rosenthal, Howard, Congress: A Political-Economic History of Roll Call Voting (New York: Oxford University Press, 1997); McCarty, Nolan, Poole, Keith T. and Rosenthal, Howard, Income Redistribution and the Realignment of American Politics (Washington, D.C.: American Enterprise Institute, 1997); McCarty, Nolan, Poole, Keith T. and Rosenthal, Howard, Polarized America: The Dance of Ideology and Unequal Riches (Boston, Mass.: MIT Press, 2006); Poole, Keith T. and Rosenthal, Howard, Ideology and Congress (New Brunswick, N.J.: Transaction Publishers, 2007).
5 Poole and Rosenthal, Congress.
6 Poole, Keith T., Spatial Models of Parliamentary Voting (New York: Cambridge University Press, 2005).
7 Initially, this finding met with widespread disbelief. See Poole and Rosenthal, Congress, p. 8. However, the low dimensionality of legislative voting has been confirmed by other scholars using different estimation methodologies, such as Bayesian procedures (Clinton, Joshua, Jackman, Simon and Rivers, Doug, ‘The Statistical Analysis of Roll Call Data’, American Political Science Review, 98 (2004), 355–370) or factor analysis (Heckman, James J. and Snyder, James M. Jr, ‘Linear Probability Models of the Demand for Attributes with an Empirical Application to Estimating the Preferences of Legislators’, RAND Journal of Economics, 28 (1997), S142–S189) for estimating ideal points.
8 Institutional features such as gate-keeping powers of committees (Shepsle, Kenneth A. and Weingast, Barry R., ‘Structure-induced Equilibrium and Legislative Choice’, Public Choice, 37 (1981), 503–519), pre-floor legislative activities (such as co-sponsorship), strategic voting (Talbert, Jeffery C. and Potoski, Matthew, ‘Setting the Legislative Agenda: The Dimensional Structure of Bill Cosponsoring and Floor Voting’, Journal of Politics, 64 (2002), 864–891) or institutional constraints such as the presidential veto (Roberts, Jason M., ‘The Statistical Analysis of Roll Call Data: A Cautionary Tale’, Legislative Studies Quarterly, 22 (2007), 341–360; Clinton, Joshua D., ‘Lawmaking and Roll Calls’, Journal of Politics, 69 (2007), 457–469) can all affect the measurement of ideal points and reduce the dimensionality of legislative voting in Congress. It is also possible that exogenous factors, such as electoral incentives, could help explain why parties aim to present a coherent legislative agenda, and avoid intra-party voting divisions. Indeed, Snyder and Ting (Snyder, James M. and Ting, Michael M., ‘An Informational Rationale for Political Parties’, American Journal of Political Science, 46 (2002), 90–110; Snyder, James M. and Ting, Michael M., ‘Party Labels, Roll Calls, and Elections’, Political Analysis, 11 (2003), 419–444) and Woon and Pope (Woon, Jonathan and Pope, Jeremy C., ‘Made in Congress? Testing the Electoral Implications of Party Ideological Brand Names’, Journal of Politics, 70 (2008), 823–836) argue that parties can use their aggregate roll-call record to produce a coherent ideological brand name in order to communicate with the electorate. In this context, the observed unidimensionality in legislative voting would be facilitated by electoral incentives, rather than by institutional rules or agenda control.
9 On Westminster-style parliamentary systems, see Spirling and McLean (Spirling, Arthur and McLean, Iain, ‘UK OC OK? Interpreting Optimal Classification Scores for the U.K. House of Commons’, Political Analysis, 15 (2006), 85–86). On the US Congress case, see Clinton, ‘Lawmaking and Roll Calls’, and Roberts, ‘The Statistical Analysis of Roll Call Data’.
10 See, for example, the NPAT candidate survey of Ansolabehere, Stephen, Snyder, James M. Jr and Stewart, Charles III, ‘The Effects of Party and Preferences on Congressional Roll-Call Voting’, Legislative Studies Quarterly, 26 (2001), 533–572, which looks at the correlation between first factor nominate and first factor NPAT scores; or the Poole and Rosenthal study of nominate scores and interest group ratings (Poole and Rosenthal, Congress; Poole and Rosenthal, Ideology and Congress).
11 One such example is Schonhardt-Bailey, Cheryl, ‘The Congressional Debate on Partial-Birth Abortion: Constitutional Gravitas and Moral Passion’, British Journal of Political Science, 38 (2008), 383–410. In her study of the US Senate debates on partial-birth abortion, Schonhardt-Bailey identifies two dimensions of conflict, where the first dimension represents an emotive conflict over the abortion procedure, while the second dimension is related to the constitutionality of the bill. Schonhardt-Bailey argues that legislative voting correlates with this second dimension.
12 Budge, Ian, Klingemann, Hans-Dieter, Volkens, Andrea, Bara, Judith and Tanenbaum, Eric, Mapping Policy Preferences: Estimates for Parties, Electors, and Governments 1945–1998 (Oxford: Oxford University Press, 2001); Baumgartner, Frank R. and Jones, Bryan D., Agendas and Instability in American Politics (Chicago: University of Chicago Press, 1993); Baumgartner, Frank R. and Jones, Bryan D., eds, Policy Dynamics (Chicago: University of Chicago Press, 2002); Baumgartner, Frank R. and Jones, Bryan D., The Politics of Attention: How Government Prioritizes Problems (Chicago: University of Chicago Press, 2005).
13 For examples, see Laver, Michael and Benoit, Kenneth, ‘Locating TDs in Policy Spaces: Wordscoring Dáil Speeches’, Irish Political Studies, 17 (2002), 59–73; Laver, Michael, Benoit, Kenneth and Garry, John, ‘Extracting Policy Positions from Political Texts Using Words as Data’, American Political Science Review, 97 (2003), 311–337; Benoit, Kenneth and Laver, Michael, ‘Estimating Irish Party Positions Using Computer Wordscoring: The 2002 Elections’, Irish Political Studies, 18 (2003), 97–107; Benoit, Kenneth and Laver, Michael, ‘Mapping the Irish Policy Space: Voter and Party Spaces in Preferential Elections’, Economic and Social Review, 36 (2005), 83–108; Monroe, Burt L. and Maeda, Ko, ‘Rhetorical Ideal Point Estimation: Mapping Legislative Speech’ (presented at the Society for Political Methodology, Palo Alto: Stanford University, 2004); Simon, Adam F. and Xenos, Michael, ‘Dimensional Reduction of Word-frequency Data as a Substitute for Intersubjective Content Analysis’, Political Analysis, 12 (2004), 63–75; Slapin, Jonathan B. and Proksch, Sven O., ‘A Scaling Model for Estimating Time-Series Party Positions from Texts’, American Journal of Political Science, 52 (2008), 705–722; Quinn, Kevin M., Monroe, Burt L., Colaresi, Michael, Crespin, Michael H. and Radev, Dragomir R., ‘How to Analyze Political Attention with Minimal Assumptions and Costs’, American Journal of Political Science, 54 (2010), 209–228. For a recent review, see Monroe, Burt and Schrodt, Philipp A., ‘Introduction to the Special Issue: The Analysis of Political Text’, Political Analysis, 16 (2008), 351–355; and also Cousins, Ken and McIntosh, Wayne, ‘More than Typewriters, More than Adding Machines: Integrating Information Technology into Political Research’, Quality and Quantity, 39 (2005), 591–614; and Yano, Tae, Cohen, William W. and Smith, Noah A., ‘Predicting Response to Political Blog Posts with Topic Models’, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) (2009), 477–485.
14 Examples include Laver, Benoit and Garry, ‘Extracting Policy Positions from Political Texts Using Words as Data’. See also Benoit and Laver, ‘Estimating Irish Party Positions Using Computer Wordscoring’; Benoit and Laver, ‘Mapping the Irish Policy Space’; Laver and Benoit, ‘Locating TDs in Policy Spaces’; Benoit, Kenneth, Laver, Michael, Arnold, Christine, Pennings, Paul and Hosli, Madeleine O., ‘Measuring National Delegate Positions at the Convention on the Future of Europe Using Computerized Wordscoring’, European Union Politics, 6 (2005), 291–313. For a critical view, see Budge, Ian and Pennings, Paul, ‘Do They Work? Validating Computerized Word Frequency Estimates against Policy Series’, Electoral Studies, 26 (2007), 121–129.
15 Monroe and Maeda, ‘Rhetorical Ideal Point Estimation’; Slapin and Proksch, ‘A Scaling Model for Estimating Time-Series Party Positions from Texts’.
16 Purpura, Stephen and Hillard, Dustin, ‘Automated Classification of Congressional Legislation’, Proceedings of the 2006 International Conference on Digital Government Research (2006), 219–225, retrieved 28 May 2007, from the ACM Digital Library; Pang, Bo, Lee, Lillian and Vaithyanathan, Shivakumar, ‘Thumbs up? Sentiment Classification Using Machine Learning Techniques’, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (2002), 79–86, retrieved 28 May 2007, from the ACM Digital Library; Thomas, Matt, Pang, Bo and Lee, Lillian, ‘Get out the Vote: Determining Support or Opposition from Congressional Floor-debate Transcripts’, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (2006), 327–335, retrieved from the ACL Digital Archive. Thomas, Pang and Lee predicted speakers’ opinions about a specific bill (support or opposition) based on their speeches. Their classifier was trained on 2,740 speech segments in 38 bill debates and achieved an accuracy of 66 per cent in predicting the opinions expressed in 860 speech segments from ten different legislative debates.
17 We used the Poole and Rosenthal dw-nominate scores available at http://voteview.com/dwnomin.htm.
18 We will discuss differences between the House and the Senate below. We will also suggest how the approach can be utilized when studying other legislatures.
19 The dw-nominate scores for the same senators can be different across congresses. As a result, when we prepare the senatorial speeches as training and testing documents (each document is called an ‘example’ in machine learning terms), a senator could be assigned to the extreme category in one congress but moved to the moderate category in another. Therefore, we treat the same senators in different congresses as different training/testing examples.
20 Forty-five of these fifty ‘extreme’ senators had already served in the 107th Congress.
21 This issue was investigated by Poole (‘Changing Minds? Not in Congress’) in the context of voting behaviour. Poole found strong support for individual ideological consistency in members of Congress over time.
22 Ninety-one senators in the 108th Congress served in previous congresses. Forty-four of the fifty extreme senators in the 108th Congress were rated as extreme in previous congresses.
23 The performance of classification algorithms is tested using common benchmark datasets. The Reuters-21578 news collection, the OHSUMED Medline abstract collection, and the 20 Usenet newsgroups collection are the most widely used benchmark datasets. The Reuters-21578 collection is available at http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html. The OHSUMED collection is available at http://trec.nist.gov/data/t9_filtering.html. The 20 newsgroups collection is available at http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html.
24 Dumais, Susan, Platt, John, Heckerman, David and Sahami, Mehran, ‘Inductive Learning Algorithms and Representations for Text Categorization’, Proceedings of the 7th International Conference on Information and Knowledge Management (1998), 148–155, retrieved 28 May 2007, from the ACM Digital Library; Guyon, Isabelle, Weston, Jason, Barnhill, Stephen and Vapnik, Vladimir, ‘Gene Selection for Cancer Classification Using Support Vector Machines’, Machine Learning, 46 (2002), 389–422; Forman, George, ‘An Extensive Empirical Study of Feature Selection Metrics for Text Categorization’, Journal of Machine Learning Research, 3 (2003), 1289–1305; Joachims, Thorsten, ‘Text Categorization with Support Vector Machines: Learning with Many Relevant Features’, 10th European Conference on Machine Learning, Vol. 1398 of Lecture Notes in Computer Science (Berlin: Springer Verlag, 1998), pp. 137–142; Mladenic, Dunja, Brank, Janez, Grobelnik, Marko and Milic-Frayling, Natasa, ‘Feature Selection Using Linear Classifier Weights: Interaction with Classification Models’, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '04) (Sheffield, 25–29 July 2004), pp. 234–241; Yang, Yiming and Liu, Xin, ‘A Re-evaluation of Text Categorization Methods’, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1999), 42–49, retrieved 28 May 2007, from the ACM Digital Library; Sebastiani, Fabrizio, ‘Machine Learning in Automated Text Categorization’, ACM Computing Surveys, 34 (2002), 1–47. We also compared our SVM algorithm to naïve Bayes, another popular classification method. Our experimental results show that SVM is slightly superior to naïve Bayes for ideological position classification.
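A comparison of this kind could be run along the following lines. This is a sketch only: X and labels stand for the vectorised speeches and ideology labels from the earlier sketch, the ten-fold setting is illustrative, and it is not the article’s own procedure.

```python
# Sketch of a head-to-head comparison between a linear SVM and multinomial naive
# Bayes on the same document-term matrix X and labels (both assumed from above).
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

for name, model in [("SVM", LinearSVC()), ("naive Bayes", MultinomialNB())]:
    acc = cross_val_score(model, X, labels, cv=10).mean()   # mean accuracy over ten folds
    print(name, round(acc, 3))
```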
25 Pang, Lee and Vaithyanathan, ‘Thumbs up?’
26 Details on the way in which these vectors were derived from the documents are discussed in the next section.
27 Vapnik, Vladimir, Estimation of Dependences Based on Empirical Data (New York: Springer-Verlag, 1982); Cortes, Corinna and Vapnik, Vladimir, ‘Support-vector Networks’, Machine Learning, 20 (1995), 273–297; Vapnik, Vladimir, The Nature of Statistical Learning Theory (New York: Springer-Verlag, 1999).
28 There are several efficient implementations of the SVM algorithm, such as LIBSVM and SVMlight (Thorsten Joachims, ‘SVMlight: Support Vector Machine (Version 6.01)’ (2004)). We used the SVMlight package with its default settings in this study. See Chang, Chih C. and Lin, Chih J., ‘LIBSVM: A Library for Support Vector Machines’ (2001), software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
29 Lewis, David D., ‘An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task’, Proceedings of the 15th Annual International Conference on Research and Development of Information Retrieval (1992), pp. 37–50, retrieved 28 May 2007, from the ACM Digital Library; Cohen, William W. and Singer, Yoram, ‘Context-sensitive Learning Methods for Text Categorization’, ACM Transactions on Information Systems, 17 (1999), 141–173; Scott, Sam and Matwin, Stan, ‘Feature Engineering for Text Classification’, Proceedings of the 16th International Conference on Machine Learning (San Francisco: Morgan Kaufmann, 1999), pp. 379–388; Moschitti, Alessandro and Basili, Roberto, ‘Complex Linguistic Features for Text Classification: A Comprehensive Study’, European Conference on Information Retrieval, Vol. 2997 of Lecture Notes in Computer Science (Berlin: Springer Verlag, 2004), pp. 181–196.
30 Dave, Kushal, Lawrence, Steve and Pennock, David M., ‘Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews’, Proceedings of the 12th International Conference on World Wide Web (2003), 519–522, retrieved 28 May 2007, from the ACM Digital Library; Pang, Lee and Vaithyanathan, ‘Thumbs up?’
31 Finn, Aidan and Kushmerick, Nicholas, ‘Learning to Classify Documents according to Genre’, Journal of the American Society for Information Science and Technology, 57 (2006), 1506–1518. For example, some typical adjectives in movie reviews (like hilarious and boring) are unlikely to occur in restaurant reviews, although some opinion descriptors (like terrific and bad) are universal.
32 ‘Miss’ was not included because all single female senators (e.g. Susan Collins and Barbara Mikulski) were addressed as ‘Ms’.
33 Yu, Bei, Kaufmann, Stefan and Diermeier, Daniel, ‘Classifying Party Affiliation from Political Speech’, Journal of Information Technology & Politics, 5 (2008), 33–48.
34 Porter, M. F., ‘An Algorithm for Suffix Stripping’, Program, 14 (1980), 130–137.
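For readers unfamiliar with the algorithm, the following minimal example shows the kind of suffix stripping it performs, using NLTK’s implementation as a stand-in (the article cites the algorithm itself, not a particular package).

```python
# Porter suffix stripping: inflected variants collapse onto a common stem.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["votes", "voted", "voting"]:
    print(word, "->", stemmer.stem(word))   # all three reduce to the stem "vote"
```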
35 We used the MorphAdorner tagger to tag the parts of speech. Since the tagger has its own tokenizer, the generated word forms in this case are slightly different from the results of the simple tokenizer.
36 This is a standard approach in classification tasks; see, e.g., Tom Mitchell, Machine Learning (Toronto: McGraw Hill, 1997). An alternative approach consists in setting aside a sizeable portion of the data as a ‘held-out’ set which is ignored during training and only used for testing. This approach is sound for datasets with large numbers of labelled examples. However, for small datasets such as ours, it is problematic since the arbitrary training/test split may accidentally lead to two datasets that are unlikely to have been produced by the same source.
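A minimal sketch of the N-fold procedure, as opposed to a single held-out split, is given below. X and labels are assumed to be the vectorised speeches and ideology labels from the earlier sketch, and the ten-fold setting is illustrative only, not the article’s own folding scheme.

```python
# N-fold cross-validation: every labelled example is tested exactly once,
# avoiding the arbitrariness of a single training/test split on a small dataset.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC

y = np.asarray(labels)                      # labels from the earlier sketch
accuracies = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearSVC().fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))
print("mean accuracy:", np.mean(accuracies))
```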
37 The accuracy was even higher (94 per cent) when adjectives were used as feature sets. Since there are only fifty test examples, a 2 per cent accuracy improvement corresponds to one more correctly predicted example. Therefore, we do not think the accuracy difference is significant.
38 Note, however, that the out-of-sample set is small due to lack of turnover among members of the Senate.
39 These polarities are arbitrary. See the methodology section for technical details.
40 This is related to the literature on framing. For a recent review, see Druckman, James N. and Chong, Dennis, ‘Framing Theory’, Annual Review of Political Science, 10 (2007), 103–126.
41 We reproduce in Table 4 the most liberal and conservative words as they appear in our ranking, from the first to the twentieth in rank order. However, for the purposes of this discussion, we selected words ranked in the top fifty to illustrate commonality.
42 For example, Senator Colman in the 106th Senate mentioned ‘grievous injury’ before he expressed his objection to this amendment to the partial-birth ban act.
43 To compare the two chambers directly, it is necessary to use a common space score for both the House and the Senate. See, for example, Royce Carroll, Jeff Lewis, James Lo, Nolan McCarty, Keith Poole and Howard Rosenthal, ‘ “Common Space” (Joint House and Senate) dw-nominate Scores with Bootstrapped Standard Errors’ (2009).
44 The kappa coefficient is often used to measure inter-rater agreement in annotation. We followed the kappa computation procedure described at http://faculty.vassar.edu/lowry/kappa.html.
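A minimal sketch of the computation is given below, using scikit-learn’s Cohen’s kappa function in place of the manual procedure on the cited page; the two annotators’ labels are hypothetical.

```python
# Cohen's kappa: observed agreement between two annotators corrected for the
# agreement expected by chance (1 = perfect agreement, 0 = chance level).
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["economic", "cultural", "cultural", "economic", "cultural"]
annotator_2 = ["economic", "cultural", "economic", "economic", "cultural"]
print(cohen_kappa_score(annotator_1, annotator_2))
```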
45 Yu, Bei, Kaufmann, Stefan and Diermeier, Daniel, ‘Classifying Party Affiliation from Political Speech’, Journal of Information Technology & Politics, 5 (2008), 33–48. The lower accuracy is a consequence of a smaller dataset.
46 Yu, Bei, Kaufmann, Stefan and Diermeier, Daniel, ‘Exploring the Characteristics of Opinion Expressions for Political Opinion Classification’, Proceedings of the 9th Annual International Conference on Digital Government Research (dg.o 2008) (Montreal, May 2008), pp. 82–89.
47 We thank an anonymous referee for pointing out this possibility.
48 Høyland, Bjørn and Godbout, Jean-François, ‘Predicting Party Group Affiliation from European Parliament Debates’ (paper presented at the European Consortium for Political Research Meeting of the Standing Group on the European Union, Riga, Latvia, 2008).
49 Poole, Spatial Models of Parliamentary Voting.
50 Except during the Era of Good Feelings (1817–25) and the period surrounding the Civil War (1853–76); Poole and Rosenthal, Congress; Poole and Rosenthal, Ideology and Congress.
51 See, for example, Lakoff, George, Moral Politics: How Liberals and Conservatives Think (Chicago: The University of Chicago Press, 2002).
52 The large dot in the equation refers to the operation of the inner product of two vectors.
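In the standard formulation of a linear SVM (a generic restatement, not the article’s own equation), the decision function in question is:

```latex
% Linear SVM decision function: the dot denotes the inner product of the
% weight vector w and the document vector x.
f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b
```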
53 Leopold, Edda and Kindermann, Jörg, ‘Text Categorization with Support Vector Machines: How to Represent Texts in Input Space?’, Machine Learning, 46 (2002), 423–444.
54 The abbreviation sv stands for an arbitrary support vector. In the SVMlight software package, the first support vector (according to its order in the input data) was used to compute b.
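In the standard hard-margin formulation, a support vector lies on the margin, so the offset can be recovered from any one of them as follows (again a restatement in generic notation, not the article’s own):

```latex
% For a support vector (x_sv, y_sv) on the margin, y_sv (w . x_sv + b) = 1
% and y_sv is either -1 or +1, so the offset is
b = y_{sv} - \mathbf{w} \cdot \mathbf{x}_{sv}
```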
55 Joachims, ‘SVMlight’.
56 Chang and Lin, ‘Library for Support Vector Machines’.