Clause Analysis: Using Syntactic Information to Automatically Extract Source, Subject, and Predicate from Texts with an Application to the 2008–2009 Gaza War

Wouter van Atteveldt; Tamir Sheafer; Shaul R. Shenhav; Yair Fogel-Dror

doi:10.1017/pan.2016.12

Published online by Cambridge University Press: 01 March 2017

Wouter van Atteveldt*: Department of Communication Science, VU University Amsterdam, The Netherlands
Tamir Sheafer: Department of Political Science and Department of Communication, The Hebrew University of Jerusalem, Israel
Shaul R. Shenhav: Department of Political Science, The Hebrew University of Jerusalem, Israel
Yair Fogel-Dror: Department of Political Science, The Hebrew University of Jerusalem, Israel
*Email: [email protected]

Abstract

This article presents a new method and open-source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency-based text analysis methods by dividing text into predicates with an identified subject and optional source, extracting the statements and actions of (political) actors as mentioned in the text. The content of these predicates can be analyzed using existing frequency-based methods, allowing for the analysis of actions, issue positions, and framing by different actors within a single text. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis applied to the results of the clause analysis can reveal differences in citation and framing patterns between U.S. and English-language Chinese coverage of this war.
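
To make the idea of source–subject–predicate extraction concrete, the following is a minimal sketch in Python. It is not the authors' R package and does not reproduce their pattern set. It assumes a dependency parse is already available and written by hand as a list of tokens, uses only three Stanford-style relation labels (nsubj, ccomp, dobj), and relies on a tiny, made-up list of reporting verbs. The names Token, SAY_VERBS, children, subtree_text, and extract_clauses, as well as the example sentence, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Assumed, deliberately tiny list of reporting verbs; not taken from the article.
SAY_VERBS = {"said", "says", "claimed", "stated"}


@dataclass(frozen=True)
class Token:
    idx: int    # 1-based position in the sentence
    word: str
    head: int   # idx of the governing token, 0 for the root
    rel: str    # dependency relation to the head, e.g. "nsubj"


def children(tokens, head_idx, rel=None):
    """Direct dependents of head_idx, optionally filtered by relation."""
    return [t for t in tokens
            if t.head == head_idx and (rel is None or t.rel == rel)]


def subtree_text(tokens, root_idx, exclude=frozenset()):
    """Words under root_idx in sentence order, skipping excluded branches."""
    by_idx = {t.idx: t for t in tokens}
    found, stack = [], [root_idx]
    while stack:
        current = stack.pop()
        if current in exclude:
            continue
        found.append(current)
        stack.extend(t.idx for t in children(tokens, current))
    return " ".join(by_idx[i].word for i in sorted(found))


def extract_clauses(tokens):
    """Toy pattern matcher: one clause per verb that has a subject."""
    sources = {}       # idx of quoted verb -> text of the quoting source
    reporting = set()  # idx of reporting verbs that introduce a quote
    # Pattern 1 (illustrative): reporting verb with a subject and a clausal
    # complement, so the subject is the source of the complement clause.
    for t in tokens:
        if t.word.lower() in SAY_VERBS:
            subj = children(tokens, t.idx, "nsubj")
            comp = children(tokens, t.idx, "ccomp")
            if subj and comp:
                reporting.add(t.idx)
                sources[comp[0].idx] = subtree_text(tokens, subj[0].idx)
    # Pattern 2 (illustrative): any other token with a subject heads a clause;
    # the predicate is its subtree minus the subject branch.
    clauses = []
    for t in tokens:
        subj = children(tokens, t.idx, "nsubj")
        if not subj or t.idx in reporting:
            continue
        clauses.append({
            "source": sources.get(t.idx),
            "subject": subtree_text(tokens, subj[0].idx),
            "predicate": subtree_text(tokens, t.idx, exclude={subj[0].idx}),
        })
    return clauses


# Hand-written parse of the invented sentence "Israel said Hamas fired rockets".
SENTENCE = [
    Token(1, "Israel", 2, "nsubj"),
    Token(2, "said", 0, "root"),
    Token(3, "Hamas", 4, "nsubj"),
    Token(4, "fired", 2, "ccomp"),
    Token(5, "rockets", 4, "dobj"),
]

if __name__ == "__main__":
    print(extract_clauses(SENTENCE))
```

In this toy setup the reporting verb itself is not emitted as a clause; its subject is attached as the source of the embedded clause. A realistic system would need a parser, a much richer set of patterns, and handling of nested or multi-sentence quotes, none of which is attempted here.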

Type: Articles
Information: Political Analysis, Volume 25, Issue 2, April 2017, pp. 207–222

DOI: https://doi.org/10.1017/pan.2016.12
Copyright: © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Authors’ note: The research was partly supported by the Israel Science Foundation and the Ministry of Science, Technology, and Space, Israel. The data and R scripts for replicating the validation and substantive analyses are published in the Harvard Dataverse (Van Atteveldt, Sheafer, Shenhav, and Fogel-Dror 2016).

Contributing Editor: R. Michael Alvarez

References

Baker, C., Fillmore, C., and Cronin, B. 2003. The structure of the FrameNet database. International Journal of Lexicography 16(3):281–296.

Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research 3:993–1022.

Carreras, X., and Màrquez, L. 2005. Introduction to the CoNLL-2005 shared task: semantic role labeling. In Proceedings of the ninth conference on computational natural language learning. Stroudsburg, PA: Association for Computational Linguistics, pp. 152–164.

Chen, D., Schneider, N., Das, D., and Smith, N. A. 2010. SEMAFOR: frame argument resolution with log-linear models. In Proceedings of the 5th international workshop on semantic evaluation. Stroudsburg, PA: Association for Computational Linguistics, pp. 264–267.

Collingwood, L., and Wilkerson, J. 2012. Tradeoffs in accuracy and efficiency in supervised learning methods. Journal of Information Technology & Politics 9(3):298–318.

De Marneffe, M., MacCartney, B., and Manning, C. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, vol. 6, pp. 449–454.

D’Orazio, V., Landis, S. T., Palmer, G., and Schrodt, P. 2014. Separating the wheat from the chaff: applications of automated document classification using support vector machines. Political Analysis 22(2):224–242.

Entman, R. M. 2008. Theorizing mediated public diplomacy: The U.S. case. International Journal of Press/Politics 13:87–102.

Fellbaum, C., ed. 1998. WordNet: an electronic lexical database. Cambridge, MA: MIT Press.

Fogel-Dror, Y., Sheafer, T., Shenhav, S. R., and Van Atteveldt, W. 2015. Real-time sentiment analysis in the context of a political conflict. In Annual meeting of the American Political Science Association, San Francisco, CA.

Grimmer, J. 2010. A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Political Analysis 18(1):1–35.

Grimmer, J., and Stewart, B. M. 2013. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21:267–297.

Grossman, D. A., and Frieder, O. 2012. Information retrieval: algorithms and heuristics, vol. 15. Springer Science & Business Media.

Hillard, D., Purpura, S., and Wilkerson, J. 2008. Computer assisted topic classification for mixed methods social science research. Journal of Information Technology and Politics 4(4):31–64.

Kellstedt, P. M. 2003. The mass media and the dynamics of American racial attitudes. New York: Cambridge University Press.

Laver, M., Benoit, K., and Garry, J. 2003. Extracting policy positions from political texts using words as data. American Political Science Review 97(2):311–331.

Lowe, W., and Benoit, K. 2013. Validating estimates of latent traits from textual data using human judgment as a benchmark. Political Analysis 21(3):298–313.

Miller, G. 1995. WordNet: a lexical database for English. New York: ACM Press.

Monroe, B. L., Colaresi, M. P., and Quinn, K. M. 2008. Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4):372–403.

Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., and Radev, D. R. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1):209–228.

Roberts, M. E. 2015. Introduction to the virtual issue: recent innovations in text analysis for social science. Political Analysis 23:254–277.

Ruigrok, N., and Van Atteveldt, W. 2007. Global angling with a local angle: How U.S., British, and Dutch newspapers frame global and local terrorist attacks. The Harvard International Journal of Press/Politics 12:68–90.

Schrodt, P. A. 2014. TABARI: textual analysis by augmented replacement instructions, version 0.8.4b3; http://eventdata.parusanalytics.com/software.dir/tabari.html.

Schrodt, P. A., and Gerner, D. J. 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–1992. American Journal of Political Science 38(3):825–854.

Schrodt, P. A., and Gerner, D. J. 2000. Cluster-based early warning indicators for political change in the contemporary Levant. American Political Science Review 94(4):803–818.

Schrodt, P. A., Gerner, D. J., and Yilmaz, O. 2005. Using event data to monitor contemporary conflict in the Israel-Palestine dyad. International Studies Perspectives 6(2):235–251.

Sebastiani, F. 2002. Machine learning in automated text categorization. ACM Computing Surveys 34(1):1–47.

Sheafer, T., and Shenhav, S. R. 2010. Mediated public diplomacy in a new era of warfare. The Communication Review 12:272–283.

Sheafer, T., Shenhav, S. R., Takens, J., and Van Atteveldt, W. 2014. Relative political and value proximity in mediated public diplomacy: the effect of state-level homophily on international frame building. Political Communication 31(1):149–167.

Slapin, J. B., and Proksch, S.-O. 2008. A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52(3):705–722.

Stone, P. J., Dunphy, D. C., Smith, M. S., and Ogilvie, D. M., et al. 1966. The General Inquirer: a computer approach to content analysis. Cambridge, MA: MIT Press.

Van Atteveldt, W. 2008. Semantic network analysis: techniques for extracting, representing, and querying media content (dissertation). Charleston, SC: BookSurge.

Van Atteveldt, W. 2013. News media: platform or power broker? A study of political quotes in newspaper content using syntactic analysis. Presented at the New Directions in Analyzing Text as Data workshop, LSE, 27–28 September.

Van Atteveldt, W., Kleinnijenhuis, J., and Ruigrok, N. 2008. Parsing, semantic networks, and political authority: using syntactic analysis to extract semantic relations from Dutch newspaper articles. Political Analysis 16(4):428–446.

Van Atteveldt, W., Sheafer, T., Shenhav, S. R., and Fogel-Dror, Y. 2016. Replication data for: clause analysis: using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008–2009 Gaza War. doi:10.7910/DVN/DZZXAD, Harvard Dataverse, V1 [UNF:6:IdSlgh3RYlPHO1Hq0pCahQ==].

Young, L., and Soroka, S. 2012. Affective news: the automated coding of sentiment in political texts. Political Communication 29(2):205–231.
