Mass Digitization of Chinese Court Decisions
How to Use Text as Data in the Field of Chinese Law
Published online by Cambridge University Press: 21 October 2022
Abstract
Since 2014, Chinese courts have placed tens of millions of court judgments online. We analyze the promise and pitfalls of using this new data source, highlighting takeaways for readers facing similar issues with other collections of legal texts. Drawing on 1,058,986 documents from Henan Province, we identify problems with missing data and call on scholars to treat variation in court disclosure rates as an urgent research question. We also outline strategies for learning from a corpus that is vast and incomplete. Using a topic model of administrative litigation in Henan, we complicate the conventional wisdom that administrative lawsuits are an extension of contentious politics, giving Chinese citizens an opportunity to challenge the state. Instead, we find a high prevalence of administrative cases that reflect an underlying dispute between two private parties, suggesting that administrative lawsuits are often an attempt to enlist the state's help in resolving what is essentially a civil dispute.
- Type: Articles
- Copyright: © 2020 by the Law and Courts Organized Section of the American Political Science Association. All rights reserved.
Footnotes
We are grateful to a large number of commentators in both China and the United States whose feedback helped improve this article and to the many research assistants at Berkeley, Columbia, and the University of California, San Diego, who worked on various stages of this project. Particular thanks to Kevin Coakley, Subhasis Dasgupta, Amarnath Gupta, Haoshen Hong, and Kai Lin at the San Diego Supercomputer Center and Xiaohan Wu at Columbia Law for their help with the data. This work was partially funded by the National Science Foundation RIDIR program, award 1738411.