Introduction
It has traditionally been argued that there is a divide between US legal scholarship, which is empirically oriented owing to a different perception of the role of courts and judges, and legal scholarship in the rest of the world (Hamann 2019, 416). Empirical legal scholarship examines whether and to what extent judges behave as, for example, political or strategic actors (Kornhauser 1992a, 1992b; Posner 1993, 2010; Epstein and Knight 1997, 2000; Sunstein et al. 2006; Clark and Lauderdale 2010; Epstein, Landes, and Posner 2011; Carrubba et al. 2012; Lauderdale and Clark 2014; Cameron and Kornhauser 2017; Clark, Engst, and Staton 2018; Roussey and Soubeyran 2018).
In contrast, in European legal systems, such as the one at hand (Czechia), judges have been perceived as “proclaimers of law”, and the focus has been on the law handed down by them. Hamann (2019, 417) even claims that this view has hindered robust empirical legal research in Europe. The lack of empirical legal research can also be partially blamed on the lack of high-quality data, a prerequisite for any quantitative empirical research. At least, so the story went until recently. Interest in empirical legal studies has picked up across the continent in recent years, with studies on a plethora of topics in Germany (Wittig 2016; Engst et al. 2017; Coupette and Fleckner 2018; Arnold, Engst, and Gschwend 2023), Spain and Portugal (Hanretty 2012), the UK (Hanretty 2020), and the EU institutions (Bielen et al. 2018; Fjelstul 2019, 2023; Fjelstul, Gabel, and Carrubba 2022; Brekke et al. 2023b).
The publication of new high-quality, publicly accessible data has gone hand in hand with these developments. In recent years, several comprehensive datasets and databases have been released, namely the Iuropa project’s CJEU database (Brekke et al. 2023a), the German Federal Courts database (Hamann 2019), and the German Federal Constitutional Court database (Engst, Hönnige, and Gschwend forthcoming) (for an overview, see Engst and Gschwend 2024). The mushrooming research shows that there is a demand for quality data in Europe as well.
To build on and continue these efforts, this article presents the Czech Constitutional Court (“CCC”) database, a comprehensive, high-quality multiuser database on the CCC. The CCC database is foundational in that it encompasses a plethora of data on which other researchers can base their work, it has the capacity to address various research questions, and it adheres to the tidy data principles. The database includes all decisions of the CCC from its foundation in 1993 until the end of 2023. It contains rich metadata, such as information on the judge rapporteur, subject matter, or concerned legal acts, a complete text corpus, as well as additional background information on judges and clerks.
To the best of my knowledge, the CCC database is one of the first, if not the first, comprehensive databases coming out of the Central and Eastern European (“CEE”) region. The CEE region has moved into the spotlight of European legal research, among other reasons as a result of various rule of law crises, in which the constitutional courts of the region and their interplay with the CJEU have played an important role (Kelemen and Pech 2019; Sadurski 2019; Kelemen 2020, among many others). Despite that, CEE scholarship has so far produced very little methodologically rigorous empirical legal research on the judiciary, constitutional courts, or judicial politics. The lack of high-quality data is undoubtedly one piece of this puzzle.
Focusing on Czechia, there have been solitary attempts to gather data in some shape or form in the CCC context (Harašta et al. 2018; Novotná and Harašta 2019), mainly thanks to the Institute of Law and Technology based in Brno, as well as isolated attempts to conduct network analysis or research employing natural language processing and similar methods (Chmel 2017; Eliášek, Kól, and Švaňa 2020; Harašta et al. 2021; Vartazaryan 2022). Unfortunately, the former group did not always adhere to the principles of high-quality infrastructure, namely the principle of foundationality espoused by Weinshall and Epstein (2020, 424), and the latter group did not publish data or code at all. An effort to put together and publish a high-quality database on the CCC is therefore more than warranted, especially to enable robust empirical legal scholarship to flourish in the CEE region.
The presented database can serve as a foundation for a wide variety of research inquiries, which I briefly outline here and discuss in more depth in Section 4.1. Among others, the judicial politics of the CCC may be studied (Lax and Cameron 2007; Lax 2011). Research on dissenting behavior and disagreement on the bench is of particular interest, as the studies of Epstein, Landes, and Posner (2011) and Wittig (2016) could be replicated with the data at hand. The inclusion of decision texts as well as references to other decisions enables the estimation of the positions of court decisions (Clark and Lauderdale 2010; Gschwend, Sternberg, and Zittlau 2016). Combined with the internal chamber structure of the CCC, this opens the possibility of inquiring how consistent the CCC’s case law is across its chambers and under what conditions that consistency varies, following up on Fjelstul’s (2023) theoretical study of case-law consistency across CJEU chambers. The database also enables research on the background of justices, such as the role of gender (Boyd, Epstein, and Martin 2010; Epstein and Knight 2022), their education, or their selection of clerk teams (Kromphardt 2015; Badas and Stauffer 2023). Finally, the database lends itself to the application of various natural language processing methods.
For example, one could replicate the research on the vagueness of judicial language (Sternberg 2019) for the CCC, or measure the readability of CCC decisions (Crossley, Skalicky, and Dascalu 2019; Fix and Fairbanks 2020), and link these measures to substantive research questions. Are more readable CCC decisions cited by the CCC more often than less readable ones (Crossley, Skalicky, and Dascalu 2019; Fix and Fairbanks 2020)? Does the CCC use vague language more in certain areas or for certain reasons than in others (Sternberg 2019)?
Several features make the CCC, in my view, especially worthy of study. First, the CCC has been vested with extensive competences: it may review and quash laws in abstract proceedings as well as in concrete proceedings initiated by an individual complaint. On top of that, the CCC has broadened its power and may now review even constitutional amendments, and it has historically not been afraid to step into politically salient cases.Footnote 1 Second, the CCC offers plenty of variance that can give rise to research and that is comparable with other constitutional courts: it is internally composed of decision-making bodies of differing size, and since 2016 the justices have rotated between chambers; justices may attach separate opinions; and the judge rapporteur and the court functionaries have clearly defined roles. This makes it possible to build on the previously mentioned research on the impact of the chamber system, on judicial politics within the chambers (such as the influence of the chamber president or the judge rapporteur on the opinion (Carrubba et al. 2012)), on judicial efficiency (Brekke et al. 2023b; Fjelstul and Gabel 2023), and on judicial decision-making (such as dissenting behavior). Finally, the CCC has been active in the European legal space, most notably in the Landtová case, in which it declared a judgment of the CJEU ultra vires (Komárek 2012). The database therefore makes it possible to study empirically the role of the CCC within the EU context and the degree of its Europeanization (Jaremba and Mayoral 2019).
The main drawback of the CCC’s institutional setup is that the role of the justices’ political preferences cannot be measured, owing to the lack of information on justices’ votes (Martin and Quinn 2002; Hanretty 2012) and the lack of variance in the nomination process.Footnote 2 It remains an open question to what extent this strand of research is relevant in the European context, given that both the nomination process and judicial decision-making are less politicized than in the US. Moreover, the database contains observational data on a historically very rigid institution, which makes it difficult, though not impossible, to devise a quasi-experimental research design.
A final contribution of this article is that the database offers a blueprint for future efforts to build similar databases. In building the database, I attempted to name the variables and structure the data in a transparent, replicable, and comparable way, so that efforts concerning other courts could mimic my approach without steep costs.
The article proceeds as follows. In Section 2, I introduce the CCC, namely its composition, internal organization, and powers, to give the reader some context. In Section 3, I introduce the CCC database, briefly discussing its structure and creation and describing its variables. Section 4 discusses the adherence of the CCC database to the four principles of a high-quality dataset, including its relevance for research, as well as to the tidy data principles. Section 5 concludes.
A brief primer on the CCC
The CCC consists of 15 justices,Footnote 3 including one president of the CCC, two vice presidents, and twelve associate justices (following the terminology of Kosař and Vyhnánek 2020). The justices are appointed by the president of the Czech Republic with the approval of the Senate, the upper chamber of the bicameral Czech Parliament. Justices serve 10-year terms with the possibility of re-election; there is no limit on how many times a justice can be re-elected. The three CCC functionaries are appointed by the Czech president alone.
The appointment procedure resembles that of SCOTUS justices, as it lies in the hands of the president of the republic and the upper chamber. The minimal requirements for a CCC nominee are an age of at least 40, a clean criminal record, a completed legal education, and experience in the legal field. Beyond that, the nomination is left to the discretion of the President of the Republic. After a nomination, the nominee is first interviewed by the constitutional law committee of the Senate, which issues a non-binding recommendation for the Senate plenary. The final, binding decision is then made by a simple majority in the Senate plenary. This procedure has resulted in very little variance in the nominating background of the justices. First, there is no nominating political party, unlike in the US or the Spanish context (Hanretty 2012). Second, the court was established in 1993 and filled within roughly a year, the term of the Czech president is 5 years, and all three presidents who had completed their tenure at the time of writing were elected twice (10 years in total). Since this tenure coincides with the 10-year term of the justices, each president has had the chance to appoint all 15 members of “their” CCC. Therefore, the first term of the CCC has been termed the Václav Havel term, the second the Václav Klaus term, and the third the Miloš Zeman term of the CCC.
Regarding its competences, the CCC is a typical Kelsenian court inspired mainly by the German Federal Constitutional Court. The CCC has the power of abstract constitutional review, including of constitutional amendments. The abstract review procedure is initiated by political actors (for example, MPs) and usually concerns political issues. Moreover, an ordinary court can initiate a concrete review procedure if it concludes that a legal norm on which its decision depends is incompatible with the constitution. Individuals can also lodge constitutional complaints with the CCC. Finally, the CCC can resolve separation-of-powers disputes, review international treaties ex ante, and decide on the impeachment of the president of the republic, and it has additional ancillary powers (for a complete overview, see Kosař and Vyhnánek 2020).
The CCC is an example of a collegial court. Internally, it can decide in four formations: (1) individual justices in the role of judge rapporteur, (2) 3-member chambers (senáty), (3) the plenum (plénum), and (4) a special disciplinary chamber. The 3-member chambers and the plenum play a crucial role. The plenum is composed of all justices, whereas the four 3-member chambers are composed of the associate justices. Neither the president of the CCC nor the vice presidents are permanent members of the 3-member chambers. Until 2016, the composition of the chambers was static. In 2016, however, a system of regular 2-year rotations was introduced, under which the president of each chamber rotates to a different chamber every 2 years. I am of the view that this institutional change opens up the potential for quasi-experimental research similar to the study by Gschwend, Sternberg, and Zittlau (2016), which utilized judge absences within the 3-member chambers of the German Federal Constitutional Court. In general, the plenum is responsible for abstract review, whereas the 3-member chambers are responsible for individual constitutional complaints.
In chamber proceedings, decisions on admissibility must be unanimous, whereas decisions on the merits need not be; a simple majority of two votes suffices to pass a decision on the merits. In the plenum, the general voting rule is a simple majority, and the plenum is quorate when at least ten justices are present. Abstract review is one of the exceptions that set a higher voting threshold, namely 9 votes.
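The voting rules above can be summarized as a small decision function. This is a minimal sketch under stated assumptions, not a statement of the act on the CCC: the function name `decision_passes` and the labels `"admissibility"`, `"merits"`, and `"abstract_review"` are hypothetical, a chamber is assumed always to sit with its three members, and the plenum’s simple majority is assumed to be counted among the justices present.

```python
def decision_passes(body, kind, votes_for, present):
    """Check whether a draft passes under a simplified reading of the voting rules.

    body: "chamber" or "plenum" (hypothetical labels).
    kind: "admissibility", "merits" or "abstract_review" (hypothetical labels).
    Assumption: the plenum's simple majority is counted among those present.
    """
    if body == "chamber":
        if present != 3:
            raise ValueError("a 3-member chamber sits with 3 justices")
        if kind == "admissibility":
            return votes_for == 3        # admissibility decisions must be unanimous
        return votes_for >= 2            # merits: simple majority of the three
    if body == "plenum":
        if present < 10:
            return False                 # the plenum is not quorate below 10 justices
        if kind == "abstract_review":
            return votes_for >= 9        # higher threshold for abstract review
        return votes_for > present / 2   # general rule: simple majority
    raise ValueError(f"unknown body: {body!r}")
```

For instance, an abstract review draft supported by 8 of 15 justices would command a simple majority but would still fail the 9-vote threshold.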
The judge rapporteur plays a crucial role. Hořeňovský and Chmel (2015) and Chmel (2017) study the large influence of judge rapporteurs at the CCC. Each case is assigned to a judge rapporteur according to a case allocation plan.Footnote 4 The judge rapporteur is tasked with drafting the opinion, on which the body then votes. The president of the CCC (in plenary cases) or the president of the chamber (in chamber cases) may reassign a case to a different judge rapporteur if the draft opinion of the original judge rapporteur did not receive a majority of votes. Unfortunately, the CCC does not keep track of these reassignments.Footnote 5
The act on the CCC allows for separate opinions, which can take two forms: dissenting or concurring opinions. Each justice has the right, but not the duty, to author a separate opinion, which is then published together with the CCC decision. It follows that not every vote against the majority results in a separate opinion; it is up to the justices to decide whether to attach one to their vote. Vice versa, not every separate opinion implies a vote against the majority, as justices can attach a concurring opinion. In contrast to a dissenting opinion, a justice who attaches a concurring opinion has voted with the majority but disagrees with its reasoning.Footnote 6
CCC justices select their own clerk teams. Each justice is required to have at least one clerk. Clerks are appointed by the president of the CCC on the nomination of the respective justice. A clerk must have a clean criminal record and a completed law degree; beyond that, there are no requirements. A clerk’s term may not exceed the term of the nominating justice. Clerks are usually tasked with drafting decisions and, in narrowly defined cases, may be instructed by the justice to decide on the justice’s behalf when an application does not meet even the minimal requirements.
It may be concluded that the CCC follows the American model in the selection of justices, with the president of the republic and the upper chamber in the spotlight, but that it is also a typical example of a Kelsenian specialized court with concentrated constitutional review. The CCC stands out for the strength of its constitutional review, having claimed the power to review even constitutional amendments, which shows that it is a powerful player in the Czech political system. While the appointment procedure may be compared to that of SCOTUS, the CCC’s role within the constitutional system is akin to that of the European constitutional courts, with the German Federal Constitutional Court at their forefront; the CCC has often adopted the German court’s doctrinal approaches and methods, such as the proportionality test or the rationality test. The CCC’s power to review even constitutional amendments may, in turn, be compared to that of the Supreme Court of Israel. The internal organization of the CCC leaves room for strategic or policy considerations of its justices. Owing both to these similarities with the powerhouses of constitutional adjudication and to its own idiosyncrasies, I believe the CCC is a worthy object of empirical legal research, as conclusions drawn from research on the CCC may, after careful consideration, be extended beyond a mere case study.
Description of the CCC database
Having introduced the CCC in the previous section, I now describe the content and structure of the database.
Case inclusion
The CCC database includes all publicly available CCC decisions from its foundation until the end of 2023,Footnote 7 that is, 93,826 decisions, as well as background information on its 50 justices and their 221 clerks. All data were first web-scraped from Nalus, the official, publicly accessible website on which the CCC publishes all its decisions on the merits and on admissibility. Nalus includes all procedural decisions issued after 1 January 2007; procedural decisions issued before that date may be missing, with no indication of which ones are missing.Footnote 8
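Because coverage of procedural decisions differs before and after 2007, a researcher studying procedural decisions may want to restrict the sample to the period of complete coverage. The sketch below assumes hypothetical field names (`decision_date`, `is_procedural`) rather than the codebook’s actual variable names, and treats “after 1 January 2007” as including that date.

```python
from datetime import date

# Per the text, Nalus is complete for procedural decisions only from 1 January 2007
# (treating the cut-off date itself as covered is an assumption).
PROCEDURAL_COMPLETE_FROM = date(2007, 1, 1)

def coverage_is_complete(decision_date, is_procedural):
    """True if decisions of this kind and date are fully covered by Nalus."""
    return (not is_procedural) or decision_date >= PROCEDURAL_COMPLETE_FROM

def restrict_to_complete(decisions):
    """Keep only decisions from periods with complete coverage.

    Each decision is a dict with the hypothetical keys
    'decision_date' (datetime.date) and 'is_procedural' (bool).
    """
    return [d for d in decisions
            if coverage_is_complete(d["decision_date"], d["is_procedural"])]
```

Decisions on the merits and on admissibility pass the filter regardless of date, so only the pre-2007 procedural decisions are dropped.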
The web scraping was followed by intensive data cleaning and data wrangling. Much of the information was transformed into a more readily usable form. In the last step, some information (such as the composition of the bench) was extracted from the texts or other metadata of the decisions. The CCC database is accompanied by a comprehensive codebook, which contains a detailed explanation of its structure, its components, and all variables contained therein.Footnote 9
Structure of the CCC database
On a very basic level, the structure of the CCC database can be divided into the master decision-level table (ccc_metadata), decision-variable-level tables, and justice-level and clerk-level tables. The decision-variable-level tables are linked to the master table by the decision identifier, and the justice-level and clerk-level tables are connected via the judge identifier and the clerk identifier, respectively. I now go over each level of the structure. For clarity, Figure 1 presents a diagram of the schema of the CCC database, which can serve as a reference point. Table 1 contains summary statistics for the whole database.
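To illustrate how the levels connect, the sketch below joins a decision-variable-level table (ccc_compositions) to the master table (ccc_metadata) by the decision identifier and then to ccc_judges by the judge identifier. The table names come from the text; the column names (`doc_id`, `judge_id`, `decision_type`, `gender`) and the toy rows are hypothetical, and plain Python lists of dicts stand in for whatever data frame tool a researcher would actually use.

```python
# Toy rows; column names are hypothetical, not the codebook's.
ccc_metadata = [
    {"doc_id": "D1", "decision_type": "nález"},
    {"doc_id": "D2", "decision_type": "usnesení"},
]
ccc_compositions = [  # one row per decision-judge pair
    {"doc_id": "D1", "judge_id": "J1"},
    {"doc_id": "D1", "judge_id": "J2"},
    {"doc_id": "D2", "judge_id": "J1"},
]
ccc_judges = [
    {"judge_id": "J1", "gender": "F"},
    {"judge_id": "J2", "gender": "M"},
]

def left_join(left, right, key):
    """Attach matching right rows to each left row (one output row per match)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # With no match, the left row is kept unchanged.
        for match in index.get(row[key], [{}]):
            joined.append({**row, **match})
    return joined

# Bench-member level: each row is a justice sitting on a decision.
bench = left_join(left_join(ccc_compositions, ccc_metadata, "doc_id"),
                  ccc_judges, "judge_id")
```

The result keeps one row per justice per decision, which is the natural unit for questions such as how often female justices sit on benches issuing a nález.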
Master table
The whole database is organized around the master ccc_metadata table, as shown in Figure 1. The master table contains several types of information. The general case information variables include a unique identifier of the decision, a non-unique identifier of the case (a case may comprise more than one decision), and the dates on which the application was lodged and the decision was issued. Procedural variables capture whether the decision was a usnesení or a nález,Footnote 10 the type of procedure in which it was made, such as abstract review or constitutional complaint proceedings, and the grounds on which it was based. Background variables cover, among others, the type of parties before the CCC (a natural person, a legal person, a court, etc.), the body whose decision was under review (typically a court), and the type of the decision under review. Moreover, the database includes the subject of proceedings (the pertaining area of constitutional law) and the subject register (the pertaining area of general law, such as criminal-proof, civil damages, or administrative proceedings). Such variables are especially useful for controlling for specific features of cases that may act as confounders. Finally, miscellaneous variables include, for example, a URL of the decision in the Nalus database and a note, which typically contains a link to the press release.Footnote 11
Decision-variable-level tables
Some of the aforementioned variables may contain more than one value per decision. Keeping all information in one table would therefore break the tidy data principle that each row contains one observation, as the unit of observation in the ccc_metadata table is the decision. To resolve this issue, some variables of the master table are stored as nested lists and then unnested into separate tables, in which the unit of observation is at the decision-variable level.Footnote 12 These tables are connected to the master table by the unique decision identifier. The unnested tables include, to name a few, ccc_references (references to CCC case law found in the texts of the decisions), ccc_subject_matter (the subject matters of a decision), ccc_parties (information on the parties before the CCC, both the applicant and the concerned body), and ccc_compositions (the bench composition, linked to the ccc_judges table via the judge identifier).
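The unnesting step can be sketched as follows: a decision-level row holding a list of subject matters becomes one row per decision-subject pair, keyed by the decision identifier. The column name `doc_id`, the helper `unnest`, and the subject-matter strings are hypothetical toy values, not taken from the codebook.

```python
def unnest(rows, id_key, list_key, value_name):
    """Turn one row per decision holding a list into one row per (decision, value)."""
    long_rows = []
    for row in rows:
        for value in row.get(list_key) or []:
            long_rows.append({id_key: row[id_key], value_name: value})
    return long_rows

# Toy decision-level rows with a nested (list-valued) variable.
nested = [
    {"doc_id": "D1", "subject_matter": ["right to a fair trial", "property"]},
    {"doc_id": "D2", "subject_matter": ["personal liberty"]},
    {"doc_id": "D3", "subject_matter": []},  # no subject matter recorded
]
ccc_subject_matter = unnest(nested, "doc_id", "subject_matter", "subject_matter")
```

Note that a decision with an empty list (D3) produces no rows in the long table; it remains present in the master table, so analyses should start from the master table when decisions without any value matter.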
Finally, the ccc_texts table contains the full texts of the decisions, which opens up a plethora of research avenues using quantitative text analysis or machine learning. The texts have undergone very little preprocessing, as the texts in the Nalus database are in good condition. Most HTML tags have been removed, apart from paragraph breaks (mostly in the form of one or more \n characters). As the decisions have no clear structure, the texts have been kept as a wholeFootnote 13, and any researcher intending to run an NLP task can simply split them into whatever unit they deem fit (tokens, sentences, paragraphs, etc.). A number of variables have already been mined from these texts. To name two, the compositions of the sitting benches have been mined using various regex variations of the justices’ names, and dissenting opinions, together with their relationships to each other (whether several justices signed one dissenting opinion or dissented separately), have been mined from the texts.
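Both operations mentioned above can be sketched briefly: splitting a text into paragraphs on runs of \n, and detecting justices through regex variants of their names. The surnames “Novák” and “Svoboda” and the identifiers `J_NOVAK` and `J_SVOBODA` are hypothetical placeholders, not CCC justices; real patterns would need the actual names and a fuller treatment of Czech declension.

```python
import re

def split_paragraphs(text):
    """Split a decision text on runs of newline characters, dropping blank pieces."""
    return [p.strip() for p in re.split(r"\n+", text) if p.strip()]

# Hypothetical name patterns covering a few declined forms of each surname.
# \b is Unicode-aware for str patterns, so accented letters count as word chars.
NAME_PATTERNS = {
    "J_NOVAK": re.compile(r"\bNovák(a|em|ovi)?\b"),       # Novák, Nováka, ...
    "J_SVOBODA": re.compile(r"\bSvobod(a|y|ou|ovi)\b"),   # Svoboda, Svobody, ...
}

def mine_bench(text):
    """Return sorted ids of justices whose name variants appear in the text."""
    return sorted(jid for jid, pattern in NAME_PATTERNS.items()
                  if pattern.search(text))
```

The word boundaries matter: the pattern for “Novák” does not match the female surname “Nováková”, which in a real application could belong to a different person.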
Justice-level and clerk-level tables
Justice-level and clerk-level variables contain information on individual justices and clerks, respectively. The information was collected partly automatically and partly manually from the official profiles of current justices, former justices, and clerks on the CCC website, as well as from the justices’ Wikipedia pages. The CCC database includes information on the justices’ terms, their age and gender, their alma mater, their highest academic degree (titles play an especially important “ceremonial” role in the Czech legal environment), their professional background before they became a justice,Footnote 14 and whether the justice sought re-election, which the Czech Constitution and the act on the CCC allow after the 10-year term expires.
The ccc_clerks table includes information on all 221 clerks who have served in the CCC’s history, such as the justice under whom they served, their term, their gender, their education, and whether they studied abroad. Because a clerk may have served under more than one justice over time, the unit of observation is the clerk’s term; the table therefore has more rows than there are unique clerks. I believe that including such information on clerks makes the dataset quite unique and opens up many avenues for research.
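Because the unit of observation is the clerk’s term, counts of rows and counts of clerks differ. The sketch below, with hypothetical column names (`clerk_id`, `judge_id`, `term_start`) and toy rows, shows how to count unique clerks and identify clerks who served under more than one justice.

```python
from collections import defaultdict

# Toy rows at the clerk-term level; column names are hypothetical.
ccc_clerks = [
    {"clerk_id": "C1", "judge_id": "J1", "term_start": 2005},
    {"clerk_id": "C1", "judge_id": "J2", "term_start": 2014},
    {"clerk_id": "C2", "judge_id": "J1", "term_start": 2008},
]

n_terms = len(ccc_clerks)                                   # rows = clerk terms
n_unique_clerks = len({row["clerk_id"] for row in ccc_clerks})

def clerks_serving_multiple_justices(rows):
    """Return ids of clerks who served under more than one justice."""
    justices = defaultdict(set)
    for row in rows:
        justices[row["clerk_id"]].add(row["judge_id"])
    return sorted(c for c, js in justices.items() if len(js) > 1)
```

Any statistic meant to describe clerks as persons (for example, their gender composition) should be computed on unique clerks, whereas statistics on hiring decisions can use the term-level rows.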
Principles guiding the CCC database
The CCC database is a “multiuser dataset” created in a principled manner. Epstein et al. (2014, 14) describe the rationale of a multiuser dataset as follows: “[r]ather than collect data to answer particular research questions […] the idea is to amass a dataset so rich in content that multiple users, even those with distinct projects, can draw on it.”
Accordingly, the CCC database upholds the principles of high-quality datasets espoused by Weinshall and Epstein (2020, 424), namely that the database is (1) capable of addressing real-world problems, (2) accessible, (3) reproducible and reliable, and (4) foundational.Footnote 15 The data structure also follows the principles of tidy data. According to Wickham (2014), tidy data are tabular data (i.e., data with a row-and-column structure) that adhere to the following principles:
(1) every column is a variable,
(2) every row is an observation, and
(3) every cell is a single value.Footnote 16
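Some of these principles can be checked mechanically. The sketch below is a hypothetical helper, not part of the database’s tooling: it approximates principle (1) by requiring all rows to share the same columns, checks principle (2) via a user-supplied key that should identify one observation, and checks principle (3) by flagging list-like cells. Whether each column truly is a single variable still requires human judgment.

```python
def check_tidy(rows, key):
    """Return a list of tidy-data violations found in a list-of-dicts table.

    key: column names that jointly identify one observation (supplied by the
    user, e.g. ["doc_id"] for a decision-level table); key columns are
    assumed to hold atomic, hashable values.
    """
    problems = []
    columns = set(rows[0]) if rows else set()
    seen = set()
    for i, row in enumerate(rows):
        # (1) approximated: every row shares the same set of columns
        if set(row) != columns:
            problems.append(f"row {i}: columns differ from row 0")
        # (2) every row is one observation: the key must not repeat
        ident = tuple(row.get(c) for c in key)
        if ident in seen:
            problems.append(f"row {i}: duplicate observation {ident}")
        seen.add(ident)
        # (3) every cell is a single value: no lists, tuples, sets or dicts
        for col, value in row.items():
            if isinstance(value, (list, tuple, set, dict)):
                problems.append(f"row {i}: non-atomic value in {col!r}")
    return problems
```

A decision-level row that still holds a list of subject matters fails check (3); after unnesting, the long table passes as long as the key includes both the decision identifier and the subject matter.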
I now discuss the Weinshall and Epstein (2020) principles one by one.
Capacity to address real-world problems
In the words of Weinshall and Epstein (2020), “By definition, data infrastructure should promote innovation, inventions, and insights. Although no product can guarantee these ends, infrastructure aimed at solving (or developing implications for) real-world problems increases the odds of success.” With the database at hand, I hope to enable data- and evidence-based research on the CCC. I now present two examples that illustrate the capacity of the CCC database to address real-world problems and research concerns. These simplified examples are not meant to support any inference; rather, they show the potential of using the dataset “to develop real-world implications and contribute to public and academic discourse on pressing legal-political issues” (Weinshall and Epstein 2020, 427).
Clerks
The first brief example concerns law clerks. Kosař and Vyhnánek (2020) argue that clerks at the CCC play an especially vital and underappreciated role: “The initial idea of the legislature was to grant each justice one law clerk who would take administrative burdens unrelated to substantive decision-making off the justices’ shoulders. Yet the reality is different. First, due to the growing caseload, the number of law clerks per justice increased gradually; today, each justice has three law clerks. Moreover, law clerks de facto prepare drafts of most CCC judgments and decisions, and the real administrative burden has been ‘outsourced’ to secretaries of the cabinets.” The difficulty of studying the role of clerks was highlighted in the study by Clark, Engst, and Staton (2018) on the effects of leisure on judicial performance. Existing studies on clerks have examined their influence on the final decision as “an information source” (Kromphardt 2015) and the influence of gender on the career choice to become a clerk (Badas and Stauffer 2023).
Badas and Stauffer (2023) found that women are generally underrepresented among law clerks and that one of the reasons behind this underrepresentation is that “female law students may have lower levels of ambition compared to men. (…) Examining potential sources of this difference, we find that while women view themselves to be just as qualified for these positions as men, men are more willing to apply with lower feelings of qualification. Likewise, while women and men report similar levels of encouragement, more encouragement is required before women express ambition to hold these posts.” In two studies on gender equality in the Czech judiciary, Havelková (2017) and Urbániková, Havelková, and Kosař (2023) revealed that, at first glance, the representation of women within the Czech judiciary is rather high. Structurally, however, the distribution is vertically unequal: female judges dwell on first-instance courts and handle run-of-the-mill decision-making, whereas male judges are overrepresented in the upper echelons of the judiciary, which exert greater influence over doctrinal development, and occupy the judicial functionary positions, which mainly take care of court administration. These studies raise two questions: (1) is the representation of women among clerks as vertically unequal as it is among justices, and (2) is there a discrepancy between the proportion of women among law graduates and among law clerks?
To illustrate the capacity to address real-world problems, I present concise descriptive statistics addressing whether there is a discrepancy between the representation of women among clerks, law graduates, and justices. While the gender distribution among CCC clerks cannot be compared with that among clerks at lower courts, for which no data are available, it can at least be compared with the distribution among CCC justices. Figure 2 confirms the gender discrepancy among justices, but it also reveals that representation is roughly equal among clerks. The vertically unequal distribution thus may not necessarily hold among clerks. To answer the second question, I collected Eurostat data on the gender distribution among law graduates in Czechia between 2015 and 2021 (Eurostat 2024) and appended them to the CCC database. I then compared the proportion of women among three groups: CCC justices, their clerks, and law graduates.
Women make up 22% of the 41 justices, 47.3% of the 203 clerks, and 58.6% of the 1,453 law graduates. The discrepancy between clerks and graduates is less pronounced than that between justices and clerks, but it is still notable: women are overrepresented among law graduates, and this overrepresentation is not reflected among clerks, although representation there at least remains roughly equal. Over time, it is predominantly men who reach the higher echelons, despite the overrepresentation of women at the starting line. Following Badas and Stauffer (Reference Badas and Stauffer2023), one could conduct a similar study in the Czech context and draw policy implications on how to address the underrepresentation. Finally, and interestingly, Figure 3 reveals that male and female justices seem to have different preferences regarding their clerks: male CCC justices seem to hire clerks of both genders equally, whereas female CCC justices seem to hire more male clerks.
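The comparison above reduces to simple arithmetic. The following minimal Python sketch uses only the shares and totals reported in the text; the head counts it prints are back-calculated by rounding, so they are approximations rather than the database's exact tallies.

```python
# Shares of women and group sizes as reported in the text.
groups = {
    "justices": (0.220, 41),
    "clerks": (0.473, 203),
    "graduates": (0.586, 1453),
}


def implied_women(share: float, total: int) -> int:
    """Back-calculate an approximate head count by rounding share * total."""
    return round(share * total)


for name, (share, total) in groups.items():
    print(f"{name:>9}: ~{implied_women(share, total)} women of {total} ({share:.1%})")

# Gaps in percentage points along the path graduates -> clerks -> justices.
gap_grad_clerk = (groups["graduates"][0] - groups["clerks"][0]) * 100
gap_clerk_justice = (groups["clerks"][0] - groups["justices"][0]) * 100
print(f"graduates vs clerks: {gap_grad_clerk:.1f} pp")
print(f"clerks vs justices:  {gap_clerk_justice:.1f} pp")
```

Expressed this way, the gap between graduates and clerks (about 11 percentage points) is indeed smaller than the gap between clerks and justices (about 25 percentage points).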
Dissenting behavior of justices
The second example concerns the dissenting behavior of justices. Research on judicial coalitions at the CCC has revealed that the CCC's third term, between 2013 and 2023, is rather polarized, with two large coalitions of judges clashing in the plenary proceedings. The division has been described as left-right or progressive-conservative (Chmel Reference Chmel2021; Smekal et al. Reference Smekal, Benák, Hanych, Vyhnánek and Janků2021; Vartazaryan Reference Vartazaryan2022). These articles rely primarily on network analysis of the dissenting opinions in the plenary proceedings and draw strong conclusions from a rather superficial descriptive analysis.
To make this inference more robust, I predict that if the coalitions observed in the plenum indeed exist, they should also carry over to the 3-member chamber proceedings. My hypothesis is therefore that 3-member chamber decisions whose bench includes members of both judicial coalitions show a higher likelihood of a dissent. If this proves true, it would provide further evidence for the two-coalition theory of the CCC (Chmel Reference Chmel2021; Smekal et al. Reference Smekal, Benák, Hanych, Vyhnánek and Janků2021; Vartazaryan Reference Vartazaryan2022).
To test these theoretical expectations, I manually annotated which justices of the third term belonged to which coalition according to the aforecited literature; otherwise, I built on the CCC database. The selection of decisions was narrowed: because admissibility decisions of the 3-member chambers must be made unanimously, dissents are excluded in them by design and concurring opinions are a rarity. Therefore, I filtered the decisions in the ccc_metadata table by the grounds variable to include only decisions on the merits. Because the coalition theory applies only to the third term of the CCC, roughly between 2013 and 2023, the decisions were further filtered by the year of decision variable. In the end, 1,584 three-member chamber decisions on the merits were included in the analysis. For each of these decisions, I retrieved the three justices who decided the case from the ccc_compositions table using the decision identifier. I then compared them against the vectors of names of the justices in each coalition. If all three justices belonged to the same coalition, I flattened the filtered compositions table to the decision level by imputing the value “full”; if two justices on the bench belonged to one coalition and the third to the other, I imputed the value “mixed.” I then joined this table to the filtered metadata table by the decision identifier variable. Finally, I grouped the SO table by the decision identifier, flattened it to contain only the information on whether an SO was attached to the decision, and joined this transformed table to the transformed metadata. The result was a table with the decision as the unit of observation, an independent variable indicating whether the bench was composed of one coalition only or mixed from both, and a dependent variable indicating whether an SO occurred.
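The filtering, flattening, and joining steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the published code: only the table names ccc_metadata and ccc_compositions come from the text, while the toy rows, the column names (doc_id, grounds, formation, year), and the coalition sets of placeholder justices are hypothetical. Benches containing a justice outside both annotated coalitions are classified as None here, which is also an assumption.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical rows standing in for ccc_metadata, ccc_compositions and the SO
# table; the column names used here are assumptions, not the database schema.
metadata = [
    {"doc_id": "A", "grounds": "merits", "formation": "chamber", "year": 2015},
    {"doc_id": "B", "grounds": "merits", "formation": "chamber", "year": 2019},
    {"doc_id": "C", "grounds": "procedural", "formation": "chamber", "year": 2019},
]
compositions = [  # (doc_id, justice)
    ("A", "J1"), ("A", "J2"), ("A", "J3"),
    ("B", "J1"), ("B", "J4"), ("B", "J5"),
    ("C", "J1"), ("C", "J2"), ("C", "J3"),
]
separate_opinions = [("B", "J4")]  # (doc_id, justice attaching an SO)

# Manually annotated coalition membership (placeholder names).
COALITION_A = {"J1", "J2", "J3"}
COALITION_B = {"J4", "J5", "J6"}


def bench_type(bench: set[str]) -> Optional[str]:
    """'full' if all three justices share a coalition, 'mixed' if the bench spans both."""
    if len(bench) != 3:
        return None
    if bench <= COALITION_A or bench <= COALITION_B:
        return "full"
    if bench <= COALITION_A | COALITION_B:
        return "mixed"
    return None  # at least one justice is in neither annotated coalition


# 1. Keep chamber decisions on the merits from (roughly) the third term.
selected = {
    m["doc_id"] for m in metadata
    if m["formation"] == "chamber" and m["grounds"] == "merits" and 2013 <= m["year"] <= 2023
}

# 2. Flatten compositions to one bench per selected decision.
benches = defaultdict(set)
for doc_id, justice in compositions:
    if doc_id in selected:
        benches[doc_id].add(justice)

# 3. Flatten the SO table to a yes/no flag and join everything on doc_id.
with_so = {doc_id for doc_id, _ in separate_opinions}
analysis = [
    {"doc_id": doc_id, "bench": bench_type(bench), "so": doc_id in with_so}
    for doc_id, bench in sorted(benches.items())
]
print(analysis)
```

Using set inclusion keeps the "full" versus "mixed" rule close to the verbal description: a bench is "full" when it is a subset of one coalition and "mixed" when it is only a subset of their union.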
Table 2 shows that SOs are more likely to occur in 3-member chamber decisions with a mixed composition than in those composed entirely of justices from one coalition. I conducted a concise hypothesis test of the difference in proportions. Let $ {x}_1 $ be the number of decisions with an SO out of the total number $ {n}_1 $ of decisions with a bench composed entirely of one coalition. Let $ {x}_2 $ be the number of decisions with an SO out of the total number $ {n}_2 $ of decisions with a bench mixed from both coalitions. Let $ {p}_1 $ and $ {p}_2 $ be the corresponding proportions of decisions with an SO. The hypotheses generated by the brief theoretical introduction are as follows:
$$ {H}_0:{p}_1={p}_2,\qquad {H}_1:{p}_1\ne {p}_2 $$
I employed a two-tailed two-proportion z-test at the significance level $ \alpha =0.05 $, as I am comparing two proportions of binomially distributed random variablesFootnote 17 relative to their numbers of trials. The resulting p-value of $ 1.7\times {10}^{-5} $ is below the significance level; therefore, the null hypothesis can be rejected. The result is in line with the theoretical expectations as well as with the conclusions of Czech legal scholarship. The goal is not to prove that any causal relationship exists (I am, for example, uncertain whether the two samples are independent); it is rather to show that, and how, the CCC database can be employed to answer practical research questions.
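For readers who want to see the mechanics, the following is a minimal Python sketch of a pooled two-proportion z-test, written with the standard library only. It is not the author's code, and the counts in the usage line are illustrative placeholders, not the values behind Table 2. Note that some statistical packages apply a continuity correction by default (for example, R's prop.test), so their p-values can differ slightly from this uncorrected version.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-tailed z-test of H0: p1 == p2 using the pooled proportion.

    Returns (z, p_value); no continuity correction is applied.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, estimated from the pooled counts.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the z statistic is undefined")
    z = (p1_hat - p2_hat) / se
    # Two-tailed p-value P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only (not the values behind Table 2):
# 10 of 100 "full" benches vs. 20 of 100 "mixed" benches with an SO.
z, p_value = two_proportion_z_test(10, 100, 20, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")  # roughly z = -1.98, p = 0.048
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because the test is two-tailed, swapping the two groups only flips the sign of z and leaves the p-value unchanged.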
Moreover, the example shows (as will be discussed in Section 4.4) that the database is foundational: the CCC database served as the basis for the model above, with most of the data (such as the information on individual decisions and compositions) stemming directly from it, and the remaining information added and adjusted according to the specific research goal, in this case verifying the coalition theory posited by Czech legal scholars. I believe that the CCC database is a useful contribution and may serve as a basis for rich empirical legal research.
Accessibility
According to the principle of accessibility, a key consideration “in the creation of high-quality infrastructure is that members of the community should be able to access it with no barriers to entry or use” (Weinshall and Epstein Reference Weinshall and Epstein2020, 427).
As I showed in the introduction with specific examples, not all research is reproducible and not all data are made available, which runs counter to the principle of accessibility. Weinshall and Epstein cite studies according to which the majority of psychological research data remain under embargo or are never released at all (Houtkoop et al. Reference Houtkoop, Chambers, Macleod, Dorothy, Nichols and Wagenmakers2018) and only a minority of papers published in journals requiring a data availability statement actually publish their data (Federer et al. Reference Federer, Belter, Joubert, Livinski, Lu, Snyders and Thompson2018).
In line with the principle of accessibility, the CCC database is freely and publicly available in full, with the handbook and this article attached to it. The data can be downloaded from the Zenodo Repository and the JLC Dataverse. I publish the data of my own accord; the publication is not funded by any grant or national science foundation.
Reliability and reproducibility
Moving on to the principles of reliability and reproducibility, Weinshall and Epstein (Reference Weinshall and Epstein2020) define them as follows: “[r]eproducibility means that users and developers alike must understand how to duplicate the data housed in the infrastructure. Reliability is related: it is the extent to which encoded data can be replicated, producing the same value using the same standard for the same subject at the same time, regardless of who or what is doing the replicating.” The heart of the matter of reliability and reproducibility is the internal consistency of the dataset, not necessarily its external validity.
The data must be reliably generated. In my case, I did not narrow down the selection of cases: all CCC decisions made publicly available throughout its history were web-scraped from its website, including all available information as well as the texts of the decisions. Reproducibility also demands that anyone with sufficient skill be able to reproduce the database on their own based on the provided information. All the code has been made available on GitHub; it is cleanly written and commented.
Closely bound to both principles is the coding of the variables. To this end, human input has been minimized. The vast majority of the information has either been collected directly (or with minimal input) from the CCC website or has been transparently automated to the maximum possible extent (including the full information on the clerks). Only the biographic information on judges has been entered manually, using the official profiles of justices on the CCC website and Wikipedia as sources. The rest is the product of the published code.
There are two potential sources of unreliability: the Nalus database and the data mining process, which was to a great extent automated. The extent of the former is difficult to estimate. Based on my insight into the court's internal practice, some of the information (such as the subject matter) is entered manually, mainly by the justices’ clerks and the court’s analytic unit, and it is understandably hard to maintain consistency across decades and between different chambers and justices. To verify the validity and reliability of the data mining process, I checked two variables that were mined from the texts of the decisions: the compositions and the information about SOs.
The compositions were retrieved using a regex search of the opening paragraphs of the decisions for the lemmatized names of the justices. After some trial and error, several error patterns emerged. Many chamber decisions contained four names. These turned out to be decisions on the independence of one of the justices deciding a case, whose name always appeared last in the decision. Therefore, in chamber decisions with 4 justices found in the text, the last name was removed. Three-member chamber decisions with either 1 or 3 names found are deemed correct, as simple cases can be expedited by a single justice, whereas 3-member chamber decisions with 0 or 2 names found are deemed incorrect. Plenum decisions are harder to verify because there is no fixed correct number of justices to serve as a benchmark. For instance, at one point in the CCC’s history, as few as 10 justices sat on it, when President Václav Klaus hesitated to nominate justices after a feud with the Senate, and the number fluctuated within a short period of time.
Nevertheless, it can be determined when the number of justices found in the text of a decision is undoubtedly faulty: the clear mistake is finding either 0 or 2 justices. Finding no names is typical of the first term of the CCC, in which the composition of the bench was not always listed in the text of the decision. It is also nearly impossible to reconstruct this information from the case allocation plan and the identification of the chamber, as any justice could have been sidelined due to illness or lack of independence, or replaced by one of the functionaries, none of which is captured in the original database or necessarily contained in the text of the decision. Finding 2 names typically meant a decision by the judge rapporteur that also mentioned another former or future justice in a different role, such as the legal representative of one of the parties, or, more rarely, a hard-to-generalize typo in one of the justices’ names. To prevent this type of error as far as possible, the regex search was limited to the first two paragraphs of a decision.
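The extraction and validation rules described above can be sketched as follows. This is a minimal Python illustration, not the published code: the justices and the sample text are hypothetical, paragraphs are assumed to be separated by blank lines, and matching a surname stem followed by `\w*` is a simpler stand-in for the lemmatized matching the text describes. Chamber counts other than 0 to 4 (which the text does not discuss) are treated as incorrect, which is an assumption.

```python
import re

# Hypothetical justices mapped to surname-stem patterns. Czech surnames decline
# ("Novák" -> "Nováka"), so a stem followed by \w* (Unicode-aware in Python 3)
# approximates matching on lemmatized names.
JUSTICE_PATTERNS = {
    "Jan Novák": r"\bNovák\w*",
    "Petr Dvořák": r"\bDvořák\w*",
    "Tomáš Procházka": r"\bProcházk\w*",
    "Martin Kučera": r"\bKučer\w*",
}


def find_justices(text: str, n_paragraphs: int = 2) -> list[str]:
    """Return justices named in the first n paragraphs, in order of first mention."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    opening = "\n\n".join(paragraphs[:n_paragraphs])
    hits = []
    for name, pattern in JUSTICE_PATTERNS.items():
        match = re.search(pattern, opening)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


def validate_composition(names: list[str], formation: str) -> tuple[list[str], str]:
    """Apply the error rules from the text; returns (names, status)."""
    if formation == "chamber":
        if len(names) == 4:
            # Decisions on a justice's independence name that justice last.
            names = names[:-1]
        # 1 name = single-justice decision, 3 names = full chamber; any other
        # count (0, 2, or more than 4) is treated as incorrect here.
        status = "correct" if len(names) in (1, 3) else "incorrect"
        return names, status
    # Plenum: no fixed benchmark, but 0 or 2 names are undoubtedly faulty.
    if len(names) in (0, 2):
        return names, "incorrect"
    return names, "unverified"


text = (
    "Ústavní soud rozhodl v senátu složeném z předsedy senátu Jana Nováka, "
    "soudce Petra Dvořáka a soudce zpravodaje Tomáše Procházky.\n\n"
    "Odůvodnění ...\n\n"
    "Stěžovatele zastupoval JUDr. Martin Kučera."  # 3rd paragraph: ignored
)
names = find_justices(text)
print(validate_composition(names, "chamber"))
```

In the sample, the legal representative in the third paragraph shares a surname with a (hypothetical) justice; limiting the search to the first two paragraphs keeps that name out of the composition, which mirrors the kind of 2-name error the text describes.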
Accuracy improves over time. The first term is rather unreliable; in particular, its plenum decisions rarely contain the names of at least 10 justices. The second term is rather reliable, and the third term is practically completely reliable. Table 3 shows the ratio of correctly to incorrectly retrieved compositions.
The numbers show that while the first term is rather inaccurate, the consistency with which the CCC includes this information in its decisions greatly increased over time, to the point that the third term is practically completely accurate.
The accuracy of the extraction of information on separate opinions was verified too. While the information on whether a justice attached a separate opinion is generated by the Nalus database and is therefore presumed to be accurate, the information on whether the justice dissented alone or together with others was retrieved using a regex search. The information is labeled as correct if a justice’s name appeared within a set context of variations on the term “separate opinion,” and as missing when the regex search could not find the name of the dissenting justice in that delimited context. Table 4 reveals the extent to which the data extraction was inaccurate.
As with the compositions, the extraction is rather inaccurate for the first term, but the consistency with which the CCC includes this information in its decisions greatly increased over time, to the point that the third term is practically completely accurate.
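The context-window labeling can be illustrated with a minimal Python sketch. It is not the published code: the Czech phrasing of "separate opinion" ("odlišné stanovisko" and its inflected forms), the 300-character window, the surname-stem patterns, and the sample sentence are all assumptions. Only the inputs mirror the text: the list of justices who attached an SO comes from Nalus, and the regex decides whether each one is found near an SO term and with whom.

```python
import re

# Assumed Czech wording: "odlišné stanovisko" ("separate opinion") and its
# inflected forms, e.g. "odlišného stanoviska" or "odlišným stanoviskem".
SO_TERM = re.compile(r"odlišn\w*\s+stanovis\w*", re.IGNORECASE)


def label_dissenters(text: str, patterns: dict[str, str], dissenters: list[str],
                     window: int = 300) -> dict[str, tuple[str, set[str]]]:
    """For each justice Nalus lists as attaching an SO, return (label, co_dissenters).

    label is 'correct' if the justice is named within `window` characters of an
    SO term and 'missing' otherwise; co_dissenters are the other listed
    dissenters named in the same context(s).
    """
    contexts = [text[max(0, m.start() - window): m.end() + window]
                for m in SO_TERM.finditer(text)]
    result = {}
    for name in dissenters:
        hits = [ctx for ctx in contexts if re.search(patterns[name], ctx)]
        if not hits:
            result[name] = ("missing", set())
            continue
        co = {other for other in dissenters
              if other != name and any(re.search(patterns[other], ctx) for ctx in hits)}
        result[name] = ("correct", co)
    return result


patterns = {  # hypothetical justices with surname-stem patterns
    "Jan Novák": r"\bNovák\w*",
    "Petr Dvořák": r"\bDvořák\w*",
    "Tomáš Procházka": r"\bProcházk\w*",
}
text = "... Soudci Jan Novák a Petr Dvořák uplatnili k rozhodnutí odlišné stanovisko."
labels = label_dissenters(text, patterns, ["Jan Novák", "Petr Dvořák", "Tomáš Procházka"])
for name, (label, co) in labels.items():
    print(name, label, sorted(co))
```

A fixed character window is a blunt instrument: if two separate opinions are close together, their windows overlap and co-dissenters may be overcounted, so the window size is a tuning choice rather than a given.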
Therefore, I conclude that the CCC database is reliable to the extent that the data generating process is reliable and consistent. Insofar as the decisions of the first decade of the CCC were plagued by a degree of inconsistency and missing information, so is the database. To some extent, I attempted to capture and correct the errors that could be verified. It is not possible, however, to verify how accurate any imputed information would be. I could, for example, deduce that in the first term the first chamber consisted of the same 3 judges (as the system of rotations between chambers was introduced only in 2016). Unfortunately, the procedure at the CCC foresees a plethora of exceptions: judges can be removed for lack of impartiality, they can simply be ill, or, according to the Act on the CCC, the 3 functionaries who are not permanent members of the 3-member chambers can replace a judge on a case-by-case basis. Because there are no publicly available procedural decisions on these replacements, relying on the case allocation plan without official data would be guesswork at best, resulting in inaccurate rather than missing data. For the remaining cases, the data have therefore been left as missing.
Foundational
The principle that a dataset be foundational requires that it serve “as a foundation upon which researchers can build by adding content, backdating, updating, or otherwise adapting it to their own needs; it should not be the be-all, end-all.” In other words, the principle favors generally usable data over one-off solutions to particular research questions. The CCC database is foundational: it includes comprehensive background data on each and every case, biographic data on the justices, unique data on the clerks, and a full-text corpus of all the decisions. As I showed in the coalition example, to answer a real research question raised by Czech legal scholarship, the database, used as a foundation, was supplemented with an additional variable. The coalition variable was itself based on the CCC database’s information on compositions combined with the manually annotated information on which justice belonged to which coalition. As I showed in the clerk example, the data on clerks were supplemented with Eurostat data to reach an interesting conclusion about the transition of graduates into clerkships.
Conclusion
I introduced a database on the CCC with the aim of bridging the gap between the traditionally doctrinally oriented European scholarship and the more empirical, methodologically rigorous US scholarship. In my view, the database enables methodologically rigorous empirical research in the CEE region, where such research has been lacking in the past. The database unlocks research on the decision-making of judges and judicial politics, such as their dissenting behavior, strategic acting, or the influence of their clerk teams; on the institutional setup of the CCC, such as the introduction of rotations or the various ways to expedite the CCC caseload; and finally on the texts of the decisions themselves, for example, studying linguistic features of the decisions such as readability or vagueness. Therefore, the article makes a valuable contribution to (European) empirical legal scholarship.