1. Introduction
Databases of decisions of the Court of Justice of the European Union (CJEU) are at the heart of academic and practitioners’ work on EU law. Two official databases feature most prominently in this line of work: Curia and Eur-Lex. Those more familiar with them will be aware of some differences between them, but despite the centrality of these databases in research on the EU, there has so far been no systematic attempt at clarifying the degree to which they overlap. Relatedly, one may regard as particularly concerning the fact that to date neither database contains fully digitised text of landmark rulings handed down before 1990 that are considered to belong to the ‘Pantheon’ of EU law such as Costa v ENEL and Van Gend en Loos. This short research note shows concretely the limitations of both databases and how new work in this area contributes to overcoming them.
Beyond the varying temporal and site-specific coverage of the main CJEU databases, much of EU law scholarship overlooks how strongly the availability of CJEU decisions depends on language. This research note quantifies the perhaps surprising extent to which decisions in French (the working language of the CJEU) outnumber decisions translated into English (the language used by most scholars and practitioners). While we do not expect users to switch to French en masse, it is useful to understand the risks associated with working with CJEU decisions in English only.
2. Extant work
The use of Curia and Eur-Lex is so widespread in EU law that it is not necessary to dwell on their importance. Both doctrinal and social science scholars source their information from these databases. Some of the extant research attempted to assemble collections of CJEU decisions, but they all suffer from drawbacks. Databases are either focused on just one of the sourcesFootnote 1 or they do not address the discrepancies between them.Footnote 2 The only database that has so far successfully tackled problems with the official sources does not contain the texts of CJEU decisions,Footnote 3 which limits its usefulness for legal scholars or researchers in the burgeoning text-analysis domain.
In the absence of a unified database of CJEU decisions, the vast majority of researchers in this area default to either Curia or Eur-Lex (or a combination of the two). A typical piece of research will rely at least in part on a keyword search to identify the relevant line of case law, whereupon the coherence of a new judgment with preceding cases is examined. The alternative to conducting one’s own search for relevant cases is to surrender to the citations selected by the Court. While capable of saving time, this latter strategy restricts the scope of independent, critical inquiry. The CJEU’s citations offer a biasedFootnote 4 picture of its own case law and should be approached as a statement of what the Court wants the reader to see rather than what its case law actually said. An analysis of the coherence of a given line of case law needs to start from a mapping of all relevant cases,Footnote 5 not all of which are likely to have been cited by the Court.
3. Comparison of Curia and Eur-Lex
Assuming a researcher is interested in systematically identifying a set of cases satisfying some criteria (such as the presence of keywords) – or even all the decisions produced by the CJEU – the question remains whether they can rely on Curia and Eur-Lex for the job. In order to answer this question, we need to systematically examine the coverage of both databases. This is now possible thanks to our ongoing work on the IUROPA Text Corpus, a new database of the texts of all CJEU decisions that consolidates and improves the content of both Curia and Eur-Lex.Footnote 6 In this research note, we use this database merely to benchmark the coverage of Curia and Eur-Lex to underscore the importance of completeness rather than explore all the research possibilities offered by a more complete text corpus, a topic deserving of separate attention. The reason is that even users of Curia and Eur-Lex who will not use our new database should be aware of the pitfalls of working with the official sources.
Both keyword searches and more advanced text analytical techniques require textual input in the plain text format.Footnote 7 Thus, even though it is possible to access the PDF of a ruling on Curia or Eur-Lex (although even here the coverage differs), we want to know how many decisions (including Advocate-General (AG) opinions) are available in a fully digitised, plain-text format. Because French is the working language of the Court but English the most used language in academia and legal practice, we look at database coverage in both languages.
Table 1 compares the number of decisions available in plain text on Curia, Eur-Lex and in the IUROPA database across English and French.Footnote 8 The IUROPA database uniquely identifies decisionsFootnote 9 and sources their texts from both Curia and Eur-Lex, depending on which offers the higher quality text.Footnote 10 In addition, it digitises PDFs where no or only poor plain text exists. EU law experts will be familiar with the subpar and incomplete digitisation of older decisions – including the most coveted ones such as Costa v ENEL – on Eur-Lex.Footnote 11 These documents are only available in capitalised letters (with punctuation issues) and omit the Court’s exposition of the facts, law and arguments of the parties.
Even from the overview in Table 1 we can glean the scale of discrepancies between the official databases and the more comprehensive IUROPA corpus. The combined database contains a staggering 10,000 more French decisions in plain text than either Curia or Eur-Lex. Moreover, as not all decisions are translated from French into English, there is a similarly large difference between the two language versions. There are many more plain-text documents in English on Eur-Lex than on Curia, but the two databases hold similar numbers of decisions in French. This similarity, however, masks the different coverage of the two databases, as shown in the Venn diagram in Figure 1.
Figure 1 reveals that even though the number of documents in French is similar on both Curia and Eur-Lex, the two databases are to a significant degree non-overlapping. Two factors primarily influence the discrepant coverage. First, Curia only contains plain-text documents from June 1997 onwards. Decisions before this date are only available in PDF files on Curia. Second, many decisions never make it to Eur-Lex. The process by which files are transmitted from the Court (and its database, Curia) to Eur-Lex (maintained by the EU Publications Office, an independent agency) is not at all transparent. But in general terms, Curia is the more comprehensive database of the two official sources for decisions rendered after June 1997.
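The point that similar totals can hide divergent coverage can be illustrated with plain set operations. The following is a minimal sketch, not the procedure used to produce Figure 1: the function name venn_regions and the identifiers are invented placeholders, and it assumes that each decision already carries an identifier that is comparable across the two sources.

```python
def venn_regions(curia_ids, eurlex_ids):
    """Split two collections of decision identifiers into the three Venn regions."""
    curia, eurlex = set(curia_ids), set(eurlex_ids)
    return {
        "curia_only": curia - eurlex,    # present on Curia, absent from Eur-Lex
        "eurlex_only": eurlex - curia,   # present on Eur-Lex, absent from Curia
        "both": curia & eurlex,          # present in both sources
    }


# Invented placeholder identifiers, not real case numbers.
curia_fr = ["case-A", "case-B", "case-C"]
eurlex_fr = ["case-B", "case-C", "case-D"]

regions = venn_regions(curia_fr, eurlex_fr)
for name, ids in regions.items():
    print(name, len(ids), sorted(ids))

# Both toy sources hold three decisions, yet their union holds four.
union_size = len(set(curia_fr) | set(eurlex_fr))
print("union", union_size)
```

In this toy input each source lists three decisions, but only two are shared, so relying on either source alone misses one decision that the other contains.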
To save resources, the CJEU does not translate all its decisions into English from French, its working language.Footnote 12 As a result, the majority of research conducted in English is liable to miss a non-negligible portion of rulings.Footnote 13 The scale of the discrepancy comes to the fore especially when considering all decisions handed down by the CJEU (as captured by the IUROPA Text Corpus) and is more significant from 2005 onwards (see Figure 2).Footnote 14 Although the Court prioritises the translation of what it considers the most important decisions,Footnote 15 the exact size of the gap between English and French will also vary based on the area of the law and the deciding court. The General Court is more likely than the Court of Justice to see its rulings go untranslated. Similarly, VAT cases, for example, are more likely to remain in French only compared to, say, citizenship cases. The translation policy thus impacts differently on scholars depending on the focus of their work.Footnote 16 The upshot is, however, that researchers should be mindful of the risk of missing relevant cases if they choose to work exclusively in English.
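A researcher wishing to gauge how the translation gap affects a particular period could compute, per year, the share of decisions that lack an English version. The sketch below assumes a hypothetical record format of (year, set of language codes); the function name and the toy records are invented for illustration and do not reproduce the figures behind Figure 2.

```python
from collections import defaultdict


def untranslated_share_by_year(records):
    """Return {year: share of decisions without an English version}.

    `records` is an iterable of (year, languages) pairs, where `languages`
    is a set of language codes such as {"fr", "en"} (a hypothetical format).
    """
    totals = defaultdict(int)
    missing_en = defaultdict(int)
    for year, languages in records:
        totals[year] += 1
        if "en" not in languages:
            missing_en[year] += 1
    return {year: missing_en[year] / totals[year] for year in sorted(totals)}


# Invented toy records, not actual counts.
toy_records = [
    (2004, {"fr", "en"}),
    (2004, {"fr", "en"}),
    (2005, {"fr"}),
    (2005, {"fr", "en"}),
]

shares = untranslated_share_by_year(toy_records)
print(shares)
```

The same grouping could be keyed on the deciding court or the subject matter instead of the year to examine the differences described above.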
Looking merely at the number of decisions available in plain text obscures the fact that many of these documents are incomplete, in particular before records went digital in 1990. The IUROPA Text Corpus has made significant strides towards fully digitising decisions adopted between 1954 and 1989, but the work remains in progress.Footnote 17 So far, 55 per cent of the document pages in French have been processed and incorporated into the database.Footnote 18 Beyond recovering the text of some of the most important rulings in EU law history, this work also yields a significant amount of judicial text. The partially digitised decisions from this period on Eur-Lex contain on average around 55 paragraphs. In contrast, the fully digitised documents in the IUROPA corpus average some 120 paragraphs.Footnote 19
4. Implications
Overcoming the limitations of the official databases has practical implications for all types of research that rely on the texts of CJEU decisions. Legal scholars stand to benefit from a database that reliably retrieves the relevant information from the entire universe of CJEU decisions, rather than whatever undefined selection of them happens to live on either Curia or Eur-Lex. By way of example, scholars working on the reception of international law in the EU legal order might want to trace the history of engagement with customary international law and legitimately ask when the CJEU referred to it for the first time. The case that most often comes up in this regard is Poulsen, decided in 1992.Footnote 20 Konstadinides mentions Van Duyn v Home Office Footnote 21 (1974) but this ruling in fact does not mention custom explicitly.Footnote 22 If we search Eur-Lex for ‘customary international law’ or ‘droit international coutumier’,Footnote 23 the earliest mention is traced to AG Slynn’s opinion in Hurd (1985).Footnote 24 If we search Curia for the English term, we do obtain the correct result – a little-known competition case, Geigy v Commission, decided in 1972 –Footnote 25 but there is only a PDF document to work with.Footnote 26 Interestingly, if we search Curia for the French term, the results do not include Geigy at all.Footnote 27 If all decisions were properly digitised and available in plain text – which is what the IUROPA Text Corpus is working towards – these discrepancies would not arise.
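The dependence of such a search on plain-text availability can be sketched in a few lines. This is an illustrative toy, not the search logic of Curia, Eur-Lex or the IUROPA Text Corpus: the Document class, the function earliest_mention, the identifiers and the sample sentences are all invented, and the years merely echo the example above. The matching is a naive case-insensitive substring test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    doc_id: str          # invented placeholder identifier
    year: int
    language: str        # "en" or "fr"
    text: Optional[str]  # None when only a PDF exists (no plain text)


def earliest_mention(documents, terms_by_language):
    """Return the oldest document whose plain text contains the term for its language."""
    hits = []
    for doc in documents:
        term = terms_by_language.get(doc.language)
        if term is None or doc.text is None:
            continue  # unsearchable: no term for this language, or PDF only
        if term.lower() in doc.text.lower():
            hits.append(doc)
    return min(hits, key=lambda d: d.year, default=None)


terms = {"en": "customary international law", "fr": "droit international coutumier"}

# Invented toy documents; the 1972 entry has no plain text.
docs = [
    Document("doc-1", 1972, "en", None),
    Document("doc-2", 1985, "en", "A rule of customary international law applies."),
    Document("doc-3", 1992, "fr", "Le droit international coutumier s'applique."),
]

hit = earliest_mention(docs, terms)
print(hit.doc_id if hit else None)
```

Because the oldest toy document has no plain text, the search reports the 1985 document as the earliest mention. Supplying text for the 1972 document would change the answer, which is the kind of discrepancy described above.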
The implications for scholars using CJEU texts in a quantitative analysis are potentially even more profound. No existing text analysis of the CJEU corpus is derived from the full universe of decisions.Footnote 28 Even though a comprehensive and validated database of all decisions should be an obvious starting point for quantitative text analysis, the labour involved and academic publishing culture have so far likely disincentivised the creation of a solid CJEU database. There is a compelling case for revisiting many existing quantitative findings once such a comprehensive database is fully available.Footnote 29 In addition, the advent of the artificial intelligence age means that a bewildering array of new analytical tools is becoming available to researchers. Unlike older computational techniques, however, the most advanced techniques today can take advantage of high-quality texts with correct paragraph segmentation, capitalisation and punctuation. At the same time, it should go without saying that a more complete database does not automatically translate into better research. Nonetheless, we hope that collating CJEU texts will enable applied researchers to spend less time on relatively unrewarding, technical work and more on coming up with creative research designs.
5. Conclusion
This research note sheds light on discrepancies between the two official sources of CJEU decisions – Curia and Eur-Lex – and the two most used language versions of these decisions (English and French). The proper functioning of their search engines relies on the availability of documents in plain text. However, our analysis shows that coverage in terms of number of decisions differs widely, both between databases and between languages. By combining and adding to the two official databases, the IUROPA Text Corpus contains over 10,000 more decisions in French than either Curia or Eur-Lex, demonstrating the risks of using either of them in isolation.
The technical and linguistic discrepancies between Curia and Eur-Lex have practical implications for both doctrinal and quantitative scholars. While legal scholars are right to second-guess the accuracy of the search results on either website, quantitatively minded researchers should look to a new database that remedies not only the discrepant coverage but also the absence of many high-quality digitised texts prior to 1990. Moreover, all scholars researching primarily in English (or another language) must contend with the fact that thousands of decisions are only available in French. At the very least, it is worth pausing to reflect on the extent to which the (non-)translation issue affects one’s work.