1. Introduction
Policy complexity affects many important political processes including delegation and policy diffusion (Kiewiet and McCubbins, Reference Kiewiet and McCubbins1991; Epstein and O'Halloran, Reference Epstein and O'Halloran1999; Braun and Gilardi, Reference Braun and Gilardi2009; Makse and Volden, Reference Makse and Volden2011). However, policy complexity is not directly observable and is therefore difficult to capture. Researchers rely on different measures, including readability scores or the number of articles, but most prior research considers only a single aspect of policy complexity at a time. Below, I argue, and empirically validate, that a policy's complexity is best defined by two aspects: its textual sophistication and the number of its ties to other policies. The findings are important because they emphasize the role of both internal (i.e., textual) and external (i.e., relational) characteristics in explaining what makes policy difficult to understand. More broadly, they have implications for our understanding of the causes and consequences of complexity in policy-making.
2. Defining complex policies
While some scholars argue that policy complexity depends on the length and detail of a policy (Ehrlich, Reference Ehrlich2011; Hurka and Haag, Reference Hurka and Haag2020), others contend that policy complexity results from the growing number of relations between policies (Krehbiel, Reference Krehbiel1991; Adam et al., Reference Adam, Hurka, Knill and Steinebach2019). The former count the number of articles and words or rely on readability indexes such as the widely used Flesch Reading Ease formula. The latter investigate the complexity of policies through document network analysis (Katz et al., Reference Katz, Coupette, Beckedorf and Hartung2020). Crucially, most prior research focuses on a single aspect of policy complexity at a time.Footnote 1
My definition integrates the two approaches. I define complex policies as those that have a high level of textual sophistication and a large number of ties to other policies. For individuals who want or need to engage with a policy, there is no way around reading its text. Reaching a good understanding of the policy can be straightforward if it is written in an accessible manner. By contrast, a policy with a high level of textual sophistication is harder for the reader to understand. There are many potential sources of textual sophistication. The characteristics that make a text more complex include text length, the use of longer words, the use of uncommon words, and the use of more complex syntactic and grammatical structures (Benoit et al., Reference Benoit, Munger and Spirling2019). Findings from different fields including medicine, communication science, and political science provide evidence that textual sophistication matters a great deal for humans’ understanding of text (Leroy et al., Reference Leroy, Helmreich and Cowie2010; Bischof and Senninger, Reference Bischof and Senninger2018; Tolochko et al., Reference Tolochko, Song and Boomgaarden2019; Bischof and Senninger, Reference Bischof and Senninger2022).
The second defining feature of complex policies goes beyond a policy's own characteristics and considers its wider context, namely potential cross-references to other laws, rules, and regulations. Ties to other policies are considered important because they provide information that is relevant to fully understanding a policy. The decision to draft a new policy is very often motivated by insufficiencies of already existing policies. However, instead of withdrawing insufficient policies and replacing them with new and better ones, a process of policy layering or policy accumulation is increasingly common (Adam et al., Reference Adam, Hurka, Knill and Steinebach2019). Whenever two policies build upon each other or regulate very similar policy domains, it is likely that the newer policy makes reference to the older policy to describe the relation between the two (Krehbiel, Reference Krehbiel1991). A large number of references to other policies can make it more difficult to reach a full understanding of a policy because the reader must also consider the additional related policies.
3. Empirical roadmap
The central proposal of this research note is that both textual sophistication and ties with other policies should be used to capture policy complexity. In the following, I first validate the proposed defining features of complex policy by showing that they are crucial for humans’ understanding of policy. This section builds on and expands the workflow presented by Benoit et al. (Reference Benoit, Munger and Spirling2019). Thereafter, the proposed definition is operationalized using a large corpus of policies, and it is shown to outperform alternative operationalizations in predicting a theoretically relevant outcome, namely the level of legislative delegation. Both empirical exercises are conducted in the context of the European Union because it constitutes a large and important jurisdiction that has law-making powers in a broad range of policy areas.Footnote 2 The flowcharts in Figure 1 provide information about the individual steps of the two empirical tests.
4. Validation of proposed defining features
Step 1: First, human judgments of the relative complexity of policy texts were collected using crowdsourcing. The approach involves non-experts who are asked to complete micro-tasks and works particularly well for identifying (latent) document characteristics (Carlson and Montgomery, Reference Carlson and Montgomery2017). The data consist of comparable short passages of text taken from recitals of European Union rules (Thomson et al., Reference Thomson, Arregui, Leuffen, Costello, Cross, Hertz and Jensen2012). Recitals are listed before the articles of a policy act and state the reasons for the provisions, principles, and assumptions on which the act is based.Footnote 3 From this text corpus, text snippets of varying length were randomly drawn. Following a stratified sampling method, the snippets drawn for comparison were constrained to groups with the same number of sentences and a similar number of characters, to avoid comparisons in which coders simply select the snippet that is noticeably shorter than the other.Footnote 4
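To make the stratified pairing constraint concrete, the sketch below pairs only snippets with the same sentence count and a similar character length. The regex sentence splitter and the 20 percent length tolerance are illustrative assumptions, not the paper's exact rules.

```python
import itertools
import re

def n_sentences(text):
    """Count sentences by splitting on terminal punctuation (rough heuristic)."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

def candidate_pairs(snippets, length_tolerance=0.2):
    """Return snippet pairs eligible for comparison: same number of sentences
    and character counts within `length_tolerance` of each other, so coders
    cannot simply pick the noticeably shorter snippet."""
    pairs = []
    for a, b in itertools.combinations(snippets, 2):
        same_sentences = n_sentences(a) == n_sentences(b)
        similar_length = abs(len(a) - len(b)) <= length_tolerance * max(len(a), len(b))
        if same_sentences and similar_length:
            pairs.append((a, b))
    return pairs
```

Constraining pairs this way keeps the comparison about sophistication rather than raw length alone.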
Participants were recruited via the crowdsourcing platform Prolific. The sample is representative of the UK population, using proportional cross-stratification on sex, age, and ethnicity. In total, 597 individuals participated in the task.Footnote 5 Upon accepting the task, participants were shown a description of the task and a number of examples (see Figure SI 2 and the upper panel in Figure SI 3). Each respondent was asked to compare 15 randomly assigned pairs (two of which served as attention checks). To screen respondents’ attention, instructive manipulation checks were used (see lower panel in Figure SI 3). For the main analysis, I exclude respondents who failed the attention checks, leaving 536 participants and a total of 6962 comparisons. The average number of judgments per snippet is 5.1.
Step 2: The second step is to estimate the underlying complexity using the model for pairwise comparisons developed by Bradley and Terry (Reference Bradley and Terry1952). The Bradley–Terry model assumes that the odds that snippet i beats snippet j are α_i/α_j, where α_i and α_j are parameters representing the “easiness” of the snippets, as respondents were asked which text snippet was easier to understand. The model can be expressed in logit form: logit[Pr(i easier than j)] = λ_i − λ_j, where λ_i = log α_i for all i. Fitting the equation to the pairwise data yields estimates of λ_i for each text snippet, representing an unconditional estimate of that text's relative easiness.Footnote 6
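As a sketch of this estimation step, the snippet below fits the Bradley–Terry model to a small invented wins matrix using the classic minorization–maximization (MM) update rather than the logit-form fit; the data and iteration count are illustrative assumptions.

```python
import numpy as np

def bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry 'easiness' parameters alpha from a wins matrix.

    wins[i, j] = number of times snippet i was judged easier than snippet j.
    Classic MM update; the alphas are normalized to sum to 1 because only
    relative easiness is identified.
    """
    n = wins.shape[0]
    alpha = np.ones(n)
    n_ij = wins + wins.T              # total comparisons between each pair
    w = wins.sum(axis=1)              # total "easier" judgments per snippet
    for _ in range(n_iter):
        denom = n_ij / (alpha[:, None] + alpha[None, :])
        np.fill_diagonal(denom, 0.0)  # no self-comparisons
        alpha = w / denom.sum(axis=1)
        alpha /= alpha.sum()
    return alpha

# Toy data: snippet 0 is judged easier most often, snippet 2 least often.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
alpha = bradley_terry(wins)
lam = np.log(alpha)  # lambda_i = log(alpha_i), the log-easiness scale
```

The estimated λ_i can then serve directly as the outcome in the regression step that follows.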
Step 3: The next step is to select potential predictors of this outcome, considering a model of the form $\lambda_i = \sum_{r=1}^{p} \beta_r x_{ir} + U_i$, in which the easiness of each snippet i is related to explanatory variables x_{i1}, …, x_{ip} through a linear predictor with coefficients β_1, …, β_p, and U_i represents independent errors (Turner and Firth, Reference Turner and Firth2012). The estimated coefficients $\hat{\beta}$ indicate the marginal effect of each covariate on the perceived relative easiness of the text snippets. To represent textual sophistication, the absolute numbers of words and characters are considered. In addition, several variables from the best model to explain textual complexity presented in Benoit et al. (Reference Benoit, Munger and Spirling2019) are used: the mean number of characters per word, the mean number of characters per sentence, and the least frequent word's relative frequency based on the Google Books data set. Finally, I add the Flesch Reading Ease, a common readability measure determined by the average sentence length in words and the average number of syllables per word. To represent the second defining feature of complex policies, it was manually coded whether a text snippet refers to any existing legal acts or additional documents including treaties, conventions, communications, and resolutions. The variable comes in two versions: the first is binary and indicates whether a text snippet includes a reference to any rules or documents; the second indicates the number of such references. The appearance and number of abbreviations are also considered. By convention, recitals can start with the word “Whereas,” and for each text snippet it was recorded whether this is the case. All variables are listed in Table 1.Footnote 7
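The purely textual covariates are easy to compute directly from a snippet. The sketch below implements the Flesch Reading Ease formula together with word and sentence counts; the regex tokenizer and the vowel-group syllable counter are rough stand-ins, not the paper's actual preprocessing.

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores mean easier text. Syllables are approximated by counting
    vowel groups, a crude but common heuristic.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

def text_features(text):
    """Covariates discussed above: word count, mean characters per word,
    mean characters per sentence, and the Flesch score."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "n_words": len(words),
        "mean_chars_per_word": sum(map(len, words)) / len(words),
        "mean_chars_per_sentence": sum(map(len, sentences)) / len(sentences),
        "flesch_reading_ease": flesch_reading_ease(text),
    }
```

The reference-count covariate, by contrast, was coded manually in the paper and is not approximated here.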
To assess the predictive power of the listed covariates, random forest models with 1000 trees were used. Random forests are chosen because they are parsimonious, general, and less prone to overfitting (Lantz, Reference Lantz2015). They produce estimates of the relative importance of each variable, which is useful information for selecting the best predictors of the easiness of text snippets. Figure SI 5 ranks the variables’ importance by the increase in mean squared error (MSE) that results when a variable is permuted. At each node in each tree, three randomly selected variables were tried for the regression. The results show that some of the variables used in previous research also matter for predicting the easiness of short passages of text from recitals (especially the mean characters per sentence). Even more important is the absolute number of words; the mean characters per word also matter. These results provide evidence that textual sophistication is important for humans’ understanding of policy text. In addition, they show that ties with other rules and regulations matter as well. The variable representing the number of references to existing rules and documents is the fourth most important predictor of the outcome. When the values of the number of references to other documents are permuted over the data, the increase in the MSE is 11 percent; for the binary predictor, the increase is 12 percent.
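A minimal scikit-learn analogue of this variable-importance step, run on synthetic data since the snippet-level data are not reproduced here. Two deliberate deviations are worth flagging: fewer trees are grown for speed, and scikit-learn's permutation importance reports the drop in R² when a column is shuffled rather than the percentage increase in MSE reported by R's randomForest.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for the covariates: only the first two columns
# actually drive the "easiness" outcome in this toy setup.
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# max_features=3 mirrors "three random variables tried at each node";
# 300 trees instead of the paper's 1000, for speed.
rf = RandomForestRegressor(n_estimators=300, max_features=3, random_state=0)
rf.fit(X, y)

# Permutation importance: how much prediction quality drops when one
# column's values are shuffled across observations.
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]  # most important first
```

The ranking recovers the two signal-bearing columns, mirroring how the paper identifies the strongest predictors of easiness.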
Step 4: Finally, I use the most predictive variables to fit structured models and assess their performance in predicting the pairwise contests. I compare the models against a baseline model that includes the widely used Flesch Reading Ease score as its only covariate (model 1). Model 2 includes the two most predictive variables of textual sophistication. These are the number of words and the mean characters per sentence. Model 3 keeps the two variables to capture textual sophistication but adds the number of ties with other rules and documents. This third model captures both of my defining features of policy complexity and performs best, with the lowest Akaike information criterion (AIC) (9205.2) and the highest proportion of pairwise comparisons correctly predicted (0.772).Footnote 8 For the first model, we see that the AIC is 9613.2, and the augmented proportion of contests in the data correctly predicted is 0.676. Model 2 outperforms the first model with a lower AIC (9374.4) and a higher proportion of pairwise comparisons correctly predicted (0.752) (Table 2).Footnote 9
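The two comparison metrics can be stated compactly. The helpers below are a sketch: `aic` is the standard 2k − 2·log L definition, and `proportion_correct` is a plain, unaugmented version of the share of contests predicted correctly (the paper reports an augmented variant).

```python
import numpy as np

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*logL; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def proportion_correct(p_hat, outcome):
    """Share of pairwise contests where the fitted probability that the
    first snippet is easier (p_hat > 0.5) matches the observed outcome."""
    p_hat = np.asarray(p_hat)
    outcome = np.asarray(outcome, dtype=bool)
    return float(np.mean((p_hat > 0.5) == outcome))
```

Comparing nested models on both criteria, as Table 2 does, guards against picking a model that merely adds parameters.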
It is important to note that both defining features matter for our understanding of policy text, as a model that only includes textual sophistication is clearly outperformed by a model that features both textual sophistication and ties to other rules. To demonstrate the face validity of the results, text boxes in Section E in the Supplementary information present text snippets used in the pairwise comparisons which the best performing model identified as having a very low, an average, and a very high level of complexity, respectively.Footnote 10
5. Testing performance in predicting delegation
In the following, I test how different operationalizations of policy complexity perform in predicting the level of legislative delegation. Step 1: For this purpose, several data sources were merged. For the level of legislative delegation, data come from Anastasopoulos and Bertelli (Reference Anastasopoulos and Bertelli2020), who use machine learning techniques to measure the amount of delegation to the European Commission and member states' national administrations in directives and regulations. The predicted values for each provision, effectively articles and sub-articles, are aggregated so that the dependent variable gives the delegation ratio Δ_i for each law i. The delegation ratio represents the number of provisions delegating authority, D_i, divided by the total number of provisions in the law, P_i (Δ_i = D_i/P_i).Footnote 11 Step 2: The data also include the raw text of the provisions of each law. This allows me to operationalize textual sophistication in a similar manner to that used in the pairwise comparisons described above. More specifically, the number of words and the mean number of characters per sentence were computed for each law i. Moreover, the ties to existing legislation, treaty articles, and court judgments were measured for each piece of legislation. The number of ties was not extracted directly from the raw text of the provisions but taken from a recently introduced database tracking connections between European Union laws (Fjelstul, Reference Fjelstul2019).
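The aggregation from provision-level predictions to the law-level delegation ratio is a simple share; the function below is a minimal sketch, assuming the classifier emits one boolean "delegates authority" flag per provision.

```python
def delegation_ratio(provision_delegates):
    """Delta_i = D_i / P_i: number of provisions delegating authority
    divided by the total number of provisions in law i.

    `provision_delegates` holds one boolean per provision, e.g. the binary
    predictions from a provision-level classifier.
    """
    return sum(provision_delegates) / len(provision_delegates)
```

By construction the ratio lies in [0, 1], which is the range reported for the dependent variable.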
Merging these data sources provides me with the delegation ratio, the mean number of characters per sentence, the number of words, and the number of ties to other policies for more than 13,000 pieces of legislation enacted by the two co-legislators, the Council of the European Union and the European Parliament, between 1958 and 2015. To compare the performance results against a baseline model, I operationalize complexity using the average Flesch Reading Ease of a law's provisions. In addition, I compare my suggested definition of policy complexity to an operationalization that uses the number of recitals to measure policy complexity. For this purpose, subsets of the data described above were merged with data from two studies that include information about the number of recitals (Steunenberg and Rhinard, Reference Steunenberg and Rhinard2010; Migliorati, Reference Migliorati2020). Step 3: The models consist of the delegation ratio as the response variable and an operationalization of policy complexity as the predictor variable. The main goal is to obtain optimal predictions based on a linear combination of the described variables (Cranmer and Desmarais, Reference Cranmer and Desmarais2017). Step 4: To assess model performance, tenfold cross-validation repeated five times was applied.
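The cross-validation scheme in Step 4 can be sketched with scikit-learn; the covariates and delegation ratios below are synthetic stand-ins for the merged data, and the linear model mirrors the linear-combination setup just described.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(1)
n = 300
# Hypothetical stand-ins for the number of words, mean characters per
# sentence, and number of ties; the outcome loads on two of them.
X = rng.normal(size=(n, 3))
delegation = 0.5 + 0.1 * X[:, 0] + 0.05 * X[:, 2] + rng.normal(scale=0.05, size=n)

# Tenfold cross-validation repeated five times -> 50 held-out error estimates.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
scores = cross_val_score(LinearRegression(), X, delegation,
                         scoring="neg_root_mean_squared_error", cv=cv)
rmse = -scores.mean()  # final model error: mean over all folds and repeats
```

Averaging over repeats smooths out the dependence of a single k-fold split on how observations happen to be partitioned.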
The final model error is the mean error across the iterations. Table 3 shows the performance results. The proposed operationalization of policy complexity using the number of words, the mean number of characters per sentence, and the number of ties shows a smaller root mean squared error (RMSE) and mean absolute error. The differences are consistent but not very large. In addition, the R², which gives the proportion of the variance in the response variable that can be explained by the predictor variable(s), is clearly larger for model 2. Additional results show that my proposed definition and operationalization of policy complexity outperforms an alternative operationalization that is often used in the context of the European Union, namely the number of recitals. All model comparisons show that models with the proposed operationalization have a lower RMSE and a higher R², indicating that they fit the data better than the alternative operationalization.
Note (Table 3): The range of the delegation ratio is 0–1. RMSE, root mean squared error; MAE, mean absolute error.
6. Discussion and conclusion
This research note carries important implications for scholars interested in the causes and consequences of complexity in public policy. It presents a definition and operationalization of policy complexity that is validated at the individual level and, at the same time, turns out to be a relevant predictor of legislative delegation. As such, the approach performs well in a theoretically meaningful test. This stands in stark contrast to existing measures, including readability scores and the number of articles and recitals, which rest on strong implicit assumptions. As a result, future studies are well advised to incorporate operationalizations of textual sophistication and ties between policies to make sure that their measure captures features that actually affect humans’ understanding of policy text. When the proposed operationalization is used to explain a phenomenon like delegation, researchers should ensure that the measures of policy complexity and delegation are kept separate, as delegation is sometimes measured by the length of a bill. The proposed definition and operationalization focus on features that are generally applicable: they do not assume that policies are difficult to understand simply because they belong to a specific policy context. This means that the approach can be applied to different topics such as policy diffusion and in different contexts, including individual countries within the European Union, but also outside the European context, such as in the USA, and even in sub-national politics.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2023.23. Replication material for this article is available at https://doi.org/10.7910/DVN/IPW0M9.
Acknowledgments
I want to thank Jason Anastasopoulos, Kristina Bakkær Simonsen, Jens Blom-Hansen, Steffen Hurka, Heike Klüver, Stefan Müller, Fritz Sager, Bruno Castanho Silva, and Christian Rauh. I acknowledge funding from the Department of Political Science at Aarhus University.