I Data as a Resource for the Artificial Intelligence Economy
Business capacity to collect and process digitalized information (data) at unprecedented scale and speed is transforming economies around the globe. One aspect of this transformation is the relevance of data as a ‘resource’ for relatively recent advancements in artificial intelligence (AI) technology, particularly various forms of machine learning, most notably ‘deep learning’. The theoretical foundations for this kind of AI go back to the 1950s, but only the availability of novel and larger datasets led to the end of a long ‘AI winter’ and the dawn of an ‘AI spring’.Footnote 1
The growing but unevenly distributed ability to capture information about the world in digital form is a complex phenomenon. The public discourse surrounding data seems somewhat detached from the sophisticated ways in which scholars have theorized the relationship between data, information, knowledge, and wisdom.Footnote 2 The lack of adequate terminology to capture the phenomena caused by the gradual digitalization of economies and societies is evidenced by the vain search for metaphorical equivalents.Footnote 3 The effort to assess the effects of digitalization on the economy is severely hindered by a paradoxical lack of data about data, since the commercial value of data is reflected neither in balance sheets nor in the conventional metrics used to assess the state of the economy or trade.Footnote 4 Yet, it seems misguided to attribute this lamentable state of affairs solely to the notorious opacity of global digital corporations or the inertia of accountants, statisticians, and policy-makers in responding to digitalization on unprecedented scales. Data’s variegated characteristics pose distinct challenges for its economic valuation and legal conceptualization.Footnote 5 This chapter cannot resolve these questions. It treats data as an essential rent-generating productive asset in the AI economy – and therefore also a contested economic resource.Footnote 6
The chapter builds on and expands earlier work on data-related provisions in recent instruments of international economic law (IEL) and sketches some questions for ongoing and future research about how IEL might need to be recalibrated to adapt to a global digital economy.Footnote 7 This earlier work focused on the new template of rules for a global digital economy that the United States championed in the negotiations for the Trans-Pacific Partnership (TPP), now in force as the Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP),Footnote 8 followed by the US-Mexico-Canada Agreement (USMCA),Footnote 9 and the Japan-US Digital Trade Agreement (JUSDTA).Footnote 10 Negotiations on new rules for ‘electronic commerce’ in the World Trade Organization (WTO) seem unlikely to yield tangible outcomes in the near term,Footnote 11 but certain CPTPP members have moved ahead with TPP-plus templates for digital economy agreements, ostensibly designed for adoption by others.Footnote 12 While the tension between data governance in trade agreements and domestic data protection and privacy policies is increasingly well understood (despite the persistent silos and splendid isolation in which the trade and privacy communities have long operated),Footnote 13 there is surprisingly little discussion about the ways in which existing and emerging IEL constrains and shapes states’ policy choices for data-driven economic development.
This chapter is an attempt to contribute to this much-needed debate by exploring the extent to which IEL regulates data as a resource for the AI economy. Section II identifies regulatory interventions – open data initiatives, cross-border data transfer restrictions, and mandatory data sharing – that nation states are already enacting or at least contemplating to ensure access to data for their domestic AI economy. Section III shows how some of these regulatory interventions are in tension with existing and emerging commitments under international trade and investment law along the dimensions of data control (mainly through international intellectual property law and international investment law) and data mobility (mainly through commitments in favor of free data flows and against data localization). Section IV concludes by imagining ways through which IEL could provide more flexibility for experimental digital economy policies to confront asymmetric control over data as countries transition, asynchronously and unevenly, toward an AI economy.
II Emerging Digital Economy Policies: Regulating Data as a Resource
By January 2020, twelve of the G20 countries had announced official AI strategies, with others bound to follow.Footnote 14 Virtually all of these strategies discuss the relevance of data for a future AI economy, commonly under the somewhat vague concept of ‘data governance’. The emphasis is often on data protection and privacy-related concerns, which is a function of the dominant legal discourse in the digital domain and the gradual emergence and subsequent entrenchment of certain regulatory models for data protection.Footnote 15 Countries’ AI strategies increasingly also recognize and address concerns about discrimination caused by algorithmic bias. In contrast, the regulatory interventions that states are considering to challenge the domination of the digital domain by US and Chinese companies, especially in AI, are relatively timid, with the notable exceptions of the European Union’s (EU’s) antitrust enforcement against US companiesFootnote 16 and India’s emerging e-commerce policy that espouses an openly protectionist agenda to grow a domestic AI economy fueled by ‘Indian data’.Footnote 17
Countries that recognize the salience of data for the AI economy often endorse efforts to make governmental data available as ‘open data’. While several countries have some form of data transfer restrictions to retain jurisdictional control over data, India stands out in its advocacy for restricting the outward transfer of data to safeguard data as a national resource, thereby challenging the anti-protectionism consensus in IEL. Some jurisdictions recognize a need for regulatory intervention to transfer data from those who have it to those who want or need it. Exploring each of these three interventions – open data, data transfer restrictions, and mandatory data sharing – as efforts to regulate data as a resource for the AI economy reveals their limited purchase in confronting pervasive data concentration – and makes apparent that alternative measures might be needed.Footnote 18
A Open Data Initiatives
The open data movement has been quite successful in convincing governments that making governmental data publicly accessible under open data licenses is in their best interest to stimulate the domestic (or even local) AI economy. Examples include the EU’s Open Data DirectiveFootnote 19 and Singapore’s ‘Smart Nation’ initiative,Footnote 20 but the open data bandwagon also carries several developing countries.Footnote 21 There are many reasons for and drivers behind the push for open data, one of which is the purported value for innovation and economic growth.Footnote 22 AI development is often referenced as a use case for open data: the remarkable improvements in algorithmic image recognition technology, now widely deployed for facial recognition purposes, have been linked to the ImageNet dataset providing free and publicly available access to image data.Footnote 23
It is, however, much less clear who actually benefits from ‘public’ data becoming available as ‘open’ data. Open data might be beneficial for a wide range of reasons,Footnote 24 but it is not an effective way to counterbalance the pervasive data control asymmetries in the global digital economy. To the contrary, one might suspect that those with the capacity to collect open data and to correlate it with the ‘closed data’ under their (often infrastructural) control stand to gain more than those who lack such capabilities and have to rely on open data entirely. This also has geopolitical implications as those operating out of relatively closed digital economies – such as China – are able to capture open data elsewhere in addition to the data they collect domestically without much external competition.Footnote 25
In certain cases, the local relevance of a certain dataset (for example, traffic data in Taipei) might indicate heightened relevance for a local community, which might incentivize local initiatives to use such local data for local development. But the frequency and salience of such a dynamic, while plausible, needs to be empirically established. It is equally possible that non-local actors will use local data to train algorithms for deployment locally, or indeed elsewhere. Opening up governmental data may benefit AI development, but the local or domestic development of an AI economy is highly contingent on other factors, such as research capacity, data processing ability, and so forth.
Against this backdrop, it is worth noting that the question of whether more privately held data should be made available to governments, businesses, or citizens seems comparatively underexplored.Footnote 26 Private entities are willing to share certain datasets for research purposes, but the legal technology used for such data transfers is usually contracting, not open data licenses.Footnote 27 Data contracting allows for more legal control over the conditions under which data is being shared, used, and distributed.Footnote 28 If governments wanted to make private data available, they could facilitate private–public data sharing by providing more legal certainty (for example, through model contracts, especially with a view toward mitigating liability risks) or by requiring the openness of data generated with public support (analogous to open access publishing requirements),Footnote 29 if not requiring mandatory data sharing outright, as explored further below.Footnote 30
B Data Transfer Restrictions
Several jurisdictions impose data transfer restrictions to secure jurisdictional control over certain categories of data.Footnote 31 The EU’s General Data Protection Regulation (GDPR)Footnote 32 is routinely accused by US actors of being a ‘protectionist’ instrument, designed to favor the European digital economy, albeit with questionable results.Footnote 33 This critique often alleges that the GDPR’s intended purpose of protecting European data subjects’ personal data and privacy and its underlying fundamental rights justification are false pretenses for protectionist digital industrial policy.Footnote 34 Drawing a contrast between data protection and data protectionism tacitly assumes that the economic theories in support of trade in goods and services also apply to data, despite its different and arguably unique characteristics.Footnote 35 The relationship between data protection and privacy on the one hand and data-driven innovation and economic growth on the other is more complicated than the protection/protectionism binary suggests.Footnote 36 The GDPR’s predecessor – the European Data Protection Directive (DPD) – was in part motivated by concerns that disparate data protection regimes across the European single market would stymie the nascent European Internet economy.Footnote 37 Much less attention was paid, however, to the question of how European data protection law would affect the conditions under which the European digital economy operates in comparison to the rest of the world.
The DPD’s restriction on transfers of personal data from the EU to third countries was not designed as an instrument of economic policy but was meant to ensure that personal data would remain protected even if transferred outside the EU’s territory.Footnote 38 These features contributed to the ‘Brussels Effect’ and the global diffusion of EU-style data protection through law.Footnote 39 The EU’s new data strategy, announced with great fanfare in February 2020, conceives of data as an economic resource and seeks to reframe the GDPR as sound economic policy domestically (ensuring consumer trust in the digital economy) and globally (supposedly giving the European digital economy a competitive edge because of the EU’s role as global data regulator), without mentioning the restriction on extra-EU transfers of personal data explicitly.Footnote 40
In contrast, India has come forward with a draft ‘e-commerce policy’ that openly advocates for data transfer restrictions for reasons of economic policy rather than data protection concerns, whether genuine or not. The policy document – which, of course, still needs to be converted into operational law – laments the absence of a legal framework that would allow the Indian government to impose restrictions on the export of valuable data:
Without having access to the huge trove of data that would be generated within India, the possibility of Indian business entities creating high value digital products would be almost nil. … Further, by not imposing restrictions on cross-border data flow, India would itself be shutting the doors for creation of high-value digital products in the country.Footnote 41
This is a remarkable departure from a key tenet of the Silicon Valley Consensus, according to which the uninhibited ‘free flow’ of data is the best way to develop a digital economy. Whatever one’s initial view of this policy proposal, it deserves careful legal and economic analysis, because it asks important and underexplored questions: if data is the key resource of the digital economy, especially for AI development, how can the optimal allocation of this resource be facilitated? Who captures its value? And how can those who do not immediately benefit from the digital transformation be supported, and by whom?
The Indian proposal assumes a strong role for the government in mediating the transition of India toward a digital economy, but this is by no means the only institutional solution imaginable. Moreover, in light of India’s proposal to limit the transfer of data from India to ensure access to data for the domestic AI economy, one may wonder whether it might be more beneficial to incentivize the transfer of relevant data to India. Such ideas challenge the Silicon Valley Consensus, which holds that optimal data allocation is to be achieved through market mechanisms only – despite the digital economy’s pervasive data control asymmetries and resulting market failures.Footnote 42
C Mandatory Data Sharing
Digitalization changes the conditions under which capitalism operates.Footnote 43 Companies with superior data collection capacities benefit as they exploit the resulting information asymmetries.Footnote 44 E-commerce platforms may be able to leverage their intermediary position to gather information about commercial transactions on either side of the two-sided market they facilitate. Relying on predictive algorithms, they may be able to engineer demand through targeted advertising. The price to be paid may no longer be uniform – determined by aggregate supply and demand – but ‘personalized’ (i.e., discriminatory).Footnote 45 Legally mandated data sharing has been proposed as a policy intervention to counterbalance the digital economy’s tendency to create winner-takes-all dynamics and to ensure a competitive environment conducive to innovation.Footnote 46 But alternative justifications for mandatory data sharing are plausible, including data redistribution.
The EU and Australia are among the jurisdictions that have experimented with certain forms of mandatory data sharing. The EU’s GDPR contains a right to data portability that requires data controllers to transmit personal data in a structured, commonly used, and machine-readable format to another data controller, at the request of the data subject.Footnote 47 The provision is supposed to enhance data protection by creating a more competitive environment (on the assumption that consumers will gravitate toward firms with higher data protection standards), but its impact has been muted.Footnote 48 In contrast, Australia’s Consumer Data Right (CDR) bill was not primarily designed as a data protection law. It provides for the sharing of consumption data with consumers and accredited third parties, subject to data privacy safeguards, in certain sectors.Footnote 49
The discussion around mandatory data sharing is most advanced in the banking sector. The EU’s second payment services directive requires banks to share consumers’ payment account data with third-party providers (provided the consumers have explicitly consented to such transfers).Footnote 50 The goal is to advance competition between traditional banks and newly emerging financial services providers, some of which rely heavily on algorithmic analysis of financial data. Banks seem to have acquiesced to these new regulatory demands by creating dedicated data transfer infrastructures in the form of web-based application programming interfaces (APIs).Footnote 51 Automotive vehicle data is another data category that is increasingly subject to mandatory data-sharing requirements. In some jurisdictions, car manufacturers must make vehicle data available to independent repair shops.Footnote 52 The EU’s data strategy contemplates further interventions in a variety of sectors, including agricultural, industrial, and health data, where other arrangements prove insufficient to facilitate data sharing.Footnote 53 The salience of data for AI development seems likely to spur further such initiatives elsewhere. As the next section explores, data holders will seek to mobilize existing and emerging commitments under IEL to oppose mandatory data sharing and data mobility restrictions.
III Regulation of Data Mobility and Control under International Economic Law
IEL regulates data along at least two dimensions that are somewhat in tension with each other: data mobility (where does data reside and where can it move?) and data control (who has data and who decides how it can be used?). While new rules on free flows and data localization regulate data in favor of transnational data mobility, existing IEL, especially international IP and investment law, entrenches private control over data by limiting states’ ability to mandate data disclosure and sharing. This chapter’s focus on substantive disciplines regarding data mobility and data control is not meant to downplay the extent to which contemporary IEL leads to deep transformations of the regulatory state by introducing a wide range of horizontal and sectoral procedural requirements, which may be especially salient if new regulation is being considered in a not yet or under-regulated domain.Footnote 54 Indeed, it is precisely through these procedural mechanisms that those who control data will seek to mobilize IEL to their advantage transnationally.Footnote 55 IEL is routinely invoked by lawyers representing firms, trade associations, regulatory agencies, and other actors in opposition to or support of their clients’ preferred policy outcome. In this way, domestic law is to a significant extent continuously being shaped and reshaped by IEL.Footnote 56
A Regulation of Data Mobility
Several disciplines in international trade law regulate data mobility in favor of cross-border transfers of data, at the expense of nation states’ ability to restrict such transfers or to require the location of computing facilities (such as routers, servers, or data centers) within their territory. While established disciplines under the rules for trade in goods and trade in services in general, and telecommunication services in particular, only apply to certain categories of data, the new disciplines in ‘e-commerce’ and ‘digital trade’ chapters of agreements like CPTPP or USMCA apply to ‘information’, including personal information, generally.Footnote 57 Under the ‘digital trade’ framing, certain cross-border transfers of data can be conceptualized as trade in digital goods or as trade in digital services. To accommodate nonphysical goods, dedicated provisions address ‘digital products’Footnote 58 that enjoy protections from discriminatory treatment.Footnote 59 However, data that is not produced for commercial sale or distribution but that is generated or assembled for machine learning purposes apparently escapes the digital product category. Similarly, if data is used to train algorithms that provide services (for example, financial services based on algorithms trained with financial market data), only the services, but not the data used to provide the services, enjoy the protections under the General Agreement on Trade in Services (GATS) and the equivalent provisions in free trade agreements. GATS commitments apply if data is an end (data as a service) and not just a means to an end, and only if the WTO member in question has made specific commitments toward services liberalization in its schedule. Relevant categories in this regard encompass data processing services, software programming services, and various kinds of telecommunication services.Footnote 60
Under the contested principle of technology neutrality, established commitments for services – formerly provided in analog form but now increasingly provided digitally – automatically acquire the same liberalization status as their analog counterparts.Footnote 61 In this way, the gradual digitalization of services can lead to a gradual liberalization of services economies that registered relatively liberal commitments for analog services. Conversely, some digital services escape the WTO’s classification of services altogether, thereby creating new gaps within the system. It was, for example, unclear under which category Google’s core business – providing search services – could be subsumed before the revised classification included a dedicated category for ‘web search portal content’.Footnote 62
If a WTO member has made specific commitments to allow for cross-border market access of digital foreign service providers, full-scale data transfer limitations that amount to a ‘total prohibition’ of the relevant service are in principle prohibited under Article XVI:2(c) GATS (zero quotas).Footnote 63 Data transfer limitations that fall short of a ‘total prohibition’, as is the case under both the EU and the proposed Indian model, are not affected by this prohibition. They would need to comply, however, with the obligation of national (that is, nondiscriminatory) treatment contained in Article XVII GATS and the requirement to administer any such limitation in a reasonable, objective, and impartial manner under Article VI GATS. The former would not apply to a situation in which both domestic and foreign service suppliers would need to comply with the data transfer limitations in question. The latter may give rise to a violation if a complaining WTO member can show that the EU, for instance, conducted its adequacy assessment in an unreasonable, subjective, or partial manner. In this way, the GATS metaregulates the regime for personal data transfers under the EU’s GDPR. While the EU is, in principle, allowed to adopt and enforce measures to protect the privacy of individuals in relation to the processing and dissemination of personal data and the protection of confidentiality of individual records and accounts, it must not do so in a manner that would constitute an arbitrary or unjustifiable discrimination between comparable countries or a disguised restriction on trade in services.Footnote 64 In contrast, no such general exception exists for the Indian proposal to limit the transfer of Indian data for overtly protectionist purposes.
This is likely inconsequential, because India made only minimal commitments toward services liberalization, but it is nevertheless paradigmatic of international trade law’s aversion to ‘protectionism’, which is being carried forward into the digital domain.
Contrast the multilateral rules for trade in services under GATS – which are contingent on services classification, dependent on specific commitments by states, and not tailored toward questions of data mobility – with the newly created rules in agreements such as CPTPP, USMCA, and JUSDTA that are specifically designed to protect data mobility against transnational data transfer restrictions.
These rules contain commitments to refrain from prohibiting or restricting the cross-border transfer of information, unless such measures are necessary to achieve a public policy objective and are not arbitrary, unjustifiably discriminatory, a trade restriction in disguise, or more restrictive than necessary.Footnote 65 The last clause, the trade law version of a necessity test, in particular, is reason enough for the EU to oppose these kinds of provisions in plurilateral (as in the case of the failed Trade in Services Agreement (TISA)) and bilateral negotiations (as in the case of the cratered EU-US Transatlantic Trade and Investment Partnership (TTIP)). While data and privacy protection are universally recognized as legitimate public policy objectives, at least in principle, views about what is necessary to achieve these objectives differ considerably. Accordingly, the EU carves out its data protection regime, including the data transfer restrictions, from external scrutiny in its trade agreements.Footnote 66
The model inaugurated in the TPP and subsequently used in the USMCA and JUSDTA also created a dedicated rule addressing a certain form of data localization: requirements that foreign businesses use or locate computing facilities within a treaty party’s territory as a condition for conducting business in that territory.Footnote 67 In contrast to the TPP, which allowed for the possibility of justifying such measures in principle under the same conditions as those applicable to cross-border data transfer restrictions, the USMCA and JUSDTA do not preserve this option.Footnote 68 They also ‘fix’ the ‘gap’ that the TPP had created for financial data at the insistence of US financial regulators and to the disappointment of US financial services providers. While still treating financial services data differently from other information, the USA, Mexico, Canada, and Japan agreed to refrain from imposing domestic computing facility requirements for financial services, as long as their respective financial regulatory authorities have immediate, direct, complete, and ongoing access to information processed or stored on financial services computing facilities outside their territory.Footnote 69 In this way, the USMCA and JUSDTA preserve both the right of financial service providers to locate data territorially where they see fit and the right of regulators to access that data transnationally.
In sum, established rules in the multilateral trading system only protect certain kinds of data from certain kinds of restrictions. In this sense, factual data mobility – that is, the ability of data holders to decide where data resides and where it moves – exceeds the legal protection of data mobility under WTO law. For this reason, the USA and like-minded countries have been advocating for more stringent rules to preserve transnational data mobility as other countries have sought to impose data transfer restrictions.Footnote 70 The design of these provisions, in particular their reliance on categories borrowed from international investment law conducive to regulatory arbitrage by way of strategic incorporation, means that countries that sign on to the US model effectively opt for an open digital economy favoring transnational data mobility vis-à-vis everyone. The EU and other jurisdictions interested in a more differentiated regime are hence prudent to refrain from such commitments.Footnote 71
B Regulation of Data Control
IEL regulates control over data mainly through commitments under international IP law and international investment law. International IP law – which shifted into the trade regime with the WTO’s agreement on ‘trade-related aspects of intellectual property rights’ (TRIPS) and has since become a staple of ‘free trade’ agreements – regulates control over data by requiring IP protection for certain categories of data. Recent US agreements have gone further by creating new rights to data exclusivity in their IP chapters and novel protections for algorithms in ‘digital trade’ chapters. Yet, the entrenchment of data control under international investment law might be even more far-reaching as it lends itself to protecting data as an asset (investment), which entitles data holders (investors) to certain guarantees enforceable against nation states by way of investor–state dispute settlement (ISDS). While ostensibly in favor of data mobility, IEL tends to entrench data control by protecting those who have data rather than those who need it or want it. The only exceptions are the new commitments in recent agreements that encourage governments to make ‘their’ data available as ‘open data’.Footnote 72 This encourages a shift from governmental control over data toward ‘public’ access, which is, in reality, often mediated by private actors such as data brokers or cloud providers.Footnote 73 No international agreement contemplates data sharing by private data holders, despite the regulatory trend toward compulsory data-sharing mechanisms in certain jurisdictions.
IEL’s regulation of data control is especially salient as the question of legal ownership over data remains unsettled in domestic law.Footnote 74 The integration of international IP law into IEL has led to the gradual transformation of IP from a coordinative system of incentive governance into a commodity that can be ‘traded’ transnationally and an asset that enjoys investment protection.Footnote 75 While the reconceptualization of established IP rights as investments might upset the balance found under TRIPS,Footnote 76 the dynamic might be different for data where such a balance is yet to be found. Both common and civil law systems grapple with questions of whether and to what extent property rights in data should be recognized, newly established, or – where they exist – abolished. IEL may have a significant and potentially long-lasting influence on these debates. In this context, it is important to differentiate between legal rights of data ownership (property rights in data) and factual control over data. Data holders may exercise infrastructural control over data without commensurate property rights that a domestic court would recognize or enforce. Conversely, data transfer, storage, and processing infrastructures can be designed in ways that separate forms of legal or technological control over data. One example is cloud computing models in which the owner and operator of the physical and digital data infrastructure has no access to its customers’ data.Footnote 77 Another is ‘safe sharing sites’, which provide for differentiated access to data, while distinguishing between raw data and insights derived from them.Footnote 78 Neither of these contractual arrangements hinges on the recognition of property rights in data.
However, legal ownership claims over data can be critical when de facto control over data is being challenged. When governmental regulators require the disclosure of information or when data-sharing requirements between businesses are being instituted, data controllers will claim ‘data ownership’ to guard their economic interests in data exploitation. Such claims under domestic law can be shaped and entrenched by commitments under IEL.
The TRIPS agreement sets a baseline for IP protection for certain categories of data, but such protection is not comprehensive and remains contested. Copyright, for example, only covers expressions (such as images, texts, videos) as data.Footnote 79 Compilations of data can be protected if they constitute intellectual creations, but such protection does not extend to the data contained therein.Footnote 80 Trade secrets might be able to fill some of these gaps. Technological shifts toward cloud computing and machine learning make it easier to satisfy the three-pronged test that Article 39.2 TRIPS stipulates. First, the secrecy of data can be achieved, for example, by keeping the data internal and by only allowing differentiated access. Second, the commercial value derived from secrecy may flow from competitive advantages in machine learning applications attributable to superior datasets. And third, secrecy can be maintained by way of technological safeguards such as encryption.Footnote 81 While the extent to which trade secrecy under TRIPS protects against data disclosure requirements transnationally has not yet been tested in dispute settlement proceedings,Footnote 82 companies rely routinely on trade secrecy to fight transparency domestically.Footnote 83 In light of uncertainty about the level of protection of undisclosed test data provided by Article 39.3 TRIPS, the USA has been aggressively pushing for ‘data exclusivity’ provisions in recent agreements.Footnote 84 While so far confined to regulatory approval for agricultural chemical and pharmaceutical products – where data exclusivity creates de facto exclusivity for the relevant product – these demands might be a precursor for future contests around data exclusivity in other contexts. Novel provisions protecting against source code disclosure that go beyond the traditional copyright protection for software are another pointer in the same direction.Footnote 85
Meanwhile, international investment law’s bearing on data control has been largely overlooked, but this might just be the calm before the storm.Footnote 86 The broad ‘investment’ definitions found in many agreements and the variety of approaches deployed by tribunals make it plausible that ‘data’ will soon be recognized as a protected asset under international investment law by at least some tribunals,Footnote 87 thereby granting property-type protection under international law where such protection under domestic law remains uncertain.Footnote 88 While the broad and relatively open-ended guarantee of fair and equitable treatment contained in many investment agreements can be leveraged against many forms of data regulation, the guarantees against indirect or even direct expropriation appear to be particularly apt to challenge the growing trend toward mandatory data sharing. To be sure, in the absence of ISDS jurisprudence, many open questions remain: does the recognition of data as an asset presuppose the recognition of IP-type rights in data (fostering convergence between international IP and investment law)?Footnote 89 Does the collection of data make a contribution to the host state economy, as required under the Salini test?Footnote 90 What kind of territorial nexus, if any, is required between a company’s data-related activities and the host state to enjoy investment protection?Footnote 91 Answers to these questions will only emerge over time. The development of ISDS jurisprudence on data control questions is likely to depend on what kind of cases are being brought against whom and on what basis. The failed attempt to challenge Australia’s tobacco regulation may cause investors to tread more carefully when challenging the regulatory ambitions of developed countries (e.g., the EU’s data strategy).Footnote 92 Developing countries with industrial data policies that challenge the Silicon Valley Consensus are likely targets for ISDS-backed counter-pressure.
IV Adapting International Economic Law for the Artificial Intelligence Economy
The picture that emerges is one in which new commitments toward data mobility under IEL enable those who have data to decide where they want to store, process, and transfer data, while international IP and investment law guard against governmentally mandated transparency about and/or re-distribution of control over data. Protections of mobility and control of capital are, of course, familiar ways in which IEL has facilitated global capitalism. Yet, data differs from other means of production and might necessitate changes to the global regulatory environment to generate societally beneficial outcomes. Developing countries appear to be in a particularly precarious position. Embracing the shift toward a data-driven economy is widely seen as the best path toward development.Footnote 93 Yet, charting this path while respecting local conditions and values such as human agency and self-determination is challenging because of the concentration of power over the relevant digital infrastructures and data that lends itself to new dependencies and carries the risk of data extractivism without adequate compensation.Footnote 94 For these reasons, contemporary IEL’s tendency to apply policy prescriptions of the twentieth century to the emerging AI economy in the twenty-first century needs critical evaluation and, where necessary, reconfiguration. Future work will consider the following questions and tentative propositions.
First, how can governmental interests in local access to and/or regulatory control over data be reconciled with transnational business interests in cross-border data flows? While territorial data localization requirements are by no means the only or best way to ensure local access to data, it seems premature for governments to tie their hands when viable alternatives are not yet in place. In particular, countries that are interested in maintaining a differentiated approach to transnational data flows (or at least the possibility to institute such a regime eventually) may want to avoid the sweeping provisions that the CPTPP, USMCA, and JUSDTA have pioneered. Instead, imposing conditionalities under IEL directly on multinational digital corporations – trading protections of the free flow of data against commitments toward regulatory commands – might be a superior regulatory approach.Footnote 95
Second, what are the implications of the fundamental differences between financial capital and data-as-capital for international investment law? As international investment law is undergoing critical re-evaluation and at least partial reform in both substance and procedure, its implications for an AI economy in which data is treated as a resource ought to be part of the agenda. Vague references to the ‘right to regulate’ may be insufficient to enable creative experimentation with digital economy policies without risk of ‘regulatory chill’. As an ISDS moratorium for COVID-19-related measures is being considered, a comparable moratorium for certain digital economy policies should be on the table as well.
Third, is there a need to recalibrate the temporal mismatch between long-lasting obligations under IEL and the rapid pace of technological development? IEL’s traditional commitment to providing ‘certainty’ for transnational business activity seems at odds with the rapid pace of innovation in the digital economy. The principle of technology neutrality may need to be cabined when new technologies transform the economy fundamentally.
And finally, how can IEL help to confront (rather than exacerbate) the pervasive data control asymmetries in the digital economy? A first step in this direction might lie in addressing the uncertainty about the value of data and data flows in a globalized digital economy. Existing proxies for the value of data flows (e.g., bandwidth expansion) and of data control (e.g., market capitalization) seem insufficient to inform policy-makers and treaty drafters. While the Organisation for Economic Co-operation and Development (OECD) and the WTO have gradually begun to address this challenge, their efforts so far have failed to consider proactive measures through which the data amassed by global platform companies could be leveraged to (re)assess the state of the global digital economy. As it turns out, data is a resource not just for the AI economy but also for the future development and reconfiguration of IEL.
I Introduction
Europeans have only recently realized their weaknesses and the risk of remaining at the margins of the fourth industrial revolutionFootnote 1 that artificial intelligence (AI) is expected to bring about. Despite the existence of the single market, Europe’s industrial policy, including policy in the field of AI, still suffers from a lack of coordination and frequent duplication between member states. Moreover, investments in AI research and innovation remain limited when compared with Asia and North America.Footnote 2 As a result, European companies are in a weak position in terms of consumer applications and online platforms, and industries suffer from a structural disadvantage in the areas of data access, data processing and the cloud-based infrastructures that remain essential for AI.
However, this gloomy overview calls for some nuance. The European Union (EU) and its member states are still well placed in the AI technological race, and the European economy benefits from several important assets, remaining not only an AI user but also, more critically, an AI producer. EuropeFootnote 3 is still a key player in terms of research centers and innovative start-ups and is in a leading position in sectors such as robotics, services, automotive, healthcare and computing infrastructure. Perhaps more importantly, there is growing awareness in Europe that competition and the technological race for AI will be a matter of great significance for the future of the old continent’s economy, its recovery after the COVID-19 pandemic and, broadly speaking, the strategic autonomy of the EU and its member states.
The 2020 European Commission White Paper on Artificial Intelligence illustrates a form of European awakening.Footnote 4 This strategic document insists on the necessity of better supporting AI research and innovation in order to strengthen European competitiveness. According to the Commission, Europe should particularly seize the opportunity of the “next data wave” to better position itself in the data-agile economy and become a world leader in AI.Footnote 5 The Commission makes a plea for a balanced combination of the economic dimension of AI and a values-based approach as the development of AI-related technologies and applications raises new ethical and legal questions.Footnote 6
ProfilingFootnote 7 and automated decision-makingFootnote 8 are used in a wide range of sectors, including advertising, marketing, banking, finance, insurance and healthcare. Those processes are increasingly based on AI-related technologies and the capabilities of big data analytics and machine learning.Footnote 9 They have enormous economic potential. However, recommendation services for products such as books, video games, music or newsfeeds might reduce consumer choice and produce inaccurate predictions.Footnote 10 An even more serious criticism is that they can also perpetuate stereotypes and discriminatory bias.Footnote 11 Studies on this crucial issue are still rare because researchers often cannot access the proprietary algorithms.Footnote 12 In several European countries, including France, the opacity of algorithms used by the administration has become a political issue and has also provoked growing case lawFootnote 13 and legislative changes.Footnote 14 Finally, as the European Commission recently observed, AI increases the possibility to track and analyze people’s habits. For example, there is the potential risk that AI may be used for mass state surveillance and also by employers to observe how their employees behave. By analyzing large amounts of data and identifying links among them, AI may also be used to retrace and deanonymize data about persons, creating new personal data protection risks.Footnote 15
To summarize, the official European stance regarding AI combines a regulatory and an investment-oriented approach, with a twin objective of promoting AI and addressing the possible risks associated with this disruptive technology. This is indeed crucial as the public acceptance of AI in Europe is reliant on the conviction that it may benefit not only companies and decision-makers but also society as a whole. However, so far, especially when it comes to the data economy on which AI is largely based, public intervention in Europe has occurred through laws and regulations that are based on noneconomic considerations. The General Data Protection Regulation (GDPR)Footnote 16 is essential in this respect because it reflects how a human rights-based legal instrument might interfere with data-based economic principles. This 2016 regulation aims at enforcing a high standard of personal data protection that can limit the free flow of data, which is at the heart of the development of AI technologies.
Given the worldwide economic importance of the single market, the effects of this regulation are inevitably global. Many commentators rightly emphasized the extraterritorial effect of this European regulation, as a non-European company wishing to have access to the European market has no choice but to comply with the GDPR.Footnote 17 Moreover, the most recent generation of EU free trade agreements (FTAs) contains chapters on e-commerce and digital trade, under which the parties reaffirm the right to regulate domestically in order to achieve legitimate policy objectives, such as “public morals, social or consumer protection, [and] privacy and data protection”. Under the latest EU proposals, the parties would recognize cross-border data flows, but they would also be able to “adopt and maintain the safeguards [they] deem appropriate to ensure the protection of personal data and privacy, including through the adoption and application of rules for the cross-border transfer of personal data”.Footnote 18
The next section will present the growing debate on data protectionism (Section II). I will then study the EU’s approach toward data protection and assess whether the set of internal and international legal provisions promoted by the EU effectively translates into a meaningful balance between trade, innovation and ethical values (Section III). I will also describe the birth of European trade diplomacy in the field of digital trade, focusing the analysis on the most recent EU FTAs’ provisions and proposals. I will compare them with recent US-led trade agreements, such as the Trans-Pacific Partnership (TPP) and the United States-Mexico-Canada Agreement (USMCA), to assess whether the EU’s approach constitutes a model for future plurilateral or multilateral trade agreements (Section IV). In conclusion, I will assess whether the American and European approaches are reconcilable or destined to diverge given the opposing political and economic interests they reflect.
II Data Protection or Data Protectionism?
Data has often been described as a contemporary raw material, a sort of postindustrial oil, and its free flow as the necessary condition for the convergence between globalization and digitalization. Data is at the heart of the functioning of AI, which is in turn the most important application of a data economy. The development of AI relies on the availability of data, and its value increases with detailed and precise information, including private information.Footnote 19 The availability and enhancement of data are crucial for the development of technologies, such as machine learning and deep learning, and offer a decisive competitive edge to companies involved in the global competition for AI.Footnote 20 Moreover, access to data is an absolute necessity for the emergence and development of a national and autonomous AI industry.Footnote 21 Not surprisingly, given the growing economic and political importance of data, governments and policy-makers are increasingly trying to assert control over global data flows. This makes sense as data, and in particular private data, is more and more presented as a highly political issue that has for too long been ignored in the public debate.Footnote 22
The current move toward digital globalization could be threatened by three types of policies: new protectionist barriers, divergent standards surrounding data privacy and requirements on data localization.Footnote 23 Data localization has also been depicted as “data protectionism” and a new form of nationalism,Footnote 24 or even anti-Americanism,Footnote 25 whereas others have advocated for a “digital sovereignty” that would imply the state’s power to regulate, limit or even prohibit the free flow of data.Footnote 26 Many countries are indeed subject to internal tensions between supporters of data openness as a catalyst for trade and technological development and those who promote comprehensive data protection in order to defend digital sovereignty as a prerequisite of national sovereignty. Old concepts and notions of international law, such as (digital) self-determination, (data) colonization, reterritorialization of data and (digital) emancipation, are also mobilized when it comes to justifying states’ “right to regulate” data. However, those general concepts often appear inadequate given the intrinsic nature of data flows and Internet protocol, which tend to blur the distinction between the global and the local. Data flows somehow render obsolete the traditional considerations of geographical boundaries and cross-border control that characterize classical international law.Footnote 27
Neha Mishra has thoroughly described different types of data-restrictive measures. State control can intervene through the physical infrastructures over which Internet traffic is exchanged, local routing requirements and a variety of cross-border data flow restrictions, such as data localization measures or conditional restrictions imposed on the recipient country or the controller/processor.Footnote 28 Those restrictions may be justified by primary policy goals such as public order or moral and cultural concerns. In Europe, the rationale behind restrictions on cross-border data transfers and AI has primarily been framed in terms of data protection – that is, the defense and protection of privacy – as one of the most fundamental human rights.
This narrative extends well beyond the sole economic protection of European interests and has the enormous advantage of conciliating protectionist and nonprotectionist voices in Europe. It contrasts and conflicts with an American narrative based on freedom and technological progress, where free data flows are a prerequisite for an open and nondiscriminatory digitalized economy.
III The European Legal Data Ecosystem and Its Impacts on Artificial Intelligence and International Data Flows
A European legal framework on data, and in particular on data protection, is nothing new. It can be explained in the first place by internal European factors. European member states started to adopt their own laws on the protection of personal information decades ago,Footnote 29 on the grounds of the protection of fundamental rights, and in particular the right to privacy, protected under their national constitutions, the European Convention on Human Rights and the Charter of Fundamental Rights of the European Union, which forms part of current primary EU law. Therefore, EU institutions recognized early the need to harmonize member states’ legislation in order to combine the unity of the single market with the human rights considerations already reflected in national legislation. This explains why, while some international standards, namely those of the Organisation for Economic Co-operation and Development (OECD)Footnote 30 and Asia-Pacific Economic Cooperation (APEC),Footnote 31 emphasize the economic component of personal data, the EU’s legal protection has been adopted and developed under a human rights-based approach toward personal data.Footnote 32
The 1995 European Directive was the first attempt to harmonize the protection of fundamental rights and freedoms of natural persons with respect to processing activities, and to ensure the free flow of data between member states.Footnote 33 However, a growing risk of fragmentation in the implementation of data protection across the EU and legal uncertainty justified the adoption of a new instrument that took the form of a Regulation, which is supposed to provide stronger uniformity in terms of application within the twenty-seven member states.Footnote 34
The GDPR also represents a regulatory response to a geopolitical challenge posed by the United States and its digital economy to the rest of the world. From a political perspective, the Snowden case and the revelation of the massive surveillance organized by American agencies provoked a strong reaction among European public opinion, including within countries that had recently experienced authoritarian regimes (such as the former East Germany and Poland).Footnote 35 The Facebook-Cambridge Analytica scandal further demonstrated that the freedom of millions of Europeans and their democracies was at stake and could be threatened by the digital hegemony of American tech companies with commercial interests. The demand for data protection against free and uncontrolled flows of data has also been encouraged by the progressive awareness of the economic and technological consequences of free data flows, as European companies appeared to be increasingly outpaced by their American rivals, especially in the field of AI. In parallel, in a spectacular ruling in 2015, the European Court of Justice annulled a decision of the European Commission under which the United States had until then been considered to provide a sufficient level of protection for personal data transferred to US territory (under the so-called safe harbor agreement).Footnote 36
The GDPR has been both praised and criticized, within and outside of Europe. Still, it remains to a certain extent a legal revolution in the field of data regulation, not so much because of its content – it is not, after all, the first legal framework to deal with algorithms and data processing – but more because of the political message this legislation sends to the European public and the rest of the world.Footnote 37 Through the adoption of this Regulation in 2016, the EU has chosen to promote high standards for data protection. Every single European and non-European company that is willing to process European data, including those developing AI, must comply with the GDPR.Footnote 38
A The European Data Protection Regulation and Artificial Intelligence
The GDPR regulates the processing of personal data; that is, any information relating to a directly or indirectly identified or identifiable natural person (“data subjects”). This legislation deals with AI on many levels.Footnote 39 First, it contains a very broad definition of “processing” as “any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means”.Footnote 40
It also regulates the conditions under which “personal data”Footnote 41 can be collected, retained, processed and used by AI. The GDPR is built around the concept of lawful processing of data,Footnote 42 meaning that personal data cannot be processed without obtaining individual consent or without falling into one of a set of limited categories defined under the Regulation.Footnote 43 That is a crucial difference from current American federal and state laws, which are based on the presumption that data processing is lawful unless it is explicitly prohibited by the authorities under specific legislation.Footnote 44
Under the GDPR, processing of personal data is subject to the lawfulness, fairness and transparency principles.Footnote 45 The Regulation also contains specific transparency requirements surrounding the use of automated decision-making, namely the obligation to inform about the existence of such decisions, and to provide meaningful information and explain its significance and the envisaged consequences of the processing to individuals.Footnote 46 The right to obtain information also covers the rationale of the algorithms, therefore limiting their opacity.Footnote 47 Individuals have the right to object to automated individual decision-making, including the use of data for marketing purposes.Footnote 48 The data subject has the right to not be subject to a decision based solely on automated decision-making when it produces legal effects that can significantly affect individuals.Footnote 49 Consent to the transfer of data is also carefully and strictly defined by the Regulation, which states that it should be given by a clear affirmative act from the natural person and establishes the principles of responsibility and liability of the controller and the processor for any processing of personal data.Footnote 50 Stringent forms of consent are required under certain specific circumstances, such as automated decision-making, where explicit consent is needed.Footnote 51
Therefore, under the GDPR, a controller that will use data collected for profiling one of its clients and identifying its behavior (for instance, in the sector of insurance) must ensure that this type of processing relies on a lawful basis. Moreover, the controller must provide the data subject with information about the data collected. Finally, the data subject may object to the legitimacy of the profiling.
Another illustration of the interaction between AI technologies and the GDPR is the requirements and limitations imposed on the use of biometric dataFootnote 52 for remote identification, for instance through facial recognition. The GDPR prohibits the processing of biometric data “for the purpose of uniquely identifying a natural person” unless the data subject has given explicit consent.Footnote 53 Other exceptions to this prohibition are exhaustively delineated, such as the “protection of the vital interests” of the data subject or other natural persons, or reasons of “substantial public interest”. Most of those limited biometric identification purposes will have to satisfy a necessity and proportionality test and are subject to judicial review.Footnote 54
B Transatlantic Regulatory Competition
Despite its limitations and imperfections, the GDPR remains a piece of legislation that aims to strike a balance between fundamental rights considerations and technological, economic and policy considerations in accordance with European values and standards. In contrast, the US data privacy legal framework relies not on human rights but, rather, on consumer protection, where the individual is supposed to benefit from a bargain with the business in exchange for his or her personal information (the so-called transactional approach).Footnote 55 Moreover, in contrast with Europe’s unified and largely centralized legislation, the American model for data protection has primarily been based on self-regulation and a sectoral approach, at least until the 2018 adoption of the California Consumer Privacy Act (CCPA).Footnote 56
This state legislation partially resembles the GDPR. First, the CCPA is the first US data protection statute that is not narrowly sectoral.Footnote 57 It defines “personal information” in a way that seems in practice equivalent to the GDPR’s definition of personal data.Footnote 58 Personal information is also partially relevant to AI (covering, for example, biometric data, geolocalization and Internet or other electronic network information). The CCPA also includes a broad definition of processing, which can include automated decision-making.Footnote 59 Echoing the GDPR’s transparency requirements, the CCPA provides a right of information, under which a consumer may request that a business that collects consumers’ personal information disclose to that consumer the categories and specific pieces of personal information collected.Footnote 60 This right of disclosure is particularly significant.Footnote 61 The CCPA also contains a right to opt out, allowing consumers to deny a business the possibility to use their personal information.Footnote 62
Despite those similarities, important differences remain between the two statutes. Concretely, under the CCPA’s transactional approach, the right to opt out cannot be invoked if the information is necessary for the business or service provider to complete the transaction for which the personal information was collected or to enable solely internal uses that are reasonably aligned with the expectations of the consumer’s relationship with the business.Footnote 63 Moreover, whereas the GDPR rests on the principle of the “lawful processing of data”,Footnote 64 the CCPA does not require processing to be lawful, implying that data collection, use and disclosure are allowed unless explicitly forbidden. Whereas the GDPR requires specific forms of consent for sensitive data and limits individual automated decision-making, the CCPA “does nothing to enable individuals to refuse to give companies their data in the first place”.Footnote 65 Another striking difference relates to the consumer’s right not to be discriminated against under the CCPA if he or she decides to exercise the right to seek information or the right to opt out. The effect of this nondiscrimination principle seems tenuous as, in those circumstances, a business is not prohibited from charging a consumer a different price or rate, or from providing a different level or quality of goods or services.Footnote 66 This typifies the difference between a consumer protection-based approach, which in reality tolerates and admits discrimination (here, in the price or the quality of the service provided), and a human rights-based approach, which is much more reluctant to admit economic differentiations among the individuals to whom fundamental rights are addressed.
This brief comparison between the GDPR and the CCPA is not meant to suggest that one legislative model is intrinsically superior, more efficient, more legitimate or more progressive than the other. Both statutes merely reflect ontological discrepancies between European and American legal conceptions and policy choices. However, the conflict between those two models is inevitable when considering the current state of cross-border data flows. Not surprisingly, the question of extraterritoriality was crucial during the GDPR’s drafting.Footnote 67 Even though the Regulation is based on the necessity of establishing a single digital market, under which data protection and fundamental EU rights are equally guaranteed, its extraterritorial effects are expressly recognized as the GDPR applies “in the context of the activities of an establishment of a controller or a processor in the Union, regardless of whether the processing takes place in the Union or not” and “to the processing of personal data of data subjects who are in the Union by a controller or processor not established in the Union”.Footnote 68 The extraterritorial effects of the GDPR and, more broadly, of the EU’s legal framework are undeniable given the importance of the single EU market.Footnote 69 Extraterritoriality should be understood as a kind of “effet utile” of the Regulation, as most data processors and controllers are currently located outside the EU’s territory. The EU’s effort would in practice be doomed if personal data protection were limited to the EU’s borders.Footnote 70
The European legislator admits that flows of personal data to and from countries outside the EU are necessary for the expansion of international trade.Footnote 71 Yet, international data transfers must not undermine the level of data protection and are consequently subject to the Regulation’s provisions. Data transfer to third countries is expressly prohibited under the GDPR unless it is expressly authorized under one of the legal bases established by the Regulation.Footnote 72 The European Commission may decide under the GDPR that a third country offers an adequate level of data protection and allow transfers of personal data to that third country without the need to obtain specific authorization.Footnote 73 However, such a decision can also be revoked.Footnote 74 In the absence of an adequacy decision, the transfer may be authorized when it is accompanied by “appropriate safeguards”, which can take the form of binding corporate rulesFootnote 75 or a contract between the exporter and the importer of the data, containing standard protection clauses adopted by the European Commission.Footnote 76 Even in the absence of an adequacy decision or appropriate safeguards, data transfer to third countries is allowed under the GDPR in limited cases, in particular on the consent of the data subject, or if the transfer is necessary for the performance of a contract.Footnote 77
Under the current regime, the EU Commission adopted a set of adequacy findings with select third countries, such as Japan, in February 2019.Footnote 78 The European Commission also commenced adequacy negotiations with Latin American countries (Chile and Brazil) and Asian countries (Korea, India, Indonesia, Taiwan), as well as the European Eastern and Southern neighborhoods, and is actively promoting the creation of national instruments similar to the GDPR.Footnote 79 Moreover, in July 2016, the European Commission found that the EU-US Privacy Shield ensures an adequate level of protection for personal data that has been transferred from the EU to organizations in the USA, having regard to, inter alia, safeguards surrounding access to the transferred data by the United States’ intelligence services.Footnote 80 More than 5,300 companies have been certified by the US Department of Commerce, which is in charge of monitoring compliance with a set of common data privacy principles under the Privacy Shield, and the framework is annually and publicly reviewed by the Commission.Footnote 81 The Privacy Shield seemed to demonstrate that despite profound divergence between European and American approaches to data protection, there was still room for transatlantic cooperation and mutual recognition. However, in mid-July 2020, the European Court of Justice (ECJ) concluded that the Commission’s Privacy Shield decision was invalid as it disregarded European fundamental rights.Footnote 82 As the Court recalled, the Commission must only authorize the transfer of personal data to a third country if it provides “a level of protection of fundamental rights and freedoms essentially equivalent to that guaranteed within the European Union”.Footnote 83 The ECJ found lacunae in judicial protections for European data subjects against several US intelligence programs.Footnote 84
The question of data transfer between the EU and UK after Brexit is one of the many hot topics that should be dealt with in a future EU/UK trade agreement, and it is a perfect example of the problematic nature of the GDPR’s application to EU third countries with close economic ties. The October 2019 political declaration setting out the framework for the future relationship between the two parties contains a specific paragraph on digital trade that addresses the question of data protection. It says that future provisions on digital trade “should … facilitate cross-border data flows and address unjustified data localisation requirements, noting that this facilitation will not affect the Parties’ personal data protection rules”.Footnote 85 However, in June 2020, six months after Brexit, the Commission was still uncertain regarding a future UK adequacy assessment because of a lack of specific data protection commitments in the UK. Moreover, the British government indicated that it wanted to develop a separate and independent data protection policy.Footnote 86 One of the EU’s main concerns is that through bilateral agreements concluded between the UK and the USA, data belonging to EU citizens could be “siphoned off” to the United States.Footnote 87
The issue of compatibility between European privacy rules and the Chinese legal framework is also a growing matter of concern for Europeans. China applies much stricter data border control on the grounds of national security interests. For instance, the 2017 Chinese law on cybersecurity provides that companies dealing with critical information infrastructure, such as communications services, transport, water, finances, public services, energy, and others, have an obligation to store their data in the Chinese territory. Such a broad definition can potentially affect all companies, depending on the will of Chinese authorities, who also have broad access to personal information on the grounds of national security.Footnote 88 However, Chinese attitudes regarding privacy protection are not monolithic. According to Samm Sacks, “[t]here is a tug of war within China between those advocating for greater data privacy protections and those pushing for the development of fields like AI and big data, with no accompanying limits on how data is used”. This expert even describes a growing convergence between the European and Chinese approaches in data protection regimes, leading the USA to be more isolated and American companies to be more reactive.Footnote 89 However, following the pattern of the recent conflict between European data privacy rules and US tech companies’ practices, new cases shedding light on the regulatory divergence in data protection between China and the EU are inevitable.Footnote 90
Fragmentation and market barriers are emerging around requirements for privacy and data flows across borders. Can this fragmentation be limited through international trade law? What is the EU’s position on international data flows and data protection in the context of its trade policy? Can and should European trade agreements become an efficient way to promote the GDPR’s privacy approach?
IV The Birth of European Digital Trade Diplomacy
Not surprisingly, given its imprecise nature, AI is not covered as such by trade agreements, although AI technologies that combine data, algorithms and computing power can be affected by trade commitments in the field of goods and services. In this section, I will focus on the issue of the trade dimension of cross-border data flows, given its strategic relevance to AI applications. Although data cannot simply be equated with traditional goods or services, trade rules matter with regard to data in multiple ways.Footnote 91 As I have already noted, even though regulating data flows at national boundaries might seem counterintuitive and inefficient,Footnote 92 states and public authorities are tempted to regain or maintain control of data flows for many reasons, ranging from national security to data protection to economic protectionism. A trade agreement is one international public law instrument that might constitute a legal basis to promote cross-border data control or, on the contrary, the free flow of data principle.
A A Limited Multilateral Framework
Despite recent developments, digital trade rules currently remain limited, both at the multilateral and the bilateral level. World Trade Organization (WTO) disciplines do not directly confront the problematic nature of digital trade or AI, even though the WTO officially recognizes that AI, together with blockchain and the Internet of Things, is one of the new disruptive technologies that could have a major impact on trade costs and international trade.Footnote 93 Mira Burri has, however, described how WTO general nondiscrimination principles – Most-Favored-Nation Treatment and National Treatment – could potentially have an impact on the members’ rules and practices regarding digital trade, as could more specific WTO agreements, especially the General Agreement on Trade in Services (GATS).Footnote 94 She notes that WTO members have made far-reaching commitments under the GATS. The EU in particular has committed to data processing services, database services and other computing services.Footnote 95 These commitments might prohibit new measures with regard to search engines that limit market access or discriminate against foreign companies, as they should be considered data processing services. Localization requirements with regard to computer and related services would also be prima facie GATS-inconsistent, but could well be justified under the agreement’s general exceptions.Footnote 96
Despite a few updates, such as the Information Technology Agreement, WTO members have failed, as in other fields, to update WTO disciplines to address strategic issues such as digital trade and AI. The current plurilateral negotiations on e-commerce, which involve seventy-nine members including China, Japan, the USA and the EU and its member states, might represent a new opportunity to address these issues.Footnote 97 However, given the current state of the WTO, such an evolution remains, at present, uncertain.Footnote 98 So far, the most relevant provisions on digital trade are those negotiated within the bilateral or plurilateral trade deals, beginning with the TPP.Footnote 99
Recent developments in EU digital trade diplomacy can be seen as a reaction to the United States’ willingness to develop an offensive normative strategy whose basic aim is to serve its big tech companies’ economic interests and to limit cross-border restrictions based on data privacy protection as much as possible.
B The US Approach to Digital Trade Diplomacy
The United States’ free trade agreement (FTA) provisions on digital trade are the result of the Digital Agenda that was endorsed in the early 2000s. Several US trade agreements containing provisions on e-commerce have been concluded by different American administrations over the last two decades.Footnote 100 In 2015, the United States Trade Representative described the TPP as “the most ambitious and visionary internet agreement ever attempted”.Footnote 101 The TPP provisions relate to digital tradeFootnote 102 in various respects, including, inter alia, nondiscriminatory treatment of digital products,Footnote 103 a specific ban on customs duties on electronic transmissionsFootnote 104 and the free supply of cross-border digital services.Footnote 105 More specifically, despite recognizing the rights of the parties to develop their own regulatory requirements concerning the transfer of information by electronic means, the agreement prohibits the limitation of cross-border transfer of information by electronic means, including personal information.Footnote 106 Additionally, under the TPP, “no Party shall require a covered person to use or locate computing facilities in that Party’s territory as a condition for conducting business in that territory”.Footnote 107 US tech companies were deeply satisfied with the content of the agreement.Footnote 108
However, the TPP drafters did not ignore the problematic nature of personal information protections. Indeed, the text of this agreement recognized the economic and social benefits of protecting the personal information of users of electronic commerce.Footnote 109 It even indicated that each party shall adopt or maintain a legal framework that provides for the protection of the personal information of the users of electronic commerce, thereby admitting the possibility of following different legal approaches. However, each party should adopt instruments to promote compatibility between the different legal frameworks,Footnote 110 and the agreement’s wording on nondiscriminatory practices in terms of user protection is relatively strong.
The GDPR was still under discussion when the TPP was concluded. However, there is room for debate concerning the possible compatibility of the European legislation and this US trade treaty. As with the WTO compatibility test, the main issue concerns the possible discriminatory nature of the GDPR, which in practice is arguable. This doubt certainly constituted an incentive for the EU to elaborate upon and promote its own template on digital trade, in order to ensure that its new legislation would not be legally challenged by its trade partners, including the US administration.
Just like the TPP, the USMCA contains several provisions that address digital trade, including a specific chapter on this issue.Footnote 111 It also prohibits customs duties in connection with digital productsFootnote 112 and protects source code.Footnote 113 The prohibition of restrictions on cross-border transfers of information is couched in strong wording, as the agreement explicitly provides that “[n]o Party shall prohibit or restrict the cross border transfer of information, including personal information, by electronic means if this activity is for the conduct of the business of a covered person”.Footnote 114 Yet, the USMCA admits the economic and social benefits of protecting the personal information of users of digital trade and the relevance of an internal legal framework for the protection of this information.Footnote 115 However, the treaty compatibility of internal regulations that would limit data collection relies on a necessity and proportionality test and a nondiscrimination requirement. In any case, the burden of proving compatibility will undoubtedly fall on the party that limited data transfer in the first place, even though it did so on the grounds of legitimate policy objectives. Under these circumstances, the legality of GDPR-style legislation would probably be even harder to argue than under the former TPP.
C The European Union’s Response to the American Trade Regulatory Challenge
Before studying the precise content of existing EU agreements and proposals on digital trade, one should bear in mind that European trade policy is currently subject to strong internal tensions. Trade topics have become increasingly politicized in recent years, especially in the context of the Comprehensive Economic and Trade Agreement (CETA) and Transatlantic Trade and Investment Partnership (TTIP) negotiations. It is not only member states, through the Council, and the European Parliament – which has obtained, after the Lisbon Treaty, the power to conclude trade agreements together with the Council – that have placed pressure on the Commission. Pressure has also come from European civil society, with movements organized at the state and the EU level.Footnote 116 As a result, the idea that trade deals should no longer be a topic for specialists and should be subject to close political scrutiny is gaining ground in Europe. As a response, the capacity of trade agreements to better regulate international trade is now part of the current Commission’s narrative to advocate for the necessity of its new FTA generation,Footnote 117 in line with European primary law provisions that connect trade with nontrade policy objectives.Footnote 118 The most recent generation of EU FTAs incorporates a right to regulate, which is reflected in several provisions, in particular in the context of the sustainable developmentFootnote 119 and investment chapters.Footnote 120 More recently, the EU also showed a willingness to include a right to regulate in the digital chapter’s provisions.Footnote 121 Paradoxically, this reaffirmation of the state’s power to regulate is a prerequisite for stronger trade liberalizationFootnote 122 and, more broadly, a way to legitimize the extension of trade rules.
Older trade agreements, meaning those concluded before 2009, when the Lisbon Treaty entered into force, remained practically silent on the issue of digital trade or electronic commerce. The EU-Chile (2002) trade agreement is probably the first FTA that contains references to e-commerce, probably under the influence of the US-Chile FTA concluded during the same period. However, the commitments were limited as they refer to vague cooperation in this domain.Footnote 123 Moreover, services liberalization remained strictly within the limits of the positive-list approach of the former generation of European FTAs.Footnote 124 The EU-Korea FTA of 2011 contains more precise provisions on data flows, yet it is limited to specific sectors.Footnote 125 For instance, Article 7.43 of this agreement, titled “data processing”, is part of a broader subsection of the agreement addressing financial services. The provision encourages free movement of data. Yet, it also contains a safeguard justified by the protection of privacy. Moreover, the parties “agree that the development of electronic commerce must be fully compatible with the international standards of data protection, in order to ensure the confidence of users of electronic commerce”. Finally, under this agreement, the cross-border supply of services can be limited where necessary to secure compliance with (internal) laws or regulations, including those protecting the privacy of individuals.Footnote 126 Although limited to specific sectors, those provisions demonstrate that the EU was aware of the potential effect of data protection on trade long before the adoption of the GDPR.Footnote 127
This sectoral approach has been followed by the EU and its partners in more recent trade agreements, such as the CETA between the EU and Canada, which was concluded in 2014.Footnote 128 Chapter 16 of the CETA agreement deals expressly with e-commerce. It prohibits the imposition of customs duties, fees or charges on deliveries transmitted by electronic means.Footnote 129 It also states that “[e]ach Party should adopt or maintain laws, regulations or administrative measures for the protection of personal information of users engaged in electronic commerce and, when doing so, shall take into due consideration international standards of data protection of relevant international organizations of which both Parties are a member”.Footnote 130 However, the CETA also contains another innovative and broader exception clause based on data protection. Article 28.3 addresses the general exception to the agreement, and provides that several chapters of the agreement (on services and investment, for instance) can be subject to limitation based on the necessity to “secure compliance with laws or regulations which are not inconsistent with the provisions of this Agreement including those relating to … the protection of the privacy of individuals in relation to the processing and dissemination of personal data”. Finally, the CETA agreement, unlike the US model, does not contain a general free data flow provision and only promotes specific forms of data transfer, consistent with European economic interests, such as financial transfers for data processing in the course of business.Footnote 131
The current European strategy regarding trade and data protection appears more clearly in the negotiations after the adoption of the GDPR. In 2018, the European Commission made public proposals on horizontal provisions for cross-border data flows, and for personal data protection in EU trade and investment agreements.Footnote 132 This template is an attempt to reconcile diverging regulatory goals, in particular human rights considerations and economic considerations.Footnote 133 This conciliation is also symbolized by the internal conflict, inside the Commission, between the Directorate General for Trade (DG Trade), traditionally in charge of trade negotiations, and the Directorate General for Justice and Consumers (DG JUST). DG Trade has shown greater sensitivity toward cross-border data flows, whereas DG JUST conceived trade law as an instrument to expand Europe’s privacy protections.Footnote 134 As a result, this template supports cross-border data flows while also immediately recognizing that the protection of data and privacy is a fundamental right. Therefore, the protection of data privacy is exempted from any scrutiny.Footnote 135 This privacy safeguard borrows the wording of national security exception clauses and contrasts with the necessity and proportionality tests put in place under the TPP and USMCA. Not surprisingly, this privacy carve-out was immediately criticized by tech business lobbyists in Brussels.Footnote 136
However, the EU proposals formulated in late 2018, under the framework of the negotiation of two new FTAs with Australia and New Zealand (initiated in 2017), largely confirmed the template’s approach. First, the EU’s proposed texts refer to the right of the parties to regulate within their territories to achieve legitimate objectives, such as privacy and data protections.Footnote 137 These proposals also promote cross-border data flows in order to facilitate trade in the digital economy and expressly prohibit a set of restrictions, among which are requirements relating to data localization for storage and processing, or the prohibition of storage or processing in the other party’s territory. Moreover, the proposals protect source code, providing that, in principle, the parties cannot require the transfer of, or access to, the source code of software owned by a natural or juridical person of the other party.Footnote 138 A review clause on the implementation of the latter provision is included, in order to address possible new restrictions on cross-border data flows. Additionally, the European proposals allow the parties to adopt and maintain safeguards they deem appropriate to ensure the protection of personal data and privacy. The definition of personal data is similar to the GDPR’s conception.Footnote 139 This approach is also in line with the EU’s proposal, formulated within the context of the plurilateral negotiations regarding e-commerce, which took place at the WTO in April 2019.Footnote 140
The ability of the EU to persuade its trading partners to endorse its vision on digital trade remains uncertain. In this context, the content of the Digital Chapter of the recently concluded FTA between the EU and Japan is not very different from the CETA,Footnote 141 demonstrating the absence of real common ground and Japanese support on this issue. Whereas the JEFTA is an ambitious text in a wide range of sensitive trade matters (such as geographical indications, service liberalization and the link between trade and the environment), it only refers to a vague review clause regarding digital trade and free data flows.Footnote 142 However, as mentioned earlier, the question of cross-border data flows between Japan and the EU has been dealt with through the formal process that led Japan to reform its legal framework on data protection, which in turn led to the Commission’s 2019 adequacy decision.Footnote 143 Unilateral instruments remain, for the EU, the de facto most efficient tools when it comes to the promotion of its conception of data protection.Footnote 144
V Conclusion
The entry into force of the GDPR coincides with a new era of international trade tensions, which might be interpreted as a new symbol of the European “New, New Sovereigntism” envisioned by Mark Pollack.Footnote 145 The European way of addressing the issue of data processing and AI is, in reality, illustrative of the limits of the current European integration process. European industrial policies in this field have been fragmented among the member states, which have not achieved the promise of a single digital market and, even more problematically, have been unable to face strong international competition. So far, the EU’s response to this challenge has been mostly legal and defensive in nature. Yet, such a strategy is not in itself sufficient to address the challenges raised by AI. Smart protectionism might be a temporary way for Europe to catch up with the United States and China, but any legal shield will in itself prove useless without a real industrial policy that necessitates not only an efficient regulatory environment but also public investment and, more broadly, public support. The post-COVID-19 European reaction and the capacity of the EU and its member states to pool their capacities, modeled on what has been done in other sectors such as the aeronautic industry, will be crucial. After all, the basis of the European project is solidarity and the development of mutual capacity in strategic economic areas, such as coal and steel in the 1950s. A context of crisis and the risk of a decline of the “old continent” may serve as a strong catalyst for an efficient European AI policy.
On a more global and general level, the analysis of the GDPR and the European trade position on data flows and AI illustrates that this new and disruptive sector has not escaped the existing tensions between free trade and protectionism. Unsurprisingly, the new digital trade diplomacy is subject to an old rule: negotiators’ positions are largely influenced by economic realities and the necessity to promote a competitive industry or to protect an emerging sector, respectively. Fundamental rights protection considerations that led to a form of “data protectionism” in the EU are certainly also influenced by its economic agenda. On the other hand, the US promotion of free flows of data essentially responds to the interest of its hegemonic companies and their leadership on the Internet and AI. The EU’s acceptance of the free data flow principle might correspond to the growing presence of data centers in the EU’s territory, which followed the entry into force of the GDPR, given the necessity to comply with this regulation.Footnote 146 It can also be interpreted as an outstretched hand to its trade partners, in exchange for the admission of a large data privacy carve-out that would legally secure the GDPR under international trade law. However, unless unlikely political changes occur and a willingness to forge a transatlantic resolution or a multilateral agreement on these questions materializes, the fragmentation of the digital rules on data transfer will likely remain a long-term reality.
Today’s technology giants have won market dominance through the collection, analysis, and synthesis of data. With the increasing dependence on digital technology, and the increasing data dependency of said technology, data can be seen as a precondition to economic participation. Exploiting the steep economies of scale and network externalities of data, firms such as Google, Facebook, and Amazon are in positions of near monopoly. When such service providers disallow users from transferring their data to competing services, they can lock in users and markets, limiting the entry of competitors. Providing users with rights to both retrieve their data and transmit it to other firms potentially serves as a counterbalance, easing the acquisition of users for new market entrants. As such, data portability legislation has been claimed to have far-reaching implications for the private sector, curtailing tools of forced tenancy. With users no longer tied to a single firm, new technologies gain a path to market, and the average user is more likely to have the ability and resources to change provider and adopt a solution that better suits their individual needs.
This chapter explores the concept of data portability in a world driven by artificial intelligence (AI). Section I maps out the journey that data takes in a data economy and investigates the valuation and cost of data. It posits that, because of data analytics and machine learning models, “generated” data – data that has been derived or inferred from “raw” data – is of higher value in the data market and carries a higher cost of production. Section II discusses what is required for the free flow of data in competitive datacentric markets: regulations on data tradability and portability. Our analysis leads us to doubt that the newly introduced, hotly debated rules regarding portability of data under European Union (EU) law will adequately provide these prerequisites. The chapter concludes by suggesting an alternative model for data portability that distinguishes on a value basis rather than between personal and nonpersonal data.
I The Journey and Value of Data
This first section reviews the journey of data from collection to classification: the path from its moment of provision by a data subject to its subsequent transformation into inferred data. We present a categorization model that distinguishes data according to its origin, primarily distinguishing between raw and generated data. Utilizing these categories, we illustrate how data generated by machine learning models is being created at an exponential rate in today’s data-driven economy. Lastly, a data valuation model is introduced, holding that the value of generated data is higher than that of raw data, and that the value of generated data scales exponentially in aggregation. We claim that the added value of generated data is created by firms that carry the costs of providing big data analytics, including machine learning.
A Origin of Data
Data can be classified according to a variety of parameters. A classification model can rely on the sensitivity of the subject, purpose of use, context of processing, degree of identifiability, or method of collection of data. We build on a categorization model of data that was introduced by a roundtable of Organisation for Economic Co-operation and Development (OECD) privacy experts in 2014,Footnote 1 and expanded by Malgieri.Footnote 2 The taxonomy categorizes data according to its origin – that is, the manner in which it originated – and distinguishes between raw data (provided and observed data) and generated data (derived and inferred data).
Raw data (“user-generated data”) encompasses provided and observed data. Provided data is data originating from the direct actions of individuals (e.g., filling in a registration form, purchasing a product with a credit card, or posting on social media). Observed data is data recorded by the data controller (e.g., data from online cookies, geolocation data, or data collected by sensors).
Generated data (“data controller-generated data”) consists of derived and inferred data. Derived data is data generated from other data, created in a “mechanical” manner using simple, non-probabilistic reasoning and basic mathematics for pattern recognition and classification creation (e.g., customer profitability as a ratio of visits and purchases, common attributes among profitable customers). Inferred data is data generated from other data either by using probabilistic statistical models for testing causal explanation (“causal inferences”) or by using machine learning models for predicting output values for new observations given their input values (“predictive inferences”).Footnote 3
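The derived/inferred distinction can be made concrete with a short sketch. The following Python snippet is purely illustrative (the data, weights, and bias are invented for this example): derived data results from simple, non-probabilistic arithmetic over raw data, while inferred data comes from a probabilistic model whose parameters would, in practice, be learned from many users' data.

```python
import math

# Raw data for one user: provided (purchases) and observed (site visits).
raw = {"visits": 40, "purchases": 10}

# Derived data: a mechanical, non-probabilistic transformation of raw data,
# e.g. customer profitability as a ratio of purchases to visits.
derived_profitability = raw["purchases"] / raw["visits"]

def infer_profitable(ratio: float, weight: float = 10.0, bias: float = -2.0) -> float:
    """Inferred data: a predictive output from a toy probabilistic model.

    A logistic function maps the derived ratio to a probability that the
    user belongs to a 'profitable customer' segment. The weight and bias
    are invented, standing in for coefficients a trained model would
    estimate from data about many other users.
    """
    return 1.0 / (1.0 + math.exp(-(weight * ratio + bias)))

inferred_score = infer_profitable(derived_profitability)
```

The derived value is fully determined by this user's own records, whereas the inferred score additionally embeds model parameters built from the controller's wider data pool, which is why the taxonomy treats the two differently.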
The relationship between information and the data subject can be classified as either strong (provided data), intermediate (observed and derived data), or weak (inferred data). The stronger the relationship, the more directly the individual is involved in the creation of the data. Illustratively, a Facebook user has a strong relationship with their registration data and their posts. An example of a weaker relationship would exist when Facebook, based on its algorithmic models, assigns a liberal or conservative political score to this user. The user’s age, geographic location, and posts are all data provided by the user, and eventually included as independent variables in the model. But it is Facebook’s model that will ultimately predict the likelihood that the user belongs to either group.
The evolving relationship from provided to inferred data, or from a strong to a weak relationship between the data subject and the data, is illustrated in Figure 11.1. Although the delimitation of data types is crucial, a number of gray areas exist. Take the example of a data subject that does not upload data themselves, but actively selects which sets of data and their conditions the data controller may access. It is unclear whether these datasets are provided or observed by the controller.Footnote 4 Note that inferred data is created not only by the analysis of a specific user’s data, but also by the analysis – via statistical learning and automatic techniques to elicit patterns – of all data available to the data generator, including personal data provided by other users.Footnote 5
B Artificial Intelligence and Data
With the rise of AI, generated data is expected to proliferate at an exponential rate. As more and more institutions take advantage of increasingly broad datasets, computing power, and mathematical processes,Footnote 6 the amount of generated data will expand and the costs of prediction decrease.Footnote 7 As pointed out by a recent report ordered by the House of Commons of the United Kingdom, protecting data helps to secure the past, but protecting inferences is what will be needed to protect the future.Footnote 8 Inferential data generated by machine learning techniques has already been used (with varying success)Footnote 9 to predict sexual orientation based upon facial recognition; emotions of individuals based on voice, text, images, and video; a neighborhood’s political leanings by its cars; and physical and mental health predictions, to name but a few.Footnote 10 With the advancement of the technology and the availability of large training sets, the accuracy of inferred predictions will increase as well.
The predictive potential of machine learning is not confined to academic use cases, as commercial applications abound. Recent patent applications in the USA include methods for predicting personality types from social media messages,Footnote 11 predicting user behavior based on location data,Footnote 12 predicting user interests based on image or video metadata,Footnote 13 or inferring the user’s sleep schedule based on smartphone and communication data.Footnote 14 In all these instances, raw user data is collected on mobile devices (e.g., smartphones and tablets) to build and train the predictive model, and then used to predict individual user characteristics and behaviors (as generated data). This generated data is of value to marketing and advertising firms or organizations more generally to identify target users for their products and services.
C Valuation of Data
In general, the valuation of data is difficult, as it varies widely by type, scale, and industry sector. We make two assumptions that underlie this chapter, and that support our position that generated data is of higher value than raw data. We claim that the higher value of generated data derives from the investment of firms in the development, and subsequent use, of statistical and machine learning models.
Our first assumption is that at the single datapoint level, raw data is on average of lower value than generated data. Our explanation for this assumption is as follows: raw data (such as the age of a data subject) is assumed to be, on average, of lower value to companies than generated data (such as future health predictions). In fact, in the marketplace, general information, such as age, gender, and location, can be purchased for as little as $0.0005 per person, or $0.50 per 1,000 people.Footnote 15 We assume that the price for creation of and access to generated data is higher.Footnote 16 The value of the datapoint integrates the value-added created by the respective algorithm. This is a generalizable claim despite specific and highly contextual differences. To provide a counterexample, data relating to diseases directly provided by a patient might be of higher value to an insurance company than a prediction based on that data.Footnote 17
Our second assumption is that, on a large scale, the value of raw data increases linearly, whereas the value of generated data increases exponentially. We make this assumption for the following reasons: for statistical or machine learning approaches, provided and observed data needs to be purchased on a large scale in order to build models that will create inferred data. Since the accuracy of predictions is largely a function of the size of training datasets, we can assume that the value of provided and observed data is close to zero for small-scale datasets. On the other hand, past acquisitions of datacentric companies reveal a significantly higher value per user, varying between $15 and $40. For instance, Facebook acquired WhatsApp and Instagram for $30 per user.Footnote 18 These per-user valuations reflect both the quality and scope of the information collected and the expectation of continued platform engagement, with each acquired user creating additional data over time.Footnote 19 In short, these acquisitions aim at exploiting trends and patterns in large groups with high confidence in the quality of the data.
The process of value creation directly depends on investment in machine learning models needed to convert data into predictions.Footnote 20 These include direct operating costs, such as the employment costs of engineers, the licensing costs for software programs, the costs for obtaining more computer power and storage, and the costs of integrating systems with the implementation platform, as well as indirect costs such as training and change management costs or cybersecurity monitoring costs.Footnote 21 Therefore, the valuation of datacentric companies reflects the value of aggregated generated data or the potential for firms to create aggregated generated data.Footnote 22 We represent the respective value of raw data and generated data in Figure 11.2: with more data, the value of raw data increases linearly (a), whereas the value of generated data increases exponentially (b).
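The shape of the two curves in Figure 11.2 can be sketched as a minimal numerical illustration. All parameters below are hypothetical, chosen only to reproduce the linear (a) and exponential (b) growth described in the two assumptions; they are not empirical estimates.

```python
# Illustrative sketch of the chapter's two valuation assumptions.
# Parameters are hypothetical, not empirical estimates.

def raw_data_value(n_datapoints, price_per_point=0.0005):
    # Raw (provided and observed) data trades at a low, roughly constant
    # per-datapoint price, so its aggregate value grows linearly (curve a).
    # The default price echoes the $0.0005-per-person figure cited above.
    return n_datapoints * price_per_point

def generated_data_value(n_datapoints, base=0.001, growth=1.00001):
    # Prediction accuracy -- and hence the value of generated data --
    # compounds with dataset size, so aggregate value grows
    # exponentially (curve b): near zero at small scale, but
    # eventually overtaking the linear value of raw data.
    return base * (growth ** n_datapoints)
```

At small scale the generated-data curve stays near zero, consistent with the claim that models trained on little data produce little value; at large scale it overtakes the linear raw-data curve, consistent with the per-user valuations observed in acquisitions.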
II Portability of Data
This second section analyzes the newly introduced regulation of data portability in Europe. With the goal of moving toward a single market for data, the EU has sought to remove obstacles to the free movement of data via two regulations regarding personal and non-personal data. We evaluate the newly introduced right to data portability under the General Data Protection Regulation (GDPR)Footnote 23 and the porting of data regime under the Non-Personal Data Regulation (NPDR).Footnote 24 Our analysis of both data portability concepts suggests that the current separation between personal and non-personal data does not provide for a comprehensive and coherent data portability regime.
A Free Flow of Data
EU law has a long tradition of shaping regulation to create a single market for goods, services, people, and capital. In recent years, the European Commission has emphasized the need for a data ecosystem built on trust, data availability, and infrastructure.Footnote 25 Ensuring the free flow of data is part of this effort to establish a “digital single market”.Footnote 26 Data is increasingly seen as a tradable commodity.Footnote 27 While the framework for trading data can be found in the traditional civil law rules for purchase contracts, the contract performance – that is, the actual transfer of the data – largely depends on the existence of data portability as a legal institution.Footnote 28 We are interested in how a regulatory framework for the market may level the playing field, challenging large incumbents with a vested interest in not transferring potentially valuable data to competitors.Footnote 29
The more data is concentrated in the hands of a provider, the more likely it will be considered to hold a dominant position under EU competition law.Footnote 30 Although the dominant competition law test is based on market share, not data concentration, said concentration is likely to lead to large market shares in data-driven markets.Footnote 31 The European Data Protection Supervisor has discussed how portability of data can foster a functioning market by preventing the abuse of dominance and the lock-in of consumers.Footnote 32 EU competition law, however, can generally be characterized as an ex post regulation: in fact, the European Commission only intervenes once a dominant position has been abused in already existing markets.Footnote 33
As digital markets are especially prone to winner-takes-all (or -most) outcomes,Footnote 34 additional ex ante regulations are key. The EU has set up a number of these ex ante mechanisms, in particular in sector-specific regulation. A prominent example of this is the telecommunications sector: the Universal Service Directive established a right to number portability, considered a predecessor to the right to data portability under EU law.Footnote 35 The portability of telephone numbers and of data facilitates effective competition and can be considered a form of ex ante regulation as it creates the prerequisites for establishing a functioning telecommunication market.
The free movement of data is further addressed in Art. 16(2)1 of the Treaty on the Functioning of the European Union (TFEU), which gives the EU legislator the power to establish rules regarding the protection and free movement of personal data. The GDPR confirms the free movement of data as a subject-matter of the regulation and postulates that the free movement of personal data within the EU shall not be restricted or prohibited for the protection of personal data.Footnote 36 These affirmations refer once more to the foundation of the EU: free movement of goods, services, people, capital, and now data in a single market. Since May 2019, the regime has been complemented by the NPDR.Footnote 37 Targeting non-personal data, the NPDR is entirely based on the ideal of the free flow of data. According to the NPDR, the two regulations provide a coherent set of rules that cater for the free movement of different types of data.Footnote 38
B Regimes for Data Portability
The portability of data is explicitly covered by both the GDPR and the NPDR. The former only applies to “personal data”, the latter to “non-personal data”.Footnote 39 EU law therefore clearly delineates personal from non-personal data.
Personal data is defined as “any information relating to an identified or identifiable natural person (‘data subject’)”.Footnote 40 The notion of personal data is broad, as it only requires that a natural person can be identified directly or indirectly. It is sufficient, for instance, that the link to the natural person can be established using other reasonably accessible information – such as a combination of specific browser settings used to track behavior for personalized advertising.Footnote 41
Non-personal data, by contrast, is any information that does not relate to an identified or identifiable natural person. Firstly, this encompasses data that originally does not relate to an identified or identifiable natural person, such as weather information or data relating to the operation of machines. Secondly, properly anonymized data cannot be attributed to a specific person and is therefore non-personal.Footnote 42 However, if non-personal data can be linked to an individual, the data must be considered personal.Footnote 43
1 Portability of Personal Data
The newly introduced right to data portability in Art. 20 GDPR gives the data subject the “right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided”. In short, data subjects may retrieve the personal data concerning them and transmit that data to other controllers.
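To make the format requirement of Art. 20(1) GDPR concrete, a portable export might look like the following minimal JSON structure. All field names and values here are hypothetical illustrations, not any actual provider's export schema.

```python
import json

# Hypothetical export of personal data "provided by the data subject"
# in a structured, commonly used and machine-readable format (JSON).
# Field names are illustrative only.
portable_export = {
    "data_subject": {"user_id": "u-12345", "display_name": "Jane Doe"},
    "provided": {  # data actively supplied by the data subject
        "profile": {"age": 34, "location": "Berlin"},
        "posts": [{"created": "2020-01-15T10:02:00Z", "text": "Hello!"}],
    },
    "observed": {  # data observed from the data subject's activity
        "logins": [{"timestamp": "2020-01-15T09:58:12Z", "device": "mobile"}],
    },
    # Note: derived or inferred (generated) data, such as predicted
    # interests, is deliberately absent -- under the prevailing reading,
    # it is not "provided by the data subject" and so falls outside
    # the Art. 20 GDPR portability right.
}

export = json.dumps(portable_export, indent=2)
```

The deliberate omission of generated data in this sketch mirrors the scope limitation discussed below: portability reaches provided and observed data, but not the controller's derived or inferred data.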
The provision mirrors the GDPR’s dual purpose: both the protection of personal data and the free flow of personal data. The right to the protection of personal data is intertwined with a market-centered economic approach to personal data.Footnote 44
Not all personal data is subject to the right of portability. Only personal data for which the processing is based on consent or a contractual relationship is covered by the norm.Footnote 45 This limitation largely corresponds to the requirement that the personal data in question was provided by the data subject.Footnote 46 Accordingly, raw personal data is covered by portability because provided data is, by definition, directly provided by the data subject and observed data is (by most accounts) considered as such.Footnote 47 Generated data, however, whether derived or inferred, is not considered as being provided by the data subject.Footnote 48 Therefore, a large share of personal data is not subject to portability as it is not provided by the data subject.Footnote 49
The GDPR provides a relatively strong right to data portability for the data subject. Data portability is seen from the data subject’s perspective, with a focus on data protection. Creating a comprehensive regime for the portability of all kinds of personal data was not the priority of the EU legislator, as shown by the exclusion of generated personal data. Although the norm is often discussed as being situated in the area of competition law – with its aim of facilitating the free flow of data – data portability under the GDPR is still considered closer to genuine data protection law than to regulatory competition law.Footnote 50
2 Portability of Non-personal Data
With the NPDR, the EU encourages the porting of non-personal data.Footnote 51 The Internet of Things or industrial settings are major sources of non-personal data, as exemplified by aggregate and anonymized datasets used for big data analytics, data on precision farming, or data on maintenance needs for industrial machines.Footnote 52 The Regulation addresses two obstacles to non-personal data mobility: data localization requirements imposed by the public sector and private vendor lock-in practices.Footnote 53 Such a lock-in effect might exist if cloud services like data storage or cloud-based data applications do not ensure the portability of the respective data.
While the GDPR provides for an enforceable right of the data subject, the NPDR approaches portability differently. The regulation encourages self-regulatory codes of conduct; that is, legally nonbinding instruments. The norm expressly refers to best practices that should facilitate the porting of data through “structured, commonly used and machine-readable formats including open standard formats where required or requested by the service provider receiving the data”.Footnote 54 Meanwhile, codes of conduct on the porting of data and switching between cloud service providers have been developed by the cloud switching and porting data working group (SWIPO) for Infrastructure-as-a-Service (IaaS) and Software-as-a-Service (SaaS) cloud services.Footnote 55 These codes require, inter alia, the use of application programming interfaces (APIs),Footnote 56 open standards, and open protocols by cloud service providers.
C Analysis: The Concept of Data Portability
Our analysis depicts the limitations of existing EU law in providing for the free movement of data via a comprehensive and effective portability regime. In particular, we discuss how the distinction between personal versus non-personal data and raw versus generated data may impact the concept of data portability.
1 Distinction between Personal and Non-personal Data
The EU framework separates data into two types: personal and non-personal. This separation subjects data to different regulatory regimes – with a number of consequences in terms of portability. The distinction between personal and non-personal data is meant to preserve a high level of protection for data that can be related to an individual. The GDPR accordingly sets forth a right to access available for all types of personal data, whether raw or generated.Footnote 57 The NPDR targets data that is not related to an identifiable natural person. The interests of datacentric businesses stand at the center of the regulation. The free flow of data is therefore targeted from the data subject’s perspective as well as from the perspective of market regulation.
In theory, the distinction between personal and non-personal data appears straightforward. In practice, this is often not the case. For instance, large datasets in which personal and non-personal data are mixed (so-called mixed datasets) make it hard to identify the applicable legal regime. The NPDR recognizes this situation and addresses it by splitting the application of both legal regimes across the respective types of data.Footnote 58 In cases where both types are inextricably linked, the GDPR takes precedence (even if personal data represents only a small part of the set).Footnote 59 The complexity of GDPR compliance for mixed datasets can have a large impact on technology firms’ costs, and uncertainty still prevails in the field as to how to avoid falling under the GDPR.
This lack of legal certainty provides an incentive to anonymize data. The underlying belief is that personal data can be turned into non-personal data by anonymization, as anonymization destroys the link to an identifiable person. The NPDR accordingly takes into account future technological developments that may make it possible to turn anonymized data back into personal data; in that case, such data must again be treated as personal data and the GDPR applies.Footnote 60 Recent studies, however, have challenged the common understanding of anonymization. The European Commission itself has addressed these concerns, but remains committed to the belief that anonymization can be achieved in practice.Footnote 61 In an influential study, Rocher, Hendrickx, and de Montjoye showed that 99.98 percent of Americans would be correctly reidentified in any dataset using fifteen demographic attributes.Footnote 62 A range of additional studies have supported this point, with reidentification of supposedly anonymous datasets in healthcare, ride-sharing, subway, mobile phone, and credit card datasets.Footnote 63 All this raises doubts about whether the distinction between personal and non-personal data can be upheld in the future.
2 Distinction between Raw and Generated Data
The right to data portability under the GDPR only applies to personal data provided by the data subject. From the viewpoint of providing access to the market of social media services, the portability of raw data alone is considered sufficient to prevent customer lock-in. Although the controller uses raw data (provided and observed) to generate derived and inferred data, generated data is not considered as “provided by the data subject” in the sense of Art. 20(1) GDPR. As such, generated data does not fall under the right to data portability. However, if it qualifies as personal data, generated data is still subject to the right of access or the right to not be subject to automated individual decision-making.Footnote 64 Consequently, the GDPR offers the data subject access to their personal data and protection regardless of whether the data is raw or generated.Footnote 65
A reason why the right to data portability under the GDPR does not cover data created by the controller (i.e., generated data) might be that portability would here grant a strong advantage to competitors. Porting generated data would grant companies access to especially valuable data (Assumption 1), whose aggregate value scales exponentially (Assumption 2).Footnote 66 The GDPR envisages a model in which the data subject provides raw data to social media providers and leaves the additional value of the data to these providers as compensation for their costs. But this is only justified in instances like Facebook, where the user “pays” with their data in exchange for the free use of the service. The service provider bears the cost of providing the social network and may recoup its investment by profiting from the added value that the raw data acquires through inferential information, for instance via advertising. If the data subject, however, pays for a service, be it social networking or an analysis of their personal data, the situation is entirely different: the service provider’s costs are compensated by monetary payment. The added value of the derived or inferred data should then remain with the data subject and fall under the scope of the right to data portability.Footnote 67
This situation is similar to the one envisaged by the NPDR: one between a customer and a service provider. When the customer provides raw data to the data service provider, who conducts statistical analysis or prediction through machine learning on this data on behalf of the customer, the customer bears the cost of transformation of the data. As the costs are assigned to them, they should be able to obtain the value of the resultant generated data, to transfer it, and to switch providers. This right would in general already be subject to a civil law contract governing the relationship between service provider and customer. The role and task of regulation would then only be to enforce portability in cases where service providers have market power to the extent that such portability and its conditions (portable file format, interfaces, etc.) are not subject to the service agreement. For this reason, the data porting rules under the NPDR may be insufficient, as they are nonbinding and limited to self-regulatory measures. The European Commission or the respective member state authorities would need to take competition law measures based on abuse of dominant position, which have the limitation of being ex post in nature.
An already binding obligation of non-personal data portability can be seen in Art. 16(4) Digital Content Directive,Footnote 68 albeit limited to the area of digital content and digital services: the consumer is granted the right to request “any content other than personal data, which was provided or created by the consumer when using the digital content or digital service supplied by the trader”. This stipulation affirms our position: the value of digital content – that is, the data – created by the customer is assigned to the customer, leading to a right to retrieve that data in a commonly used and machine-readable format, as the second subparagraph states.
D Data Portability beyond the European Union
The EU has taken the lead in shaping the way the world thinks about data protection, privacy, and other areas of digital market regulation.Footnote 69 Its data protection standards in particular have been diffused globally.Footnote 70 Firstly, the ideas and concepts of the GDPR – in our case of data portability – have influenced a number of jurisdictions to enact data portability norms themselves. Secondly, international firms are bound directly by the GDPR’s and the NPDR’s extraterritorial scope, even without being established in the EU. Thirdly, because of the “Brussels Effect” foreign corporations often prefer to respect EU law even without a legal obligation to do so. Fourthly, international soft law has been and can be deployed to integrate data privacy principles from the EU, thereby playing a key role in the international governance of portability regimes. Fifthly, data privacy obligations have been stipulated in international treaties, requiring the implementation of data portability norms within the national law of ratifying states. In this regard, international economic law can help to diffuse data portability rules across the world.
1 Adoption by Third Countries
Numerous data protection laws around the world have emulated the GDPR, including its right to data portability.Footnote 71 A prominent example is the California Consumer Privacy Act, signed a month after the GDPR came into effect. The legislation incorporates portability in the context of the right to access as a modality of how electronic access should be provided; that is, in a portable format.Footnote 72 In comparison to the GDPR, the stipulation has an arguably broader scope, as all personal data is portable, and not only the personal data provided by the data subject. On the other hand, the norm is weaker, as businesses collecting personal information can provide access nonelectronically by simple mail, even if the data is stored digitally. Businesses are thus offered a way to circumvent portability, provided they do not mind the additional costs of mail delivery (which might be less than investing in interoperability).
Other examples of adoption include Benin, which enacted GDPR-like legislation with its Code du numérique and included a right to data portability.Footnote 73 Brazil has adopted a new General Data Protection Law that introduces a right to the portability of data.Footnote 74 A possible codification of data portability is also under active discussion in a number of other countries.Footnote 75 Japan, for instance, has initiated a study to assess the merits and demerits of data portability, taking into consideration the costs for firms to establish portability.Footnote 76
2 Extraterritorial Application
EU law imposes itself on foreign entities by extending its scope of application beyond EU territory. Inspired by the famous Google Spain judgment of the European Court of Justice,Footnote 77 Art. 3(2) GDPR introduces a remarkably broad territorial scope: GDPR applies to controllers or processors not established in the EU if the processing activities are related to the offering of goods or services to data subjects in the EU or to the monitoring of their behavior within the EU.Footnote 78 Data portability can therefore be requested by an EU citizen or resident from a foreign – for instance, US – firm, if the activities of the firm fall under the GDPR.
Similarly, the NPDR applies in cases where the processing of electronic non-personal data in the EU is provided as a service to users residing or having an establishment in the EU, regardless of whether the service provider is established in the EU (Art. 2(1)(a) NPDR). However, since data portability under the NPDR is of a nonbinding nature, compliance is voluntary. As we suggest, a comprehensive data portability regime for personal and non-personal data would therefore be desirable at an international level.
3 Unilateral Power
The EU has been able to externalize its laws beyond its borders via the so-called unilateral power of the EU. While foreign firms are only bound by their national laws, they have increasingly been following EU data protection law.Footnote 79 This can, on the one hand, be explained by the advantages of international firms following a single rule, preferring to harmonize their processes and services for cost mitigation purposes.Footnote 80 In other words, it might be cheaper for a company to develop a single framework (that follows European data protection law), rather than two or more different ones (one following a stricter European regime, one a more lenient one). In the past, large technology companies like Facebook and Google have often made their data portability tools available to all their customers, independently of their location.Footnote 81 On the other hand, Apple took a staged approach and introduced its portability tool for users in Europe only in 2019, making it available to US and Canadian users in 2020.Footnote 82 Apple, Facebook, Google, Microsoft, and Twitter are further contributing to the creation of an open-source framework connecting providers by translating provider-specific APIs into common “data models” that can be transferred.Footnote 83
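The adapter idea behind such a common "data model" can be sketched as follows: each provider-specific API response is translated into one shared representation that any participating provider can import. The class and field names below are hypothetical illustrations, not the actual framework's API.

```python
# Minimal sketch of translating provider-specific API payloads into a
# common, transferable data model. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class CommonPhoto:
    """Shared representation a receiving provider can import."""
    title: str
    taken_at: str  # ISO 8601 timestamp
    url: str

def from_provider_a(item: dict) -> CommonPhoto:
    # Hypothetical Provider A payload: {"caption": ..., "ts": ..., "link": ...}
    return CommonPhoto(title=item["caption"], taken_at=item["ts"], url=item["link"])

def from_provider_b(item: dict) -> CommonPhoto:
    # Hypothetical Provider B payload: {"name": ..., "created": ..., "href": ...}
    return CommonPhoto(title=item["name"], taken_at=item["created"], url=item["href"])
```

Once both providers' payloads map onto the same common model, a transfer between them reduces to exporting from one adapter and importing through the other, which is the interoperability that portability regimes aim to secure.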
4 International Soft Law
In the past, a number of guiding documents from the EU, in particular the Article 29 Working Party Guidelines on the right to data portability, have already had a major impact on the interpretation of data portability concepts.Footnote 84 The guidelines were issued by this former advisory board, since replaced by the European Data Protection Board (EDPB) representing the data protection authorities of the EU member states, and have been subject to extensive academic discussion and scrutiny by corporations.Footnote 85
International soft law has long served as inspiration for national privacy codification, beginning with the OECD Privacy Guidelines of 1980, which were revised in 2013.Footnote 86 The Guidelines explicitly refer to personal data as an “increasingly valuable asset”. Their aim has been to foster the free flow of information by preventing unjustified obstacles to economic development, namely by setting a minimum standard for national legal frameworks on privacy. The original 1980 OECD Privacy Guidelines were influential at first, encouraging the adoption of data protection laws in eight countries outside Europe (including Canada and Japan), but their impact diminished when the EU adopted its Data Protection Directive in 1995,Footnote 87 which went beyond the OECD Guidelines.Footnote 88 The OECD Guidelines also influenced the Asia-Pacific Economic Cooperation (APEC) Privacy Framework.Footnote 89
As the OECD is reviewing its Guidelines, it could include a data portability norm in a future revision. However, as the OECD Guidelines only cover personal data, a right to data portability in the OECD Guidelines (alone) would fall short of its optimal scope.Footnote 90 Data portability should, in our view, rather be added to other international soft law instruments and encompass both personal and non-personal data.
5 International Hard Law
European portability concepts have been reflected in international treaties. This may be exemplified by the inclusion of a clause regarding the portability of telephone numbers in the Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP), a trade agreement between Australia, Brunei, Canada, Chile, Japan, Malaysia, Mexico, New Zealand, Peru, Singapore, and Vietnam.Footnote 91 As mentioned in subsection B, the right to telephone number portability in the former Art. 30 Universal Services DirectiveFootnote 92 can be seen as a predecessor to data portability. Furthermore, the Eastern Caribbean Telecommunications Authority is planning to codify number portability in its new Electronic Communications Bill.Footnote 93
Against this backdrop, the question arises whether international trade agreements should include data portability provisions going forward – either in competition chapters or in dedicated electronic commerce or digital trade chapters. Regulation on an international level, however, would require a supranational understanding of the modalities of data portability. Because data flows cross national borders, the need for a coherent regulation of portability is strong. Nonetheless, views on the modalities of a portability regime differ across states. The type of data it should cover, the concrete definition of “portability”, the extent of interoperability required, the kinds of standardization of formats and interfaces, and whether retrieval “in a commonly used and machine-readable format” suffices are some of the many questions on which consensus should be reached. In this regard, the first experiences with the GDPR and NPDR will be crucial in determining the future of portability.
III Conclusion: Toward an Alternative Concept of Data Portability
The EU regulations regarding data portability have an ambitious aim: to contribute to the creation of an effective data ecosystem characterized by the free flow of data. Both regimes, however, were designed to address very specific situations – the GDPR regime for users and their free-of-charge social media provider; the NPDR regime for business customers and their big data analytics providers. Both regimes find application beyond the use cases they were designed for. Instead of distinguishing between personal and non-personal data, a better regime for data portability could hinge on whether the value of generated data serves as compensation for the respective service providers’ costs.
Ultimately, the distinction between personal and non-personal data can be challenged as inappropriate for data portability. Data portability is a concept that primarily serves the free flow of data rather than the protection of personal data. A classification distinguishing between raw and generated data has its advantages, particularly when it factors in the value of data. Competition law could rely more heavily on the value of data and its role in providing cost compensation, instead of using a terminology inherited from data protection law. Future data portability regimes may be better designed once they are removed from the realm of data protection. This assumes that the data subject is sufficiently protected by the remaining rights under the GDPR, especially via the right of access. Guaranteeing an effective right of access for raw and generated data is key.
Consequently, we propose that the choice of regulatory regime for data portability should be made with a view to the value of the data and to whether that value compensates the cost-bearing party. Raw data, being assigned to the customer or the data subject, would be portable, while generated data would require a more refined regime depending on whether it serves as a means of compensation.