In 2011, the U.S. National Academies published a report that called for an “Information Commons” and “Knowledge Network” to enhance understanding of disease and normal biology and to inform a learning health care system. The report became a major influence on President Obama's 2015 State of the Union address proposing a “Precision Medicine Initiative,” which in turn gave rise to the current All of Us Research Program spearheaded by the U.S. National Institutes of Health. Two central ideas guiding the National Academies' analysis were that rapid progress would depend on the continuous contribution of molecular, environmental, and health record/outcomes data linked to individuals, and that information generated from the data would need to be widely available for multiple uses. Although it is clear that the Information Commons and the Knowledge Network connecting its elements are inextricable, each is composed of discrete units with its own funding and support structures.
Even as U.S. efforts to implement the National Academies report were gathering steam, we began a project, “Building the Medical Information Commons (MIC),” to study the process. We somewhat naively initiated this project in search of the rules that would govern the MIC. The vision of a single, integrated, global MIC captures the goal of sharing and linking data so that it can be transformed into information, and ultimately knowledge, in order to advance science, enhance clinical decision-making, and improve public health. However, our conception of an MIC evolved to accommodate both the data structures (plural) and the networks (also plural) connecting them. We no longer believe that it is realistic to expect a unitary commons with a single set of data-sharing practices or uniformly-defined community of contributors and users. Instead, the MIC is a collection of many different health-related commonses (or common pool resources) that would benefit from the widespread adoption of a group of high-level but flexible principles.Reference Deverka, Majumder, Villanueva and Anderson1
Indeed, we have found that the notion of a single, integrated, global MIC is unrealistic for at least three reasons. First, it belies a necessary plurality of collectively-managed resources in many places for many purposes. As a landscape analysis of data-sharing initiatives revealed,Reference Villanueva, Cook-Deegan, Koenig and Deverka2 there are many sources of data, many users of data, and many research and health care institutions pursuing data-sharing functions that are only somewhat aligned. Some sites store data, while others curate data, aggregate data, broker access to data, communicate the meaning of data, and interpret the clinical significance of data.Reference Bollinger, Sanka, Dolman and Liao3 Disparate actors with different roles work to collect and manage the data and build the networks that make the data useful for biomedical research, clinical care, and public health.
Second, the complexities of moving masses of data make it unrealistic to expect a single, integrated, global MIC. Technologically, the size and scope of data make them both difficult and expensive to move. There are initiatives that still move data to a central repository, or mirror data among databases. For example, the three international human DNA sequence databases (GenBank, the European Nucleotide Archive, and the DNA Database of Japan) started sharing most of their data in the 1980s. Sequence data are copied and stored in each database.4 However, the future is mostly about the creation of federated systems where data and their analysis remain local because it is technologically easier and much cheaper to leave the data in place, do local analysis, and export only the parts of the data relevant to a scientific question or clinical inquiry.
Sociologically and ethically, medical and other personal data are widely regarded as highly sensitive and private. It will thus be important to keep data secure and to ensure that their use respects the terms of the consent or authorization (and cultural expectations) under which they were contributed. Economically, data are increasingly perceived as valuable sources for economic growth. Private firms are building data sets, and many countries have passed laws constraining the export of human genetic resources — often including data as well as samples.Reference Rothstein, Knoppers and Harrell5 Some of these laws are intended in part to prevent “biocolonialism” — the extraction of biological resources including samples, DNA, and data from Indigenous and marginalized peoples by settler colonial forces.Reference Garrison6 Global laws and norms are also intended to protect the rights and interests of citizens whose genetic and other personal data may be attractive to foreign researchers and companies.Reference Majumder, Cook-Deegan and McGuire7
These technological, sociological/ethical, and economic factors do not, however, preclude the construction of a global network that uses many sources of data to advance understanding and to inform clinical decision-making. A federated system allows data to be left in place while permitting extraction of the information needed to address particular clinical, research, or public health questions. Further, certain kinds of valuable information at low or no risk of re-identification can be deposited in freely available global public databases. ClinVar, for example, contains information about genomic variants associated (or not) with disease, linked to information about the evidence base for variant classification.Reference Rehm, Berg and Plon8 It does not contain individually identifying case-level data; those data remain in the data resources that feed into ClinVar.Reference Landrum, Lee, Benson and Brown9
Third and finally, there is no international research infrastructure or funding mechanism to support a single, integrated, global MIC. Many of the component data sets were built initially through national government-supported research or clinical testing initiatives. Most data are now generated by commercial firms, with aggregation and curation as ancillary functions of an ongoing business operation such as genetic testing, body imaging, or drug development. Some data resources (e.g., GenBank, ClinVar, LOVD, CFTR2, gnomAD) have become global public goods. However, truly global data resources that are funded through a stable international framework are rare. There are a few: the tripartite data-mirroring of DNA sequence data noted above, for example, is an unusual case of sustained international collaboration over four decades. Yet, even here, the funding sources are mainly national (USA and Japan) and regional (Europe). The organized efforts to create a human reference genome through the Human Genome Project, and many other organism-centered reference sequences, have been assembled from international collaborations. But again, these are funded through national funding streams pooled together; there are no large, stable funding structures that are truly international, and no international equivalent of the U.S. National Institutes of Health. Yet we find innumerable examples of global use of data for research, health care, or public health purposes. In sum, for the most part, data use is global but support of the infrastructure is national, or private. The absence of capacity for building robust public international infrastructure may not be a desirable feature, but it does appear to be a practical constraint for the foreseeable future.
The result is many work-arounds through ad hoc international collaborations, national efforts that have international dimensions, and public-private partnerships. Recent years have seen major large-scale efforts to develop standards and facilitate sharing of resources internationally, most notably the Global Alliance for Genomics and Health (GA4GH).Reference Knoppers and Joly10
It would be an exaggeration to call the landscape we have described the Information Commons and Knowledge Network envisioned in the 2011 National Academies' report, but the existing resources could well evolve to serve many of its functions. The biggest point of divergence from the National Academies' vision is that many data commons have emerged within the space we refer to as the MIC with no central administrative authority. Rather, the evolving MIC appears to be a combination of (a) aggregating and curating data generated by completed or ongoing research projects and consortia and establishing new health data-focused initiatives — usually supported by national governments, or public or private health care delivery organizations and technology companies — and (b) observing how well (or poorly) the various models, pilots and major initiatives are meeting the needs of clinicians, researchers, and participants.
This special issue draws together the efforts of a multi-disciplinary research team, community advisory panels, and an international network of experts to think through how best to fulfill the promise of an MIC. An initial meeting of our advisory committee in March 2016 gave rise to eight principles based on case studies and a review of the literature.11 We then proceeded to gather information through mixed empirical methods. In this special issue, we report the results of that research and the further reflection those results inspired. The opening paper by Amy L. McGuire and members of the research team and expert advisory committee represents the culmination of our different lines of inquiry and the pooling of the collective experience and wisdom of advisory committee members. It centers on two main findings related to the sustainability of an MIC: (1) the case for making data resources truly oriented around participants is both ethical and pragmatic, and several dimensions of participant-centricity need attention and heightened visibility, including the participant role in decision-making, and (2) gaining and sustaining trust necessarily entails forming trustworthy institutions and practices that are committed to transparency, give participants access to their own data (and ensure that data are high-quality), invest in security, and understand the importance of accountability.Reference McGuire, Majumder, Villanueva and Bardill12
One line of inquiry contributing to these findings consisted of a landscape analysis centered on systematically gathering information about and classifying data-sharing efforts and facilitators, compiling them in a database, and conducting a detailed review of a subset of efforts that both collect and distribute human genomic and other health-related data (“data-sharing initiatives”). Angela G. Villanueva and other members of the research team first characterize the broader data-sharing landscape, putting forward a typology of data-sharing efforts and facilitators.13 They conclude that diversity contributes to a thriving, value-creating MIC aligned with Findable, Accessible, Interoperable and Re-usable (FAIR) data principles, and they also call attention to the ways in which some actors are challenging traditional research norms (for example, brokers that promise participants payment for their data). Next, Villanueva and colleagues describe the results of a review of the practices of data-sharing initiatives. They focus on transparency (i.e., accessibility of information to the public) related to practices in five domains: consent, privacy and security, data access, oversight, and participant engagement.Reference Villanueva, Cook-Deegan, Robinson and McGuire14 They highlight areas of significant variation across initiatives reviewed, flagging some of these areas for further scrutiny, and make several recommendations with the aim of increasing transparency.
In parallel, members of the research team conducted interviews with expert stakeholders involved in various aspects of data-sharing initiatives from diverse employment sectors (i.e., laboratory, academia, non-government organization, government, technology, and healthcare company). As described by Juli M. Bollinger and colleagues, part of each interview centered on defining an MIC, and identifying what might be done to narrow the gap between interviewees' vision and aspirations of an MIC and current realities.Reference Bollinger, Zuk, Majumder and Versalovic15 Although there was full or near consensus on many points, including the desirability of pluralism and the importance of trustworthiness, views diverged concerning the priority of investing in making an MIC “clinical-grade.” Mary A. Majumder and colleagues address a different question that is critical to working out the meaning of participant-centricity: “What role should the people whose data populate the medical information commons play?”Reference Majumder, Bollinger, Villanueva, Deverka and Koenig16 They found broad agreement on expanding the role of participants but also differences of emphasis between individual and collective forms of engagement. They discuss political aspects of engagement of participants as collectives, including issues of representation, and the significance of efficiency concerns. Interviews also explored who “owns” data, who controls it, and how the medical and legal systems should frame the policy choices. Amy L. McGuire, Jessica Roberts, Sean Aas, and Barbara J. Evans discuss relevant findings and legal and ethical frameworks.Reference McGuire, Roberts, Aas and Evans17 They conclude that ownership may be a useful metaphor, but it is a poor focus for policy development related to an MIC.
Another line of inquiry consisted of convening community advisory panels in three cities: Raleigh-Durham, NC; Austin, TX; and Oakland, CA. As described by Patricia A. Deverka, colleagues from the American Institutes for Research, and other members of the research team, public deliberations were anchored by hypothetical case studies capturing different possible MICs within the collection that makes up the over-arching MIC (including a case study based on the All of Us Research Program).Reference Deverka, Gilmore, Richmond and Smith18 They present recommendations regarding opt-in consent, transparent data policies, public representation on governing boards, and data security and privacy protection. They also discuss the importance of recognizing and accounting for high levels of public mistrust as the backdrop for efforts to build an MIC.
Finally, three papers round out this special issue, widening the lens to pick up important developments beyond the United States. Bollinger and colleagues use BRCA1/2 data sharing as a case study to capture the global data-sharing picture and concrete obstacles and incentives facing laboratories engaged in clinical genetic testing.19 They conclude that there is strong support for data sharing worldwide but also that the sharing norm is fragile, underlining the importance of addressing obstacles and attending to incentives. Tania Bubela and colleagues examine how MIC principles relate to the concept of a learning health system.Reference Bubela, Genuis, Janjua and Krajden20 In particular, they use MIC initiatives in three Canadian provinces as case studies, and they discuss important lessons learned from each. Finally, Mark Phillips and Bartha M. Knoppers explore the tensions between the open science movement (and related efforts to build information commons) and the movement to strengthen data protection laws, with a focus on the European Union's General Data Protection Regulation.Reference Phillips and Knoppers21 They suggest that prior efforts to reconcile a public right of access to government documents and privacy protections offer hope that these tensions can be worked out.
We believe that, collectively, the papers contained in this special issue represent a body of work that can inform policy and practice in building an MIC. Like the deliberants in the community advisory panels, we are both hopeful and concerned. The promise of the MIC is great. Policies, especially if implemented early while the commons are still under construction, can enable broad use leading to benefits that improve health and improve lives, while ensuring benefits are equitably distributed. We worry that this promise will be undercut by a superficial rather than deep commitment to participant-centricity and trustworthiness, by failures to address obstacles and restructure incentives to support data sharing, and by laws and regulations that fall short of what is needed. The danger of steering off course is real when no one is at the helm. We hope this series of articles can initiate a wider conversation leading to a future that is in line with widely shared aspirations and that addresses the concerns.