Introduction
Delivering patient data, such as that stored in electronic health record (EHR) systems, to scientists in a timely, secure, and usable manner is a crucial component of clinical and translational research; these data help investigators form new hypotheses, perform observational and real-world studies, and facilitate clinical trials. Providing these data has been one of the core functions of informatics teams at the National Institutes of Health Clinical and Translational Science Award (CTSA) hubs. The Enterprise Data Warehouse for Research (EDW4R) [Reference Campion, Craven, Dorr and Knosp1,Reference Knosp, Craven, Dorr, Bernstam and Campion2] is the ecosystem that hubs use to deliver these data.
As shown in Fig. 1, EDW4Rs are the technologies and processes CTSA hubs use to deliver patient data for analysis to clinical and translational researchers. They aggregate data from EHR systems as well as other sources, both external (e.g., insurance claims, social determinants of health) and internal (e.g., genomics, biospecimens, patient-reported outcomes). These data are transformed into a research-accessible format and stored in a protected data repository: most often a data warehouse (a structured collection of historical data designed for analytics), but sometimes a data lake (a raw collection of unaggregated data) or a copy of the transactional database set aside for research use. These repositories are then queried – often by data team staff or by investigators using self-service tools – and the resulting extracts support a range of clinical and translational science uses, including but not limited to study feasibility, population health, real-world evidence generation, and sharing in research networks.
In recent years, EDW4R has supported a widening spectrum of research uses beyond clinical trial recruitment [Reference Claerhout, Kalra and Mueller3], enabling efforts to develop, validate, and disseminate new and increasingly complex algorithms; perform observational studies (e.g., comparative effectiveness meta-studies or phenome-wide association studies [Reference Denny, Bastarache and Roden4]); conduct real-world data analyses (e.g., pharmacovigilance [Reference Kohane, Churchill and Murphy5]); and run pragmatic trials [Reference Richesson, Green and Laws6]. EDW4R has also been used to drive multi-institutional data studies through research networks such as PCORnet [Reference Hernandez, Fleurence and Rothman7,Reference Collins, Hudson, Briggs and Lauer8], ACT [Reference Visweswaran, Becich and D'Itri9], and OHDSI [Reference Hripcsak, Duke and Shah10] and to contribute to national-level aggregation of data for specific emergent health crises [Reference Haendel, Chute and Bennett11].
Our previous work has shown that EDW4R operations vary across the CTSA consortium and that optimal approaches to EDW4R are unknown [Reference Campion, Craven, Dorr and Knosp1,Reference Knosp, Craven, Dorr, Bernstam and Campion2]. This variation has made it challenging for CTSA hubs to deliver efficient and effective service and support and to adapt to ever-changing requirements. To address this challenge, some form of modeling of EDW4R operations is needed to guide the development and tracking of institutional EDW4R processes and organization and to compare them with practices at other institutions.
Maturity models [Reference Fraser, Moultrie and Gregory12–Reference Benlhabib and Berrado14] are systematic methods to explain and track how organizations and processes develop. They have been used to provide guidance across a wide spectrum of information technology (IT) practices and, in particular, to guide research IT and informatics development [15], clinical analytics [Reference Knosp, Barnett, Anderson and Embi16], and EHR system adoption [Reference Analytics17]. In academic medical centers, Chief Information Officers (CIOs) may be familiar with the Healthcare Information and Management Systems Society (HIMSS) Electronic Medical Record Adoption Model, which thousands of healthcare organizations have used to assess and guide digital maturity with respect to implementation of and investment in clinical information systems [Reference Analytics17].
A maturity index [Reference Grajek18] measures organizational capacity to deliver a service, considering multiple factors including culture, policy, and organization. Index levels are generally described in order of increasing maturity, with similar level descriptors across models. For instance, a five-level model is common, with level 1 described as “initial” or “ad hoc” (i.e., unpredictable and poorly controlled processes) and level 5 as “optimizing” (i.e., continual improvement of well-managed processes). In university settings, CIOs may be familiar with the maturity index used by EDUCAUSE (https://educause.edu), a non-profit organization of more than 2,000 member institutions dedicated to advancing higher education through information technology. EDUCAUSE’s Core Data Service [Reference Grajek18] benchmarks institutional IT practices, including organizational maturity, and hundreds of academic institutions have used this resource [Reference Grajek18]. The HIMSS and EDUCAUSE experiences with maturity measurement suggest that EDW4R operations could benefit from the development of an EDW4R maturity model.
EDW4R operations have unique needs that require specialized skills and collaborations across multiple domains – scientific, clinical, information technology, and research compliance among others – which limit the ability to apply existing models of IT and clinical data performance. To address this gap, we developed and piloted a new EDW4R maturity model.
Methods
Instrument Development
We developed a maturity index based on the “hybrids and Likert-like questionnaires” group of maturity models described by Fraser [Reference Fraser, Moultrie and Gregory12], rather than a “capability maturity model” or “maturity grid” approach. Each item in the index is a statement of best practice, and respondents score their organization’s performance toward achieving that practice on a five-point scale. The statements are grouped into categories that together provide a multi-dimensional set of ratings. These best practice statements, called maturity anchor statements, aim to define the “Optimized” state for different characteristics identified within each category. This type of maturity index, used in the EDUCAUSE Core Data Service [Reference Grajek18] upon which we modeled our index, allows two institutions to be at the same level of maturity while having achieved it in different ways, which reflects the EDW4R operational variation previously observed across the CTSA consortium [Reference Campion, Craven, Dorr and Knosp1,Reference Knosp, Craven, Dorr, Bernstam and Campion2].
Leveraging our prior analysis of 40 interviews with CTSA hubs [Reference Campion, Craven, Dorr and Knosp1,Reference Knosp, Craven, Dorr, Bernstam and Campion2] as well as discussions at more than 20 CTSA EDW4R working group (WG) meetings with informatics leaders, we developed a maturity index by identifying several categories that reflect topical activities for EDW4R operations. These categories were created initially from early discussions with EDW4R WG members and evolved as we completed two rounds of interviews and subsequent analysis of the interview summaries. The final categories were reviewed with the EDW4R WG and modified based on their feedback. The six categories were as follows: access and outreach (researcher engagement and access strategies); service management (level of formalization of services); workforce (staffing characteristics of EDW4R team); relationship to enterprise IT (engagement of EDW4R team with central IT units); data governance (decision-making about EDW4R); and metrics (measurement of EDW4R operations).
Within each category, we developed four to six maturity anchor statements [Reference Hripcsak, Duke and Shah10], descriptions of best practices that can be rated on a Likert scale. We reviewed the statements with our EDW4R WG, addressing statement clarity and removing ambiguous statements from the pool. The resulting index consisted of six categories containing 33 maturity anchor statements in total.
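As a minimal illustration of the instrument's shape (our sketch, not an artifact of the study), the index can be represented as categories mapping to lists of anchor statements. The category names below come from this study, and the single statement shown under workforce is quoted from Table 1 (statement 1.6); the placeholder entries stand in for the remaining statements:

```python
# Sketch of the index structure: six categories, each holding four to six
# maturity anchor statements rated on a five-point Likert scale.
# Only the workforce statement shown is an actual anchor statement (1.6);
# the Ellipsis entries are placeholders for statements not listed here.
MATURITY_INDEX = {
    "access and outreach": [...],       # researcher engagement and access strategies
    "service management": [...],        # level of formalization of services
    "workforce": [
        "We have identified training programs as pipelines to fill open "
        "positions on our EDW4R team",
        ...,  # three to five further statements in this category
    ],
    "relationship to enterprise IT": [...],  # engagement with central IT units
    "data governance": [...],           # decision-making about EDW4R
    "metrics": [...],                   # measurement of EDW4R operations
}
```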
Data Collection and Analysis
Using REDCap [Reference Harris, Taylor, Thielke, Payne, Gonzalez and Conde19], we created a survey with a Likert scale rating for the 33 anchor statements (the REDCap data dictionary of the index is available at https://bit.ly/EDW4R_Maturity). For each statement, “strongly agree” indicated the highest maturity rating (5) and “strongly disagree” the lowest (1), with “agree,” “neutral,” and “disagree” ranging from 4 to 2, respectively. All statements also offered a “not applicable” response option. We invited informatics leads from CTSA hubs via email to complete the survey, which was available from January 1, 2021, through February 28, 2021. We then computed descriptive statistics, including median and interquartile range (IQR), within and across categories.
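To make the scoring concrete, the following sketch (our illustration, not the study's actual analysis code; the response data are invented) maps Likert labels to scores, drops "not applicable" responses, and computes the median and IQR:

```python
from statistics import median, quantiles

# Likert label -> maturity score; "not applicable" is excluded entirely.
LIKERT = {
    "strongly disagree": 1, "disagree": 2, "neutral": 3,
    "agree": 4, "strongly agree": 5,
}

def scores(labels):
    """Numeric scores for one statement, dropping 'not applicable'."""
    return [LIKERT[label] for label in labels if label in LIKERT]

def median_iqr(values):
    """Median and interquartile range of a list of scores."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)

# Invented responses from five hubs to one anchor statement:
labels = ["agree", "strongly agree", "neutral", "not applicable", "disagree"]
print(median_iqr(scores(labels)))  # (3.5, (2.75, 4.25))
```

The same two helpers can be applied to pooled scores within a category or across all categories to reproduce the category-level and overall summaries reported in the Results.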
Results
Fifteen CTSA hubs (more than 25% of the total CTSA hubs in the USA) completed the survey assessing the maturity of EDW4R operations. Fig. 2 shows maturity scores for the six categories, colored by institution. Respondents rated workforce as most mature (median [IQR], 4.17 [3.67–4.42]), followed by access and outreach (4.00 [3.75–4.50]), metrics (3.83 [3.42–4.42]), service management (3.75 [3.50–4.13]), and data governance (3.67 [3.50–4.25]); relationship with enterprise IT was least mature (3.00 [2.80–3.80]). Overall maturity was 3.72 [3.55–4.09].
Table 1 shows ratings of the maturity anchor statements within each category, sorted from highest to lowest median score. The number of responses is also indicated for each statement; it varied because some institutions did not respond to certain statements or selected “not applicable.”
Discussion
Fifteen CTSA hubs participated in a pilot study of a new maturity index addressing six categories of EDW4R operations. Findings show variation in EDW4R operational maturity, with an overall median (IQR) of 3.72 [3.55–4.09], providing a potential baseline for tracking and comparing EDW4R maturity over time. The category results show that participating institutions are most mature in workforce and in access and outreach and least mature in relationship with enterprise IT. These results identify areas where individual institutions and the CTSA community can focus to improve EDW4R operations.
The scores of individual maturity anchor statements indicate specific areas where the participant group has room to grow, even in categories where the group’s overall maturity score is high. For example, in Table 1, statement 1.6 (“We have identified training programs as pipelines to fill open positions on our EDW4R team”) has a score of 3 [2.54–4], low compared with other statements in its category. This suggests that participating institutions could improve workforce maturity by identifying better recruitment pathways for EDW4R staffing.
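A small sketch of this kind of gap-spotting (our illustration; the study reports medians in Table 1 rather than automating this step, and the statement ids and ratings below are invented) flags statements whose median falls below their category's median:

```python
from statistics import median

def growth_opportunities(category_scores):
    """category_scores: {statement_id: [scores across hubs]} for one category.
    Return ids of statements whose median is below the category median."""
    stmt_medians = {sid: median(vals) for sid, vals in category_scores.items()}
    category_median = median(stmt_medians.values())
    return [sid for sid, m in stmt_medians.items() if m < category_median]

# Invented workforce ratings in which statement 1.6 lags its category:
workforce = {"1.1": [5, 4, 4], "1.2": [4, 4, 5], "1.6": [3, 2, 4]}
print(growth_opportunities(workforce))  # ['1.6']
```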
The EDW4R index builds on our prior qualitative work, which illustrated the complexity of EDW4R operations. The current study extends that work by providing a method to quantify EDW4R complexity, supporting both assessment across organizations and management within individual institutions. The maturity method we used, category-based maturity anchor statements, appears well suited to the current state of variable EDW4R development at CTSA hubs.
In non-healthcare sectors of the economy, data warehouse maturity has been well documented by Sen [Reference Sen20], who described a capability maturity model for data warehouse development and services that assumes a typical enterprise-level investment in data warehouse operations. Our observation is that EDW4R development and services at CTSA hubs are an evolving set of processes driven by the needs of clinical and translational researchers, and the correlation between EDW4R operations and the data warehousing process maturity Sen describes is limited. There are lessons for CTSA hubs in Sen’s model, such as the separation of data warehouse development maturity from operations maturity and the focus on maturing processes. However, Sen’s model does not address many of the clinical and translational science aspects specific to EDW4R operations, such as the biomedical research expertise involved; scientific outcomes; the multi-departmental collaborations needed for successful EDW4R operations (e.g., the relationship with enterprise IT); and variations in the organizational culture of academic medical centers.
There are limitations to the current investigation. This study of the maturity index is a pilot, and further validation is required to establish the index as a standard tool for measuring EDW4R operations. The index is a proposed set of best practices, but it does not directly outline a developmental path – which practices to implement first and which can wait until later. While the maturity index reflects our prior qualitative work, an optimal maturity index would also provide a roadmap for developing the resource being assessed. All maturity index scores in this pilot were self-reported, and many of the anchor statements are subjective, which might bias scores upward to maintain institutional reputation. In future applications of this index, selected external reviewers could normalize scores across the sample group to reduce this bias.
This work is a logical next step from previous work [15] by one of the authors (BK) and collaborators, which proposed the use of maturity models to assess research information technology and informatics in academic medicine. The current EDW4R study describes an assessment that builds on the value proposition described in that prior work, including identifying gaps in aligning clinical needs with research, optimizing cross-organizational research environments, developing guidelines for participating in emerging communities of practice, and enabling strategic review of local expertise and infrastructure capabilities.
Next steps for this work are to operationalize the index as a practice within the CTSA community, making it a yearly or every-other-year activity, and to validate the index by reviewing results with participating institutions and engaging subject matter experts.
Our pilot of a novel maturity index establishes a baseline quantitative measure of EDW4R functions across 15 CTSA hubs. The index may be useful to faculty and staff currently leading an EDW4R by creating opportunities to examine it in their local context and to compare their operations with those of other institutions.
Acknowledgements
The authors would like to thank colleagues in the CTSA Informatics Enterprise Committee EDW4R Working Group for their feedback and input into the development of this model.
This project was supported by the National Institutes of Health’s National Center for Advancing Translational Sciences Clinical and Translational Sciences Award program, including grant numbers UL1TR002537 (Iowa), UL1TR002369 (OHSU), and UL1TR002384 (Weill Cornell). This work was funded in part by the University of Rochester Center for Leading Innovation and Collaboration under Grant U24TR002260.
Disclosures
The authors have no conflicts of interest relevant to the topic of this manuscript.