77 Best practices for data management and metadata creation for collaborative biostatistics teams

Published online by Cambridge University Press:  11 April 2025

Kelsey Karnik
Affiliation:
University of Kentucky
Maggie Lang
Affiliation:
University of Kentucky
Emily Slade
Affiliation:
University of Kentucky

Abstract

Objectives/Goals: Our goal is to enhance communication and documentation in collaborative biostatistics by refining data management and metadata processes. We aim to capture critical information about data collection and generation, improve transparency and reproducibility, and foster stronger researcher partnerships for more effective collaborations.

Methods/Study Population: Traditional statistical analysis plans (SAPs) often miss essential contextual knowledge from collaborators, leading to gaps that hinder reproducibility and limit future data use. Biostatistics teams at the University of Kentucky have updated their strategies to better capture important details about data origins and collection processes. By focusing on clear, comprehensive documentation early in the research process, we aim to preserve foundational data insights and improve collaboration efficiency. Our Biostatistics, Epidemiology, and Research Design (BERD) team has established best practices for addressing data management structures with collaborators across medical and healthcare fields, covering all project stages from initial data collection to metadata creation and dataset finalization.

Results/Anticipated Results: We will detail the processes used to improve data management structures and the observed results of these processes. For example, initiating deeper discussions about data origins and collection processes as early as possible in the collaboration has resulted in a more comprehensive project narrative that lays the foundation for effective collaboration. By engaging with project leaders early in the process, we can confirm that critical details about how data were collected and processed are documented, improving both the transparency and reproducibility of research findings. Streamlining the process of capturing this information makes it more accessible and useful for those with limited statistical backgrounds, which is particularly relevant for faculty and staff in BERD communities and Clinical and Translational Science Awards Programs.

Discussion/Significance of Impact: Nuanced data documentation structures are crucial for transforming raw data into meaningful, reusable datasets. Our initiatives promote clear communication, enhanced efficiency, and streamlined workflows. Translational science researchers can benefit from improving data management and metadata to boost long-term collaborative success.

Type
Biostatistics, Epidemiology, and Research Design
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2025. The Association for Clinical and Translational Science