Book contents
- Frontmatter
- Contents
- Preface
- Part I Background
- Part II Applications, tools, and tasks
- Interlude — Good practices for scientific computing
- Chapter 17 Research record-keeping
- Chapter 18 Data provenance
- Chapter 19 Reproducible and reliable code
- Chapter 20 Helpful tools
- Part III Fundamentals
- Conclusion
- Bibliography
- Index
Chapter 18 - Data provenance
from Interlude — Good practices for scientific computing
Published online by Cambridge University Press: 06 June 2024
Summary
This chapter covers data provenance, also called data lineage: the detailed history of how data was created and manipulated, together with the process of ensuring the data's validity by documenting its origins and transformations. Data provenance is a central challenge when working with data. Computing both helps and hinders our ability to maintain records of our work with the data. The best science results when we adopt strategies to carefully and consistently record and track the origin of data and any changes made along the way. For instance, we want to know where and by whom a dataset was created and what process was used to create it. If changes were made later, such as fixing erroneous entries, we need a good record of those changes. With these goals in mind, we discuss best practices for tracking data provenance. Such practices take time and effort to implement and can seem tedious in the short term, but over time your research will become more reliable, and you and your collaborators will be grateful.
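As a rough illustration of the kind of record described in the summary, the following Python sketch keeps a small provenance log that notes who created or changed a dataset, when, and why, along with a checksum of each version. It is a minimal sketch under stated assumptions, not the chapter's own method: the function names (`fingerprint`, `record_event`), the log fields, and the example edge list are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact version of the data."""
    return hashlib.sha256(data).hexdigest()


def record_event(log: list, data: bytes, who: str, action: str, note: str = "") -> dict:
    """Append one provenance entry (who, what, when, checksum) to the log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "who": who,
        "action": action,
        "note": note,
        "sha256": fingerprint(data),
        # Link to the previous version so the chain of changes can be followed.
        "previous_sha256": log[-1]["sha256"] if log else None,
    }
    log.append(entry)
    return entry


# Hypothetical example: a small edge list containing one erroneous entry.
raw = b"source,target\nalice,bob\nbob,carol\ncarol,carol\n"
log = []
record_event(log, raw, who="A. Researcher", action="created",
             note="exported from survey tool")

# Fix the erroneous self-loop and record the change.
fixed = raw.replace(b"carol,carol\n", b"carol,alice\n")
record_event(log, fixed, who="A. Researcher", action="modified",
             note="corrected self-loop for carol")

print(json.dumps(log, indent=2))
```

Storing such a log next to the data, for example as a JSON file under version control, is one possible way to let collaborators check that the file they hold matches a recorded version and see how it came to be.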
Information
- Working with Network Data: A Data Science Perspective, pp. 289-292
- Publisher: Cambridge University Press
- Print publication year: 2024