Considerable progress in explaining cultural evolutionary dynamics has been made by applying rigorous models from the natural sciences to historical and ethnographic information collected and accessed using novel digital platforms. Initial results have clarified several long-standing debates in cultural evolutionary studies, including those over population origins, the role of religion in the evolution of complex societies and the factors that shape global patterns of language diversity. However, future progress requires recognition of the unique challenges posed by cultural data. To address these challenges, standards for data collection, organisation and analysis must be improved and widely adopted. Here, we describe some major challenges to progress in the construction of large comparative databases of cultural history, including recognising the critical role of theory, selecting appropriate units of analysis, devising data-gathering and sampling strategies, winning expert buy-in, achieving reliability and reproducibility in coding, and ensuring interoperability and sustainability of the resulting databases. We conclude by proposing a set of practical guidelines to meet these challenges.