Book contents
- Frontmatter
- Contents
- List of figures, tables and boxes
- Notes on contributors
- Foreword
- Preface
- General introduction
- Part I How data are changing
- Part II Counting in a globalised world
- Part III Statistics and the changing role of the state
- Part IV Economic life
- Part V Inequalities in health and wellbeing
- Part VI Advancing social progress through critical statistical literacy
- Epilogue: progressive ways ahead
- Index
4 - Social media data
Published online by Cambridge University Press: 30 April 2022
Summary
Introduction
As the ‘participatory’ Web 2.0 model has supplanted ‘publication’ on the World Wide Web, several rapidly evolving sites and applications, such as Twitter, Facebook, Flickr, Wikipedia and YouTube, have promoted the creation and enabled, to varying extents, the retrieval of increasingly large volumes of user-generated content. Some of these human-made digital artefacts, consisting of text, shared web links, audio, image or video files, are publicly posted, allowing widespread, although seldom free, access to potentially huge volumes of material. Social media data are rarely numerical, but many statistical techniques are now deployed to analyse these newfound sources of ‘Big Data’. Chang and colleagues (2014) have suggested that a ‘paradigmatic shift’ has resulted from these technological advances, leading to a new type of computational social science, a development which, relying largely on quantitative and inductive methodologies, has not been universally welcomed (Fuchs, 2017a; Wyly, 2014). This chapter describes the characteristics of social media data and the methods used to collect and analyse them, and argues that social media data, despite several inherent peculiarities, must be embraced, but approached cautiously, by statistically minded researchers.
Social media big data
Characteristics
Social media datasets are widely accessed and used in government, corporate and academic environments. Applications include the surveillance and monitoring of citizens (Fuchs, 2017b), business brand and reputation management (Grabher and Konig, 2017) and wide-ranging investigations in social and information systems research (Kapoor et al, 2018). Many digital records of human societal interaction, typically sourced from the billions of messages created every day by users of popular online social networks such as Facebook and Twitter, are now accessible. Social media data are time-stamped, allowing temporal sequencing, while individual records are often packaged for access, with metadata, in one of the ‘semi-structured’ interchange formats of the web, such as XML or JSON, which are not always familiar to statisticians. Some social media data, for example Flickr images or Twitter tweets, hold latitude and longitude coordinates, allowing straightforward mapping of ‘geotagged’ phenomena. Key demographic or address information, including age, sex, street, town or postcode, is not, for privacy reasons, available in social media data, although some attributes, such as gender, may be imputed with varying levels of success by examining language usage in text. Exceptionally, where users grant ‘read access’ to third-party social media applications, these variables may become visible to ‘app’ developers.
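For readers unfamiliar with semi-structured formats, the following minimal Python sketch shows how time-stamped, partly geotagged records packaged as JSON might be read, ordered in time and reduced to mappable points. The records and the field names (`created_at`, `text`, `geo`, `lat`, `lon`) are hypothetical and chosen for illustration only; they do not reproduce the schema of any particular platform.

```python
import json
from datetime import datetime

# Hypothetical records: the field names and values below are illustrative
# assumptions, not the schema or content of any real social media platform.
RAW = """
[
  {"id": 2, "created_at": "2019-03-01T12:30:00+00:00", "text": "Second post",
   "geo": {"lat": 51.4545, "lon": -2.5879}},
  {"id": 1, "created_at": "2019-03-01T09:15:00+00:00", "text": "First post",
   "geo": null},
  {"id": 3, "created_at": "2019-03-02T08:00:00+00:00", "text": "Third post",
   "geo": {"lat": 53.4808, "lon": -2.2426}}
]
"""


def load_records(raw):
    """Parse the JSON text and return the records in temporal order."""
    records = json.loads(raw)
    for record in records:
        # Convert the ISO 8601 time stamp string into a datetime object.
        record["created_at"] = datetime.fromisoformat(record["created_at"])
    return sorted(records, key=lambda record: record["created_at"])


def geotagged_points(records):
    """Return (id, latitude, longitude) for records that carry coordinates."""
    return [
        (record["id"], record["geo"]["lat"], record["geo"]["lon"])
        for record in records
        if record.get("geo")  # skip records whose geo field is null or absent
    ]


records = load_records(RAW)
points = geotagged_points(records)

for record in records:
    print(record["created_at"].isoformat(), record["text"])
print(points)
```

Only the records that carry coordinates survive the second step, which mirrors the point that geotagging applies to some, not all, social media data.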
- Type: Chapter
- Information: Data in Society: Challenging Statistics in an Age of Globalisation, pp. 47-60
- Publisher: Bristol University Press
- Print publication year: 2019