Hostname: page-component-745bb68f8f-b6zl4 Total loading time: 0 Render date: 2025-01-26T00:34:15.920Z Has data issue: false hasContentIssue false

Birds of a feather? Mis- and dis-information on the social media platform X related to avian influenza

Published online by Cambridge University Press:  09 January 2025

Lauren N. Cooper*
Affiliation:
Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
Marlon I. Diaz
Affiliation:
Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, TX 79430 USA
John J. Hanna
Affiliation:
Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA Division of Infectious Diseases and Geographic Medicine, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA Information Services, ECU Health, Greenville, NC 27834 USA Brody School of Medicine, Department of Internal Medicine, East Carolina University, Greenville, NC 27834 USA
Zachary M. Most
Affiliation:
Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA Peter O’Donnell School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
Christoph U. Lehmann
Affiliation:
Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
Richard J. Medford
Affiliation:
Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA Division of Infectious Diseases and Geographic Medicine, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA Information Services, ECU Health, Greenville, NC 27834 USA Brody School of Medicine, Department of Internal Medicine, East Carolina University, Greenville, NC 27834 USA
*
Corresponding author: Lauren N. Cooper; Email: [email protected]

Abstract

Objective:

Social media has become an important tool in monitoring infectious disease outbreaks such as coronavirus disease 2019 and highly pathogenic avian influenza (HPAI). Influenced by the recent announcement of a possible human death from H5N2 avian influenza, we analyzed tweets collected from X (formerly Twitter) to describe the messaging regarding the HPAI outbreak, including mis- and dis-information, concerns, and health education.

Methods:

We collected tweets involving keywords relating to HPAI for 5 days (June 04 to June 08, 2024). Using topic modeling, emotion, sentiment, and user demographic analyses, we were able to describe the population and the HPAI-related topics that users discussed.

Results:

With an original pool of 14,796 tweets, we analyzed a final data set of 13,319 tweets from 10,421 unique X users, with 50.4% of the tweets exhibiting negative sentiments (< 0 on a scale of −4 to +4). Predominant emotions were anger and fear shown in 36.4% and 29.5% of tweets, respectively. We identified 5 distinct, descriptive topics within the tweets. The use of emotionally charged language and spread of misinformation were substantial.

Conclusions:

Mis- and dis-information about the causes of and ways to prevent HPAI infections were common. A large portion of the tweets contained references to a planned epidemic or “plandemic” to influence the upcoming 2024 US presidential election. These tweets were countered by a limited number of tweets discussing infection locations, case reports, and preventive measures. Our study can be used by public health officials and clinicians to influence the discourse on current and future outbreaks.

Type
Original Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Healthcare Epidemiology of America

Introduction

As news reports of highly pathogenic avian influenza (HPAI) infections in farm animals such as cows and chickens and of transmission to humans increased in the spring of 2024, so did the “bird flu” chatter on social media outlets such as X (formerly Twitter), Instagram, and TikTok. With its proliferation in the 21st century, social media has become an important tool to monitor disease outbreaks and users’ perceptions, allowing public health officials and researchers to determine public sentiment on emergent disease threats and potential interventions.Reference Wilson, Lehmann, Saleh, Hanna and Medford1 As seen in recent public health emergencies such as the coronavirus disease 2019 (COVID-19)Reference Medford, Saleh, Sumarsono, Perl and Lehmann2Reference Lanier, Diaz, Saleh, Lehmann and Medford9 pandemic and outbreak of mpox,Reference Cooper, Radunsky and Hanna10,Reference Leslie, Okunromade and Sarker11 analyzing social media posts on outbreaks can provide vital insight into the public’s response to infection prevention methods such as testing and vaccinations, the handling of infections already present, the spread of mis- and dis-information, and how government and politics play into current and future public health emergencies.

On June 05, 2024, the World Health Organization (WHO) confirmed the first death of a man in Mexico with H5N2 avian influenza infection,Reference Organization12 although it was later confirmed that the death was not caused by the H5N2 virus but by existing co-morbidities.Reference Organization13 Using this event as a point of reference due to the increase in media coverage, we utilized X to collect tweets focused on the topic of HPAI. By analyzing the tweets using topic modeling, sentiment, and emotion analyses, we hypothesize that we can provide the public health community and government officials useful insights into the fears, concerns, reactions, and possible mis- and dis-information distributed by X users regarding the current HPAI outbreak.

Methods

Data collection and preprocessing

We collected all English-language original tweets using keywords indicative of HPAI including the following terms: “H5N1, H5N2, bird flu, bird influenza, avian flu, avian influenza, A(H5N1), A(H5N2), and influenza A virus” using the X application programming interface v214 and the Tweepy Python library v4.14.0.Reference Tweepy15 We chose X as the social media platform of choice due to the availability of numerous Python libraries for the platform as well as the use of short texts, compared to visual media such as images or video of other platforms. The tweets were collected from June 04, 2024, to June 08, 2024, a time span which included the news of the first human fatality associated with HPAI in Mexico along with numerous news outlets and media discussing the ongoing H5N1 HPAI variant outbreak in the United States. Various metadata were also collected such as the author’s username, self-reported location, whether the author was “verified,” the author’s user description, and counts of likes, retweets, impressions, and quotations for each tweet. All data collection, processing, and analyses were conducted using the Python programming language (version 3.10.5).16

To prepare the data for analysis, we preprocessed by removing duplicate tweets, embedded URLs, emojis, common symbols such as “#” and “@,” and expanding common contractions. The plain text of the tweets was then cleaned further using the natural language processing library spaCy.17 Next, we removed common stop words and created bigrams and trigrams using the Gensim libraryReference Řehůřek18 to prepare for topic modeling.

Analyses

Topic modeling

From the Gensim library, we utilized a latent Dirichlet allocation (LDA) model estimation algorithm to perform topic modeling. Using a corpus based on the trigrams derived from our dataset, we trained LDA models comprising topic numbers from 1 to 40. With Umass coherence scores used to quantitatively determine the optimal number of topics, we ultimately chose a model with 5 topics. Combining the top 20 keywords for each topic with the respective tweets, we utilized OpenAI’s ChatGPT-419 large language model (LLM) to determine appropriate descriptions for each of the 5 topics using the following prompt: “Using the uploaded csv file, can you please give a description of the 5 topics (0-4) using the given keywords, tweets, and percentage of contribution to each topic?”

Sentiment and emotion analysis

In addition to topic modeling, the sentiment and emotion of the tweets were analyzed to provide additional insight into the public opinions about HPAI. The sentiment of each tweet was determined using the SentiStrength library.Reference Thelwall, Buckley, Paltoglou, Cai and Kappas20,Reference Hung21 With values ranging from −4 for extremely negative sentiments to +4 for extremely positive sentiments, the SentiStrength library is optimized to perform sentiment analysis on short, informal text such as those in tweets and other forms of social media. To determine the emotion present in the tweets, we used the Text2Emotion library,Reference Aman Gupta, Sharma and Bilakhiya22 which uses natural language processing techniques to determine words in the text that express emotion. The text was then categorized, based on probability scores, into 5 emotions: happy, angry, surprise, sadness, and fear.

User demographics

Although demographics such as age, sex, race, and ethnicity are generally not readily available from tweets, these data can be inferred from user profiles, usernames, profiles, and user images. Using the machine learning-based M3-Inference library,Reference Wang, Hale and Adelani23,24 we determined a user’s age range (≤ 18, 19–29, 30–39, ≥ 40), likely binary gender (female or male), and whether a user was “an organization,” meaning the user is likely not an individual person, but rather a corporate entity such as a news organization or company. A user’s ethnicity (Hispanic, non-Hispanic White, non-Hispanic Black, or Asian) was inferred using the Ethnicolr25 library. This library uses the user’s first and last name to help determine an accurate ethnicity based on the state of Florida’s voting registration data and US census data.

Ethical approval

All data included in this study were publicly available and therefore patient consent or approval from an institutional review board were not required.

Results

User demographics

During our study period, we collected 14,796 English-language tweets. After preprocessing and the removal of duplicate tweets, we used a final dataset for analyses of 13,319 tweets from 10,421 unique X users. Of the users, 8,150 (77.8%) were individual users and not an organization. Of the users identified as individuals, 5,725 (70.6%) were identified as male, 2,966 (36.6%) were 18 years old or younger, and 2,189 (27%) were 40 years old or older. We were able to identify race in only 84% of individual users (6,846) using the Ethnicolr library, and of those, 78% (5,341) were labeled as non-Hispanic White (Table 1).

Table 1. Demographics of the 10,421 unique X users who created the 13,319 tweets from our data set

a We were able to identify race in only 6,846 individual users using the Ethnicolr library.

Sentiment and emotional analyses

The majority of tweets in our dataset were negative in sentiment. Out of the −4 to +4 scale, 50.4% of the tweets had sentiments less than 0, and the entire data set had a mean sentiment score of −0.68 (1.18). Neutral tweets, those with sentiment scores of 0, accounted for 36.0% of the tweets, while positive tweets (tweets with sentiments greater than 0) only accounted for 13.6% of the dataset (Figure 1).

Figure 1. Sentiment analysis of tweets with each sentiment category, ranging from the most negative (−4) to the most positive (+3).

Of the 5 emotions that could be identified by the Text2Emotion library: happiness, anger, surprise, sadness, and fear, most of the emotions identified in the tweets were negative. Anger (36.4%), fear (29.5%), and sadness (18.5%) comprised most of the tweets exhibited by language such as “F[censored] YOUR BIRD FLU PROPAGANDA,” “Biden et al. are allowing deadly H5N1 to spread unabated! H5N1 has a 50%–60% fatality rate in people-this is catastrophic!,” and “I’m very nervous about H5N1 (and now H5N2) and I’m having the same bad feeling I had before the lockdowns…” In contrast, surprise and happiness made up 9.6% and 5.9%, respectively (Figure 2) as shown by tweets such as “Grateful for health officials in Mexico for transparency,” “Really nice map showing where H5N1 has been detect[ed] in mammals in the U.S.,” and “San Francisco leads with advanced #H5N1 surveillance in wastewater. Commendable efforts!”

Figure 2. Emotion analysis of tweets based on 5 emotions: happiness, anger, surprise, sadness, and fear.

Topic modeling

Utilizing the LDA algorithm and coherence measures, we chose a model with 5 distinct topics (Table 2) to describe the overall themes shown in our data set. Included in these topics was the largest topic containing 3,290 (24.7%) tweets discussing issues with infection controls, spread, and infection reporting as shown by key terms including “confirm,” “infect,” and “spread” and tweets such as “US government enhances protective measures against H5N1 virus in dairy cattle @USDA…” One topic of note (2,326 tweets, 17.5%), entitled “Concerns about Health-Related Misinformation and the Need for Accurate Information,” included key terms such as “scam,” “hoax,” “plandemic,” and “election.” These tweets demonstrated dramatic reactions to the current avian influenza outbreak such as “I wonder when there will be dancing nurses for the upcoming Bird Flu hoax? DO NOT COMPLY” and “BIRD FLU IS NOT A THING! THEY JUST WANT TO TAKE AWAY YOUR FOOD, YOUR FARMS, YOUR STABILITY, YOUR JOBS, YOUR WAY OF LIFE, YOUR FREEDOM!” Other topics in the data set include “personal reactions to a possible avian influenza crisis,” “public sentiment and readiness towards avian influenza,” and “the public’s responses to health measures and policies by health authorities.”

Table 2. Topic modeling of tweets. Topic labels were generated by OpenAI’s ChatGPT-4, based on topic keywords, representative tweets, and the percentage of contribution of tweets to the topic model

Discussion

As social media becomes engrained into everyday life, it has become a source of news and information for many. An estimated 50% of US adults receive their news from social media at least some of the time,Reference Liedke and Wang26 the distribution of mis- and dis-information can have devastating effects. Anti-vaccination movements, pseudoscience-based treatments, and inferior infection control efforts for infectious diseases such as COVID-19 and human papillomavirus (HPV) have spread on social media in recent years.Reference Naeem, Bhatti and Khan27Reference Gisondi, Barber and Faust29

In conjunction to the spread of health misinformation, the use of rhetoric such as “DO NOT COMPLY,” “lock us down again,”, the “weaponization” of HPAI, and that the HPAI outbreak is a “hoax”, “plandemic,” or “scamdemic” may be fueled by the upcoming US presidential election. As was shown in the emotion analysis of the tweets, 65.9% of the tweets had evidence of anger or fear. The emotional language used in these tweets provides fodder to further spread the mis- and dis-information using negative tones and expressive language reminiscent of political propaganda. The use of this language aims to discredit information presented by clinicians, public health organizations, and government officials, who hope to protect the public from further harm.

However, social media does have positive effects. During the COVID-19 pandemic, the use of social media was helpful to spread public health information and dispel misinformation.Reference Medford, Saleh, Sumarsono, Perl and Lehmann2 Additionally, one study demonstrated the use of social media to improve negative opinions on vaccinations for HPV stemming from mis- and dis-information.Reference Kim, Schiffelbein, Imset and Olson30 Our study did find evidence of the use of the X platform to spread important medical notifications on HPAI infections, news about where the infections are taking place, and measures being taken by health officials to prevent the spread of the disease. This is evident in tweets such as “While the risk of avian influenza infecting #dairy cattle in IL remains low, State Veterinarian Mark Ernst emphasizes the importance of vigilance and biosecurity among farmers,” “USDA: H5N1 now in 80 dairy herds across 9 states,” and “Concerning to see such a large increase in the number of herds infected with #H5N1 #birdflu Re-emphasizing the importance of testing in animals and humans as well as implementing protective measures.” The use of social media, such as X, allows for public health organizations on both global and national levels such as the US Centers for Disease Control and Prevention and the WHO as well as more local organizations such as local health departments and even individual clinician groups to provide the public with fact based, scientifically sound infection prevention and vaccination information in case of further spread of the avian influenza virus.

Our study has several limitations that could limit its ability to be generalized. The first of which is that we collected only English-language tweets because the natural language processing libraries used are trained specifically for English. With the news of the first death associated with the H5N2 variant of HPAI occurring in Mexico, a country in which many speak Spanish, our ability to gain a complete picture of the public’s reaction to this event is limited. Similarly, we did not limit our search to tweets only from Mexico. Location data for tweets are generally poorly collected, with precise locations being turned off in user accounts by default resulting in approximately 1.0% of tweets having a precise geotag and only 30%–40% of tweets having some location information presented in the user profile, according to X documentation.31

Additionally, the Ethnicolr library used to identify the race and ethnicities of the X users in the data set was only able to identify the race of 6,846 users, resulting in probable undercounting. This can be caused by a lack of discernable first and last names in the X user’s profile. In 2022, ownership of Twitter changed, and the company was transformed into X. With that, changes in policies and in the number of X users could have affected how representative our data set was to represent the general public. According to recent research by the Pew Research Center, after the transition to X, 20% of US adults on the platform create approximately 98% of all tweets,Reference Chapekis and Smith32 indicating that only a small portion of the public opinion is represented by this platform.

Through the analyses of over 13,000 tweets discussing the HPAI outbreak in the spring of 2024, we were able to show that a large portion of the tweets represented feelings of anger and fear or included the dissemination of mis- and dis-information about a “plandemic” and urged others “Do Not Comply” with the information presented by government officials about the outbreak. A bright spot in the analyses did show that information important to public health was also presented on X via tweets indicating locations of outbreaks among dairy and poultry farms and reminders urging workers to use preventative measures when working near affected livestock. Future monitoring of social media posts regarding outbreaks of infectious diseases will further deepen our knowledge of how the public will and does respond to public health emergencies.

Acknowledgments

All authors contributed to the design, review, and editing of the manuscript.

Financial Support

This study was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1 TR003163) (C.U.L.).

Competing Interests

All authors report no conflicts of interest relevant to this article.

Research Transparency and Reproducibility

Due to restrictions from X Corp and the X Developers license, the availability of our data set is limited. Tweet IDs/user IDs are available from the authors with written permission from X Corp.

References

Wilson, AE, Lehmann, CU, Saleh, SN, Hanna, J, Medford, RJ. Social media: A new tool for outbreak surveillance. Antimicrob Steward Healthc Epidemiol 2021;1:e50.CrossRefGoogle ScholarPubMed
Medford, RJ, Saleh, SN, Sumarsono, A, Perl, TM, Lehmann, CU. An “Infodemic”: Leveraging High-Volume Twitter Data to Understand Early Public Sentiment for the Coronavirus Disease 2019 Outbreak. Open Forum Infect Dis 2020;7:ofaa258.CrossRefGoogle ScholarPubMed
Saleh, SN, Lehmann, CU, McDonald, SA, Basit, MA, Medford, RJ. Understanding public perception of coronavirus disease 2019 (COVID-19) social distancing on Twitter. Infect Control Hosp Epidemiol 2021;42:131138.CrossRefGoogle ScholarPubMed
Su, Y, Venkat, A, Yadav, Y, Puglisi, LB, Fodeh, SJ. Twitter-based analysis reveals differential COVID-19 concerns across areas with socioeconomic disparities. Comput Biol Med 2021;132:104336.CrossRefGoogle ScholarPubMed
Diaz, MI, Hanna, JJ, Hughes, AE, Lehmann, CU, Medford, RJ. The politicization of ivermectin tweets during the COVID-19 Pandemic. Open Forum Infect Dis 2022;9:ofac263.CrossRefGoogle ScholarPubMed
Jagarapu, J, Diaz, MI, Lehmann, CU, Medford, RJ. Twitter discussions on breastfeeding during the COVID-19 pandemic. Int Breastfeed J 2023;18:56.CrossRefGoogle ScholarPubMed
Basit, MA, Lehmann, CU, Medford, RJ. Managing pandemics with health informatics: successes and challenges. Yearb Med Inform 2021;30:1725.Google ScholarPubMed
Saleh, SN, McDonald, SA, Basit, MA, et al. Public perception of COVID-19 vaccines through analysis of twitter content and users. Vaccine 2023;41:48444853.CrossRefGoogle ScholarPubMed
Lanier, HD, Diaz, MI, Saleh, SN, Lehmann, CU, Medford, RJ. Analyzing COVID-19 disinformation on Twitter using the hashtags #scamdemic and #plandemic: Retrospective study. PLoS One 2022;17:e0268409.CrossRefGoogle ScholarPubMed
Cooper, LN, Radunsky, AP, Hanna, JJ, et al. Analyzing an emerging pandemic on twitter: monkeypox. Open Forum Infect Dis 2023;10:ofad142.CrossRefGoogle ScholarPubMed
Leslie, A, Okunromade, O, Sarker, A. Public perceptions about monkeypox on twitter: thematic analysis. JMIR Form Res 2023;7:e48710.CrossRefGoogle ScholarPubMed
Organization, WH. Human infection caused by avian Influenza A(H5N2)-Mexico (05 June 2024). 2024. https://www.who.int/emergencies/disease-outbreak-news/item/2024-DON520. Accessed 07/17/2024, 2024.Google Scholar
Organization, WH. Human infection caused by avian Influenza A(H5N2)-Mexico (14 June 2024). 2024. https://www.who.int/emergencies/disease-outbreak-news/item/2024-DON524. Accessed 07/17/2024, 2024.Google Scholar
X API Documentation. https://developer.x.com/en/docs/twitter-api. Accessed 07/15/2024, 2024.Google Scholar
Tweepy, Roesslein J.. 2023. https://docs.tweepy.org/en/stable/. Accessed 07/15/2024, 2024.Google Scholar
Welcome to Python. 2024. https://www.python.org/. Accessed 07/15/2024, 2024.Google Scholar
spaCy - Industrial Strength Natural Language Processing in Python. 2024. https://spacy.io/. Accessed 07/15/2024, 2024.Google Scholar
Řehůřek, R. Gensim: Topic Modeling for Humans. 2022. https://radimrehurek.com/gensim/. Accessed 07/17/2024, 2024.Google Scholar
ChatGPT. ChatGPT. 2024. Large Language Model. https://chatgpt.com/. Accessed 07/02/2024, 2024.Google Scholar
Thelwall, M, Buckley, K, Paltoglou, G, Cai, D, Kappas, A. Sentiment strength detection in short informal text. J American Soc Info Sci Technol 2010;61:25442558.CrossRefGoogle Scholar
Hung, YZ. GitHub - zhunhung/Python-SentiStrength. https://github.com/zhunhung/Python-SentiStrength. Accessed 07/15/2024, 2024.Google Scholar
Aman Gupta, AB, Sharma, Shivam, Bilakhiya, Karan. Text2emotion. https://shivamsharma26.github.io/text2emotion/. Accessed July 22, 2022, 2022.Google Scholar
Wang, Z, Hale, S, Adelani, DI, et al. Demographic Inference and Representative Population Estimates from Multilingual Social Media Data. The World Wide Web Conference; 2019; San Francisco, CA, USA.CrossRefGoogle Scholar
Github-euagendas/m3inference: A deep learning system for demographic inference. 2023. https://github.com/euagendas/m3inference. Accessed 07/17/2024, 2024.Google Scholar
Suriyan Laohaprapanon GSaBN. ethnicolr: Predict Race and Ethnicity From Name. https://ethnicolr.readthedocs.io/ethnicolr.html. Accessed July 22, 2022, 2022.Google Scholar
Naeem, SB, Bhatti, R, Khan, A. An exploration of how fake news is taking over social media and putting public health at risk. Health Info Libr J 2021;38:143149.CrossRefGoogle ScholarPubMed
Suarez-Lledo, V, Alvarez-Galvez, J. Prevalence of health misinformation on social media: systematic review. J Med Internet Res 2021;23:e17187.CrossRefGoogle ScholarPubMed
Gisondi, MA, Barber, R, Faust, JS, et al. A deadly infodemic: social media and the power of COVID-19 misinformation. J Med Internet Res 2022;24:e35552.CrossRefGoogle ScholarPubMed
Kim, SJ, Schiffelbein, JE, Imset, I, Olson, AL. Countering antivax misinformation via social media: message-testing randomized experiment for human papillomavirus vaccination uptake. J Med Internet Res 2022;24:e37559.CrossRefGoogle ScholarPubMed
Advanced Filtering with Geo Data. https://developer.x.com/en/docs/tutorials/advanced-filtering-for-geo-data. Accessed 07/17/2024, 2024.Google Scholar
Chapekis, A, Smith, A. How US Adults on Twitter Use the Site in the Elon Musk Era. 2023. https://www.pewresearch.org/short-reads/2023/05/17/how-us-adults-on-twitter-use-the-site-in-the-elon-musk-era/. Accessed 07/17/2024, 2024.Google Scholar
Figure 0

Table 1. Demographics of the 10,421 unique X users who created the 13,319 tweets from our data set

Figure 1

Figure 1. Sentiment analysis of tweets with each sentiment category, ranging from the most negative (−4) to the most positive (+3).

Figure 2

Figure 2. Emotion analysis of tweets based on 5 emotions: happiness, anger, surprise, sadness, and fear.

Figure 3

Table 2. Topic modeling of tweets. Topic labels were generated by OpenAI’s ChatGPT-4, based on topic keywords, representative tweets, and the percentage of contribution of tweets to the topic model