COVID-19 Vaccines: Characterizing Misinformation Campaigns and Vaccine Hesitancy on Twitter

Vaccine hesitancy and misinformation on social media have increased concerns about the COVID-19 vaccine uptake required to achieve herd immunity and overcome the pandemic. Anti-science and political misinformation and conspiracies have been rampant throughout the pandemic. For COVID-19 vaccines, we investigate misinformation and conspiracy campaigns and their characteristic behaviours. We examine whether coordinated efforts are used to promote misinformation in vaccine-related discussions, and find a group of accounts coordinately promoting a 'Great Reset' conspiracy, along with vaccine-related misinformation and strong anti-vaccine and anti-social messages such as boycotting vaccine passports and opposing lockdowns and masks. We characterize other misinformation communities from the information diffusion structure, and study the large anti-vaccine misinformation community and smaller anti-vaccine communities, including a far-right anti-vaccine conspiracy group. Compared with the mainstream news, health news, and left-leaning groups, which are more pro-vaccine, the right-leaning group is influenced more by the anti-vaccine and far-right misinformation/conspiracy communities. The misinformation communities are more vocal in either the vaccine discussion or the political discussion specifically, and we find other differences in the characteristic behaviours of the communities. Lastly, we investigate misinformation narratives and tactics of information distortion that can increase vaccine hesitancy, using topic modeling and a comparison with reported vaccine side-effects (VAERS), finding that rarer side-effects are more frequently discussed on social media.




1 Introduction

The COVID-19 pandemic has amplified the concerns surrounding social media communications, which have on one hand facilitated pro-social messaging like “#wearamask” and “#staysafe” (SocialMediaCovidRole), but on the other hand have been a breeding ground for health and political misinformation and conspiracies (sharma2020covid). While global efforts have been made to rapidly vaccinate people against COVID-19 and prevent the risk of severe illness, there is significant hesitancy and unwillingness to vaccinate in part of the population. massonvaccinate reported that a quarter of American adults said they would refuse the vaccine, and 5% were undecided.

Vaccine hesitancy and misinformation on social media and e-commerce platforms have gained much attention in the past few years (cossard2020falling, juneja2021auditing). cossard2020falling studied the Italian vaccine debate on Twitter in 2016, finding echo chambers of anti-vaccine and pro-vaccine groups with asymmetrical interaction between the communities, as vaccine advocates ignore the skeptics. Similarly, miyazaki2021strategy found that anti-vaccine accounts reply most to neutral accounts, using toxic and emotional content. Studies have found the growing debate about the merits of vaccination on social media to be accompanied by reduced vaccinations, reduced intent to vaccinate, and the reappearance of diseases like measles (smith2011parental, cossard2020falling, ortiz2019systematic).

Misinformation, conspiracies, and coordinated misinformation campaigns have been highly prevalent throughout the COVID-19 pandemic (sharma2020identifying, memon2020characterizing). Among other similar studies, sharma2020identifying found coordinated accounts promoting political conspiracies and anti-mask narratives, and memon2020characterizing characterized health misinformation about COVID-19. Given the widespread misinformation, government agencies such as the CDC, fact-checking websites, and communication studies (lewandowsky2021covid) have curated lists of COVID-19 vaccine myths to inoculate the public against vaccine misinformation.

Recent efforts have studied impacts of vaccine misinformation using user study surveys (jolley2014effects, enders2020different, singh2020first, pierri2021impact). They found that misinformation and conspiracies are correlated with increased COVID-19 vaccine hesitancy, and the effects are different across different demographics. Moreover, jamison2020not also found that vaccine opponents share greater proportions of unreliable information.

In this work, we uncover and characterize misinformation campaigns and communities in the COVID-19 vaccine discussion on Twitter, and examine their anti-vaccine propaganda aimed at increasing vaccine hesitancy and reducing vaccination uptake in the US. We examine the following research questions:

RQ1. Are there hidden coordinated efforts promoting misinformation/conspiracies about COVID-19 vaccines?

We detect coordinated groups in the vaccine discussion on Twitter by applying a variant of sharma2020identifying, which uncovers hidden influence between accounts from their observed activities, and we inspect the groups' narratives and agendas.

The tweets and behaviours of the identified account group suggest the presence of coordinated efforts to promote a “Great Reset” conspiracy, which claims that world leaders orchestrated the pandemic to control the global economy and that the pandemic provides an opportunity for a reset. The group promotes baseless COVID-19 anti-vaccine misinformation and conspiracies, with anti-social messages (#notocoronavirusvaccines, #livingnotlockdown, #masksdontwork).

RQ2. What are the other misinformation and conspiracy communities, and which communities are most influenced by them?

We use community detection on the retweet graph of the COVID-19 vaccine discussion to identify account groups that endorse similar content. We characterize the communities using their top retweeted accounts and tweet features, political leaning, geographical demographics, and proportion of misinformation/conspiracies.

We find a large anti-vaccine misinformation and conspiracy community (16%) that spans US (48.9%) and UK (27.5%) accounts, and other smaller anti-vaccine communities in different languages (Spanish, French, Italian). Furthermore, the community of right-leaning US accounts (which retweet top Republican accounts) is strongly influenced by the anti-vaccine misinformation and far-right conspiracy groups, and is more distanced from mainstream and health news, and from pro-vaccine or neutral communities. The rate of misinformation in different US states is negatively correlated with vaccine uptake rates, which are lower in Republican and swing states of the 2020 US Election.

RQ3. How do behaviours of anti-vaccine misinformation and conspiracy communities differ from informational communities in general and specific to vaccine discussions?

We investigate feature distributions of accounts in each prominent community to characterize their behavioural differences and activities.

We find that the anti-vaccine misinformation community is the most vocal in the vaccine discussion, even though it has fewer overall tweets and younger account ages than other communities, and that it has a sizeable audience. In contrast, the far-right conspiracy group is more vocal in other discussions than in vaccine discussions, and has a more networked structure with higher follower and following counts. In the vaccine discussion, the mainstream news and left-leaning pro-vaccine communities are more vocal than the far-right and right communities, but less vocal than the anti-vaccine community.

RQ4. What kinds of narratives on social media and in misinformation tweets have the potential to increase vaccine hesitancy?

We consider five types of narrative distortions in anti-science misinformation and examine how they manifest on social media through correlations of reported vaccine side-effects in CDC VAERS vs. those discussed on social media, and with topic modeling, and news source characterization for vaccine misinformation tweets.

We find that on social media, rarer vaccine side-effects are discussed more frequently, explained by novelty of rarer effects in general tweets, and cherry-picking in misinformation tweets. Also, impossible expectations of vaccines, misinformation on scientific facts, refusal, rollout, and vaccination plan conspiracies were present, with pseudoscience and political propaganda, all of which can contribute to increased COVID-19 vaccine hesitancy in normal users.

2 Data Collection

COVID-19 Vaccine Twitter data.

We use the streaming Twitter API, which returns a 1% sample of all tweets filtered by tracked keywords (vaccine, Pfizer, BioNTech, Moderna, Janssen, AstraZeneca, Sinopharm) in real time, to collect Twitter data related to COVID-19 vaccines. The collection covers Dec 9, 2020 to April 24, 2021, i.e., it starts just before Pfizer-BioNTech and Moderna were granted Emergency Use Authorization (EUA) by the FDA. The dataset contains 29,743,178 tweets from 7,417,592 accounts. Fig. 1 shows the timeline of tweet volume.

Tweets include (i) original tweets (content created and posted by an account), (ii) reply tweets, (iii) retweets, which reshare a tweet without comment, and (iv) quote tweets, which embed a tweet, i.e., reshare it with a comment. The tweet payload (metadata of the tweet and of the associated user account) can be collected from the tweet ID using the Twitter APIs. The tweet IDs can be shared as a dataset upon request. During data collection, due to technical interruptions between 2021-01-12 and 2021-01-22, and between 2021-02-04 and 2021-02-16, we could not fully recover the stream of tweets. For these periods, we therefore recover the tweet IDs from CoVaxxy (deverna2021covaxxy), a repository that similarly tracks vaccine-related keywords through the Twitter streaming API. We hydrate the tweet IDs and filter for the keywords tracked in our collection. Note that CoVaxxy does not contain December tweets (its collection starts from Jan 4), which our data collection has. Since the first vaccine EUA in the U.S. was granted on Dec 11, we use our collection for the analysis and use CoVaxxy only to supplement our collection during the mentioned technical interruptions.

Figure 1: Tweet volume timeline from the Emergency Use Authorization (EUA) in U.S. till U.S. Adults vaccinations.

CDC VAERS side effects data from vaccination records in the U.S.

Post-vaccination side-effects are reported to the FDA/CDC Vaccine Adverse Event Reporting System (VAERS). We download the official records (accessed June 6, 2021). Healthcare providers are required to report, while individuals are advised to report, post-vaccine effects even if a causal link to the vaccine has not yet been established, so that the FDA/CDC can monitor vaccine safety and conduct further investigations. The reports do not provide exact statistics of vaccine side-effects, but rather a public account of nation-wide post-vaccine effects. We use the data as an estimate to study how post-vaccine experiences are discussed on social media in comparison to the ones reported to VAERS.

Unreliable/conspiracy URLs news source lists.

We use misinformation as an umbrella term to refer to unreliable claims including false, misleading, non-scientific and conspiratorial narratives. Numerous prior works use low-quality news sources to analyze misinformation shared on social media (Bozarth_Saraf_Budak_2020). Bozarth_Saraf_Budak_2020 found that temporal trends and differences in topics between misinformation and mainstream news were robust to the choice of news source lists. For our analysis, we utilize the low-quality news sources reported by three fact-checking resources as consistently promoting COVID-19 or general misinformation: Media Bias/Fact Check (questionable and pseudo-science/conspiracy lists with low/very low factual rating), NewsGuard (accessed on September 22, 2020), and Zimdars (zimdars2016false) sources tagged as unreliable or with related labels. For mainstream reliable sources, we use the entries tagged as reliable in Wikipedia:Reliable sources/Perennial sources. In total, 124 reliable and 1380 unreliable/conspiracy news sources are obtained.
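
As an illustration of how such source lists can be applied, the following minimal sketch labels the URLs found in tweets by their domain and computes the share of each label. The three domain sets are hypothetical placeholders (the actual lists are not reproduced here), and the function names are ours, not part of any released code.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical placeholder lists; the lists described above contain
# 124 reliable and 1380 unreliable/conspiracy domains.
RELIABLE = {"reliable.example"}
UNRELIABLE = {"unreliable.example"}
CONSPIRACY = {"conspiracy.example"}


def domain_of(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def label_url(url):
    """Map a URL to 'conspiracy', 'unreliable', 'reliable' or 'other'."""
    domain = domain_of(url)
    if domain in CONSPIRACY:
        return "conspiracy"
    if domain in UNRELIABLE:
        return "unreliable"
    if domain in RELIABLE:
        return "reliable"
    return "other"


def source_proportions(urls):
    """Fraction of URLs falling under each label (empty dict for no URLs)."""
    counts = Counter(label_url(u) for u in urls)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Exact domain matching is a simplifying assumption; subdomains and shortened links would need additional handling.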

3 Coordinated Misinformation Campaigns about COVID-19 Vaccines

Figure 2: Example account profiles from identified coordinated group. Left: This account retweets a request to re-follow another account after Twitter Censorship caused it to lose 20,000 followers and promotes anti-vax narratives (I do not consent) and boycott of vaccine passports. Right: This account with 8545 Followers promotes the ‘Great Reset’ conspiracy with anti-vax misinformation suggesting flu shots cause five-fold increase in risk of respiratory infections.
Figure 3: Tweets from a pair of accounts (A, B) in the detected coordinated group. Left: Tweets from the Twitter profile of accounts A and B suggesting anti-lockdown and anti-government narratives. Right: Three example tweets from the collected dataset, of the same pair of accounts (A, B) spreading similar narratives in coordination.

We apply a variant of AMDN-HAGE (sharma2020identifying) to identify suspicious coordinated accounts in the dataset. Misinformation campaigns involving coordinated efforts from groups of malicious accounts towards manipulating public opinion have become increasingly prevalent. AMDN-HAGE (sharma2020identifying) models the hidden influence between accounts using neural temporal point processes (which estimate conditional density of social media posts given previous posts from other accounts), learning the mutual triggering effects between account activities directly from observed social media activities. Here, we apply a variant of the method which can additionally encode domain information to regularize the learning from the observed data, i.e., higher co-occurrence frequencies and overlapping periods of activity between account pairs which might suggest higher likelihood of coordination between the accounts.


We construct observed sequences of accounts’ activities from the diffusion cascade of a tweet, i.e., the retweets, replies, and quotes of the tweet (direct engagements) and all subsequent engagements (propagation tree), as a time-ordered sequence of posts, each represented by its account and posting time. AMDN-HAGE is agnostic to account or tweet features and uses only the posting times and account identities. The extracted sequences comprise 62k activity sequences of 31k accounts from the early vaccine discussion (Dec - Feb 24), after filtering out accounts that appear fewer than 5 times in the collected tweets and sequences shorter than length 10.
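
The sequence construction described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the paper: the input layout (cascade id, account id, timestamp) and the order in which the two filters are applied are assumptions.

```python
from collections import Counter, defaultdict


def build_sequences(posts, min_account_posts=5, min_seq_len=10):
    """Group posts into time-ordered activity sequences per cascade.

    posts: iterable of (cascade_id, account_id, timestamp) tuples
    (a hypothetical layout). Accounts with fewer than min_account_posts
    posts are dropped first; cascades left with fewer than min_seq_len
    events are then discarded.
    """
    posts = list(posts)
    account_counts = Counter(account for _, account, _ in posts)

    cascades = defaultdict(list)
    for cascade_id, account, timestamp in posts:
        if account_counts[account] >= min_account_posts:
            cascades[cascade_id].append((timestamp, account))

    sequences = {}
    for cascade_id, events in cascades.items():
        if len(events) >= min_seq_len:
            events.sort()  # order by posting time
            sequences[cascade_id] = [(acc, ts) for ts, acc in events]
    return sequences
```

Each resulting sequence is a list of (account, posting time) pairs, which is the only information the point-process model consumes.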

We applied the method to the observed sequences to identify coordinated groups. The model was implemented in PyTorch and trained on 4 Nvidia-2080Ti GPUs with the account embedding dimension set to 64, using the Adam optimizer (1e-3 learning rate, 1e-5 regularization) as in (sharma2020identifying). The method is unsupervised and does not use any labeled training or validation set of accounts. The number of groups was selected based on the silhouette score of the account groups (clusters) identified by the method. The maximum silhouette score, 0.043, was obtained for 2 groups. The threshold for assigning accounts to the two groups was also selected without supervision, as the value that maximizes the silhouette score; this was determined to be 0.7 or above for the coordinated group.
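
To make the selection criterion concrete, the sketch below computes the mean silhouette score of a clustering and picks the candidate clustering with the highest score. It is a plain-Python illustration under assumed inputs (small tuples standing in for account embeddings, Euclidean distance) and is not the AMDN-HAGE code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance.

    Points in singleton clusters contribute 0, the usual convention.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for idx, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[idx], points[j]) for j in own if j != idx)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[idx], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def best_labeling(points, candidates):
    """Return the key of the candidate labeling with the highest silhouette."""
    return max(candidates, key=lambda k: silhouette_score(points, candidates[k]))
```

Here `candidates` would map each considered number of groups (or each assignment threshold) to the resulting cluster labels.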


The method uncovered a suspicious coordinated group of 7k accounts. Based on analysis of the collective behaviours and activities of accounts in the group, we find that it corresponds to a ‘Great Reset’ conspiracy theory. According to BBC News, the theory claims that a group of world leaders orchestrated the pandemic to take control of the global economy. The baseless conspiracy theory originated after a genuine initiative and annual meeting of the World Economic Forum (WEF), titled ‘The Great Reset’ and held in June, 2020, explored how countries might recover from the economic damage caused by the coronavirus pandemic. It started trending globally with a video of Canadian Prime Minister Justin Trudeau at a UN meeting, saying the pandemic provided an opportunity for a reset.

Figure 4: Top-35 hashtags of detected coordinated vs. normal accounts.

Account features. Examples of account profiles (Fig. 2) from the detected coordinated group show the ‘Great Reset’ conspiracy and the nature of the tweets, which are strongly anti-vaccine and anti-lockdown and contain misinformation and conspiracies about the pandemic and vaccines (52.56% unreliable/conspiracy URLs in coordinated group tweets compared to 25.6% for normal accounts, although misinformation is not exclusive to coordinated groups).

The identified coordinated group is clearly distinguished from the normal accounts in the comparison of the top-35 hashtags in their tweets (Fig. 4, non-overlapping hashtags in bold). The narratives of the coordinated conspiracy group focus on the pandemic being a hoax (#scamdemic2020, #plandemic2020), and on denouncing Bill Gates and the Canadian Prime Minister Justin Trudeau, from whose remarks the Great Reset conspiracy had spread (#trudeaumustgo, #trudeauvaccinefail, #billgates). In terms of social messaging, the group's tweets are rife with anti-vaccine, anti-mask, and anti-lockdown hashtags (#notcoronavirusvaccines, #vaccinesideeffects, #masksdontwork, #livingnotlockdown). In contrast, the top hashtags of normal accounts are more general and tend towards positive social messaging about vaccines and prevention protocols (#wearamask, #healthcare, #staysafe, #washyourhands).

Account activities.

We inspected the activities of a sample of the detected coordinated accounts. We randomly sampled account pairs that had retweeted at least one common tweet in the collected dataset. For each pair of accounts, we checked their Twitter profiles and their tweets in the collected dataset. Fig. 3 shows an example account pair (A, B) from the coordinated group, still active on Twitter as of June, 2021; the account names are anonymized here. The tweets of both accounts promoted the same agendas in coordination over similar time periods, either by retweeting the same source or by retweeting different sources that independently posted the same content. In the example shown, @NVICLoeDown and @CaliVaxChoice posted exactly the same content, and A and B each reshared one of them.

4 Misinformation and Information Diffusion Community Structure

Misinformation and conspiracies are not exclusive to coordinated groups. In this section, we therefore also analyze other misinformation communities through graph analysis of the information diffusion structure. Diffusion on social networks occurs when accounts retweet (RT) or re-share other accounts’ tweets. We can construct the RT graph with accounts as nodes and edges between them capturing the count of retweeted tweets between the account pair. As an RT is considered to be a form of endorsement of content (assuming retweets without comments, i.e., we do not include quoted tweets as retweets), the RT graph is often used to cluster accounts with similar opinions (garimella2018quantifying, cossard2020falling).

4.1 Diffusion communities structure and characterization


Similar to prior work (garimella2018quantifying), we use RT edges with a minimum retweet count of 2, including mutual retweets. We restrict the analysis to accounts appearing at least 5 times in the dataset, to ensure that the collected tweets sampled by the Twitter API contain enough information about the account. We split the tweets in the dataset into four quarters by the timeline of collected tweets, and construct the RT graph on the first quarter. We then take the 3-core decomposition of the RT graph, to restrict our focus to accounts that are part of the significant community structure of the diffusion network and are more prone to echo chamber effects from the communities they associate with. The decomposition also limits the size of the network, which eases inspection and visualization of the graph for characterization of the communities. Before the k-core decomposition, the graph has 91k accounts, 121k edges, and an avg. degree of 2.66. After the 3-core decomposition, it has 8,974 accounts, 31k edges, and an avg. degree of 6.92. We applied the Louvain method (blondel2008fast) for community detection and obtained 39 diffusion communities.
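
The graph construction and pruning steps can be sketched in plain Python as below. This is an illustrative sketch under assumptions: edges are treated as undirected, retweets in both directions between a pair are summed into one edge weight, and the input is a list of (retweeter, retweeted) account pairs. Community detection (for example, a Louvain implementation from a graph library) would then be run on the pruned graph.

```python
from collections import Counter, defaultdict


def build_rt_graph(retweets, min_weight=2):
    """Build an undirected adjacency dict from (retweeter, retweeted) pairs.

    An edge is kept only if the two accounts retweeted each other at
    least min_weight times in total (an assumed reading of the filter).
    """
    weights = Counter()
    for src, dst in retweets:
        if src != dst:  # ignore self-retweets
            weights[frozenset((src, dst))] += 1

    adjacency = defaultdict(set)
    for pair, weight in weights.items():
        if weight >= min_weight:
            u, v = tuple(pair)
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)


def k_core(adjacency, k=3):
    """Return the k-core: repeatedly remove nodes with degree below k."""
    core = {u: set(neighbours) for u, neighbours in adjacency.items()}
    queue = [u for u, neighbours in core.items() if len(neighbours) < k]
    while queue:
        u = queue.pop()
        if u not in core:
            continue  # already removed
        for v in core.pop(u):
            core[v].discard(u)
            if len(core[v]) < k:
                queue.append(v)
    return core
```

The average degree reported above corresponds to `2 * edges / nodes` on the graph before and after pruning.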

Figure 5: Characterization of COVID-19 vaccines information (and misinformation/conspiracy) diffusion communities.
| No. | Language | Locations | Inferred Political Leaning: Left | Inferred Political Leaning: Right | Inferred Political Leaning: Und | Low-Quality News URLs: Conspiracy | Low-Quality News URLs: Unreliable | Low-Quality News URLs: Reliable | Low-Quality News URLs: Others | Top News Domains | Top Accounts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 (20%) | EN (97.4%) | US (90.3%) | 98.7 | 0.3 | 1 | 0.26 | 2.87 | 96.87 | 92.22 | nytimes, washingtonpost, cnn, latimes, politico | JoeBiden, KamalaHarris, NYGovCuomo |
| 1 (16%) | EN (94.0%) | US (48.9%), UK (27.5%) | 2.8 | 6.8 | 90.4 | 39.61 | 33.82 | 26.57 | 87.56 | childrenshealthdefense, dailymail, zerohedge, rt, lifesitenews | ChildrensHD, zerohedge |
| 2 (16%) | EN (95.3%) | US (57.5%), India (7.6%), UK (6.1%) | 94.4 | 1.2 | 4.4 | 0.94 | 5.47 | 93.59 | 91.54 | reuters, theguardian, nytimes, independent, latimes | Reuters, NBCNews, AP, CoronaUpdateBot |
| 3 (11%) | EN (96.9%) | US (86.6%) | 5.8 | 83.9 | 10.3 | 32.31 | 35.14 | 32.55 | 89.62 | truepundit, foxnews, theepochtimes, zerohedge, dailymail | Mike_Pence, GOPChairwoman, OANN, nypost |
| 4 (6%) | EN (84.5%) | India (90.3%) | 8.2 | 0.2 | 91.6 | 10.27 | 37.76 | 51.97 | 97.92 | swarajyamag, indianexpress, dailymail, nationalfile, wsj | timesofindia, WIONews, mygovindia |
| 5 (5%) | EN (96.6%) | UK (76.9%), US (13.8%) | 64.6* | 0.2 | 35.2 | 1.28 | 10.25 | 88.47 | 93.12 | theguardian, nytimes, telegraph, bbc, express | DHSCgovuk, NHSEngland, NHSuk |
| 6 (5%) | ES (81.2%), EN (13.3%) | Argentina (34.3%), Mexico (11.9%) | 56.9* | 0 | 43.1 | 3.48 | 7.32 | 89.2 | 97.68 | nytimes, reuters, bbc, theguardian, thetimes | ReutersLatam, CoronavirusNewv, AlertaNews24 |
| 7 (3%) | EN (95.6%) | Canada (91.0%) | 87.8 | 0 | 12.2 | 1.78 | 4.07 | 94.15 | 97.05 | nytimes, theguardian, reuters, washingtonpost, bloomberg | JustinTrudeau, CBCAlerts, CPHO_Canada |
| 8 (3%) | EN (97.3%) | US (94.8%) | 0 | 97.7 | 2.3 | 31.86 | 45.14 | 23 | 85.09 | thegatewaypundit, breitbart, foxnews, lifesitenews, zerohedge | |
| 9 (3%) | FR (81.9%), EN (10.6%) | France (92.0%) | 4 | 0 | 96 | 14.18 | 62.88 | 22.94 | 90.18 | francesoir, fr, reseauinternational, childrenshealthdefense, dailymail | sputnik_fr, VirusWar, franceinfo, afpfr |
| 10 (3%) | EN (90.8%) | India (84.2%) | 92.3 | 0 | 7.7 | 0 | 2.66 | 97.34 | 95.92 | nytimes, indianexpress, reuters, theguardian, bloomberg | CNBCTV18News |
| 11 (2%) | EN (79.2%), TL (18.8%) | Philippines (85.7%) | 100 | 0 | 0 | 7.5 | 1.25 | 91.25 | 98.25 | reuters, buzzfeednews, theguardian, nytimes, prevention | ANCAlerts, CNNPhilippines |
| 12 (2%) | EN (53.5%), IT (38.7%) | Italy (32.7%), US (20.4%) | 34.5 | 1.9 | 63.6* | 22.22 | 55.19 | 22.59 | 91.28 | imolaoggi, zerohedge, rt, dailymail, nytimes | RT_com, SputnikInt |
| 13 (2%) | EN (88.4%) | US (30.8%), South Africa (13.8%) | 86.9 | 0.8 | 12.3 | 1.79 | 1.57 | 96.64 | 90.33 | npr, nytimes, latimes, theguardian, wsj | NPRHealth, WHO, EU_Health, CovidSupportSA |
| 14 (1%) | ES (48.6%), EN (26.0%), NL (18.9%) | Netherlands (28.6%), Uruguay (14.3%), Latvia (14.3%) | 60.3* | 1.8 | 37.9 | 49.52 | 29.61 | 20.87 | 90.84 | humansarefree, dailymail, rt, childrenshealthdefense, zerohedge | |
Table 1: Features of COVID-19 information diffusion and anti-vaccine misinformation/conspiracy communities.


We characterize the top-20 diffusion communities that account for 96% of the accounts in the RT graph. Fig. 5 presents the communities, with their characterization in Table 1 based on the nature of the accounts and tweets in each community. The table includes the community number with its size (% of accounts) in the RT graph, the languages of tweets from the community, and the geolocations extracted from geo-enabled tweets or valid locations reported in account profiles (dredze2013carmen), which characterize the general demographic of the community. In addition, we infer the political leaning of accounts using left/right media URLs endorsed directly or through the retweet structure, similar to (badawy2019characterizing). We jointly inspect these with the top retweeted accounts and tweets, the distribution of URL news sources, the top URL/news domains, and the contents of the top retweeted tweets and a random subset of tweets. The main findings are:

  • A large (16.24%) community of Anti-vaccine misinformation and conspiracies. From accounts that have valid locations reported in the profile or tweets, this community spans US (48.9%) and UK (27.5%).

  • Other, smaller communities have dominant misinformation or conspiracy content and URLs in their tweets. One corresponds to the U.S. Far-right conspiracy group, which posts anti-vaccine content. A community of Spanish-English tweets contains strongly anti-vaccine conspiracies, very similar to the larger Anti-vaccine misinformation community. A community of French tweets, with relatively less conspiracy content but an anti-vaccine stance, and a community of Italian tweets, with a mixed stance of hesitancy, are close to the Anti-vaccine misinformation community.

  • Benign communities include Mainstream News (15.58%), Health News (1.48%), and U.S. Left-leaning (with Joe Biden and Kamala Harris as top retweeted accounts), which have a pro-vaccine stance. The news communities contain accounts with more global geolocations, while the U.S. Left-leaning community has predominantly US geolocations. There are several regional news and politics communities centered close to the Mainstream and Health News communities (e.g. a UK-based community with top retweeted accounts corresponding to the National Health Service (NHS) and the Department of Health and Social Care (DHSC); Latin America; Philippines News; India and Canada News and Politics). These are identified based on tweet language, contents, and inferred geolocations.

  • The U.S. Right-leaning community (Mike Pence, OANN, top Republicans as most retweeted accounts) is interesting. In terms of tweet contents and proximity to other communities, the right-leaning community lies close to and is influenced or overlapped in part with the anti-vaccine misinformation and conspiracy community, as well as the far right group, with relatively sparse edges to the global mainstream informational community. The proportions of unreliable, conspiracy and reliable URLs in their tweets are roughly equal. The top news domains in their tweets include relatively credible though with right bias news sources like foxnews, but also conspiracy sources like zero-hedge and truepundit that have been reported to publish COVID-19 conspiracies (NewsGuard). Inspecting top retweeted and random sample of tweets suggests a mixture of vaccine news updates, politically biased views and also conspiracies (e.g. numerous anti-China messages, Bill Gates conspiracies), and largely anti-vaccine/protocol stance (including misinformation about the vaccines).

  • The misinformation and information communities are strongly separated with sparse RT edges between them (as seen from the separation in the RT graph visualization). The dense communities have high proximity between different anti-vaccine misinformation communities, and between pro-vaccine/news communities.

As additional details of the table, the proportions of unreliable, conspiracy, and reliable news source URLs are included. The fraction of tweets containing no URLs or only unidentified URL domains is listed under the Others column. The inferred political leaning held by the majority of accounts in a community is highlighted in bold or color. An asterisk marks communities with more than one prominent inferred leaning type. Languages and geolocations that account for less than 10% of the tweets are not reported. Top News Domains and Top Accounts are examples drawn from the top-20 in each community, and are left unmentioned if the top-20 contained only individuals that were not news or health agencies or well-known figures, for example in the U.S. Far-right conspiracy community (8).

4.2 Feature distributions

The feature distributions of accounts in the most significant communities (circled in Fig 5) are compared in Fig. 6 and Fig. 7. In Fig. 6, we compare Followings (accounts followed by accounts in the community), Followers, Listed (accounts mentioned in topical lists created by other accounts on Twitter), Favourites (number of tweets liked/favourited by the accounts in the community), Tweets (which is the total tweets posted by the account in its lifetime). These features are available in the account metadata obtained using the Twitter API, and we use account’s statistics at its last observed tweet in the dataset. The main insights about account characteristics in each community are,

  • The inter-quartile range of the Followers and Followings distributions is significantly higher for the Far-right conspiracy community (Fig. 6). The distribution across other groups is similar, with the anti-vaccine community having a slightly lower upper quartile (more fine-grained plots are in the Appx. Fig 11). This suggests more networked accounts in the Far-right conspiracy group, which likely actively follow each other to promote the conspiracy.

  • More accounts from the Left, Mainstream, Right, and Far-right communities appear in Twitter Lists (Listed), compared to the Anti-vaccine misinformation and Spanish-En conspiracy communities. Yet the anti-vaccine community does have many Listed accounts, suggesting that such content does have a sizeable audience.

(a) Anti-vax misinformation/ conspiracy
(b) Far-right leaning conspiracies
(c) Spanish-En anti-vax conspiracies
(d) Left-Leaning
(e) Mainstream News
(f) Right-leaning
Figure 6: Account statistics boxplot for information and misinformation / conspiracy communities.
Figure 7: Account statistics for communities (a) account age, (b) number of collected vaccine tweets (original, retweet, reply) by account, and Engagements (account retweeted, replied, mentioned by others) in collected vaccine tweets.

In Fig 6 we also have the Tweets (total tweets posted by the account in its lifetime). To better understand the accounts activities, in Fig 7, we additionally compare the Account Age (Days between first observed tweet in the dataset and the account creation date), Vax Tweets (tweets posted by the account specific to vaccine related content, quantified by observed tweets of the account in the collected dataset), Vax Tweet Engagements (number of vaccine tweets that reference i.e. mention, reply, retweet, quote the account in the community or any of its vaccine tweets, quantified by observed tweets in the collected dataset). The findings are as follows,

  • Total Tweets distribution is lowest for Anti-vaccine misinformation and Spanish-En conspiracy communities.

  • Account Age distribution is the smallest for these two communities, i.e., more recently created accounts. Account Age for Left and Mainstream are highest (older accounts), and Right and Far-right are in between.

  • In the Vax Tweets, however, the Anti-vaccine community is the most vocal with higher Vax Tweet distribution than other communities; Far-right and Right being the least.

  • Interestingly, the Anti-vaccine misinformation, Far-right conspiracy, and Spanish-En conspiracy communities have the largest inter-quartile ranges compared to the other communities on Vax Tweet Engagements. That means most accounts in these communities receive non-negligible engagements (mentions from other accounts, or replies, retweets, quotes of their vaccine tweets), in contrast with the other communities. It suggests that discussions are led and supported in a more decentralized manner in these communities.
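The inter-quartile comparison above can be illustrated with a minimal sketch. The engagement counts and the community labels `centralized` and `decentralized` are made up for illustration and are not taken from the collected dataset; the helper `iqr_summary` is likewise hypothetical.

```python
from statistics import quantiles


def iqr_summary(values):
    """Return (lower quartile, median, upper quartile, IQR) for one account statistic."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


# Made-up per-account engagement counts (assumption, not real data).
# In the first community a single account attracts nearly all engagement;
# in the second, engagement is spread across most accounts.
engagements = {
    "centralized": [0, 0, 0, 0, 1, 0, 0, 250],
    "decentralized": [2, 5, 9, 14, 20, 26, 31, 40],
}

summary = {name: iqr_summary(vals) for name, vals in engagements.items()}
for name, (q1, q2, q3, iqr) in summary.items():
    print(f"{name}: Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

A wide inter-quartile range with a non-zero lower quartile is what one would expect when many accounts, rather than a few hubs, receive engagement.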

4.3 Vaccine uptake distribution in U.S. states

We analyze the geographical and political associations of the accounts and tweets in relation to the vaccine uptake rates in each U.S. state. The public records of how many people have been vaccinated in each state, available through the CDC, are curated and maintained by the research community (mathieu2021global) (accessed June 6, 2021).

Anti-vaccine misinformation and conspiracies have significant consequences associated with reduced vaccination intentions, as determined in prior social psychology research (jolley2014effects). In Fig. 8, we compare the ratio of misinformation (unreliable/conspiracy URLs) to reliable URLs observed in collected tweets for accounts with extracted geolocation information available from the tweet or account metadata (extracted using (dredze2013carmen)).


The vaccine uptake, i.e., the percentage of vaccinated individuals as of June 6, 2021 per U.S. state, is plotted against the rate of misinformation (ratio of unreliable/conspiracy URL tweets to reliable URL tweets) in Fig 8. Each state's political affiliation is designated based on the 2020 Election votes (Red states voted for Donald Trump and Blue states for Joe Biden in the 2020 Presidential Election). The analysis is state-wise; therefore, only account tweets with valid extracted geolocations of U.S. states are utilized in the plot. The estimated correlation results are as follows:

  • The Pearson’s correlation coefficient is -0.731 between vaccine uptake and misinformation rate. This indicates a strong negative correlation between the percentage of individuals vaccinated and the rate of online misinformation across states.

  • The vaccination uptake is consistently lower in Republican (red or right-leaning) states, and some swing states e.g. Nevada, Arizona; while the rate of misinformation is consistently higher. This politically biased segregation of vaccine hesitancy or anti-vaccine sentiments is also evident from the misinformation diffusion community analysis (discussed in the prior subsections).
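As a minimal sketch of the state-wise computation described above, the following code derives a misinformation rate per state and its Pearson correlation with vaccine uptake. The state labels and all counts in `toy_states` are invented for illustration; the function names are hypothetical and this is not the authors' analysis code.

```python
from math import sqrt


def misinformation_rate(unreliable_tweets, reliable_tweets):
    """Ratio of unreliable/conspiracy URL tweets to reliable URL tweets."""
    if reliable_tweets == 0:
        raise ValueError("no reliable URL tweets; ratio is undefined")
    return unreliable_tweets / reliable_tweets


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values (assumption): state -> (unreliable URL tweets,
# reliable URL tweets, % of individuals vaccinated).
toy_states = {
    "A": (120, 400, 38.0),
    "B": (60, 500, 52.0),
    "C": (200, 450, 35.0),
    "D": (30, 600, 61.0),
}

rates = [misinformation_rate(u, r) for u, r, _ in toy_states.values()]
uptake = [pct for _, _, pct in toy_states.values()]
r = pearson(rates, uptake)
print(f"Pearson r between misinformation rate and uptake: {r:.3f}")
```

A correlation computed this way describes association across states only; it does not by itself establish that misinformation exposure causes lower uptake.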

The increased presence of echo-chambers and online diffusion community effects of misinformation and conspiracies, segregated by political diets, is a disconcerting reality of social media diffusion networks. More efforts are needed to address this vaccine hesitancy in different social demographics. Towards that end, we investigate the contents and narratives of the tweets and of misinformation URL tweets, to understand the ways in which anti-vaccine sentiments are promoted.

(a) Ratio of unreliable/conspiracy to reliable URL tweets with extracted geolocation over U.S. states.
(b) % Vaccinated individuals in each U.S. states vs. ratio of unreliable/conspiracy to reliable URL tweets extracted with geolocation over Red and Blue states in 2020 Elections.
Figure 8: Unreliable/conspiracy URLs and Reliable URLs distribution based on extracted geolocation from tweets.

5 COVID-19 Vaccine Hesitancy and Misinformation Narratives

Social media communication studies have outlined factors that can mislead the public and increase vaccine hesitancy (lewandowsky2021covid). The five techniques of science denial are provided under the acronym FLICC (fake experts, logical fallacies, impossible expectations, cherry-picking, and conspiracy theories). In this section, we look at different types of narratives and contexts through which vaccine hesitancy on social media can be increased.

5.1 Social media discussions of post vaccination side-effects

Besides misinformation and conspiracy narratives that can reduce vaccination intentions (jolley2014effects), the biased discussion of side-effects or adverse effects post vaccination can also promote hesitancy. This broadly falls under cherry-picking, one of the five science denial techniques. Here, we study whether the discussion of vaccine side-effects on social media differs from the CDC VAERS recorded side-effects obtained from healthcare providers and public reports, and whether the differences might have the potential to increase COVID-19 vaccine hesitancy.

Using the CDC VAERS records accessed June 10, 2021, we hope to answer the following questions:

  1. Are the side-effects widely discussed in vaccine related tweets also common in VAERS reports?

  2. Is the distribution of side-effects discussed in misinformation URLs tweets distinct from all tweets?

To answer the above questions, we explore the correlation between the frequency of the discussed side-effects on social media and their frequency in the VAERS records. To measure the frequency on Twitter, we first extract the medical concepts from the tweets via text matching based on a large medical concept corpus: AskAPatient (limsopatham2016normalising). This corpus provides us with common medical concepts on social media in different forms, such as abbreviations, complete names, and even misspelled versions. We use the number of tweets in which a medical concept appears to represent its frequency on Twitter. To measure the frequency in VAERS records, we conduct medical concept extraction in the same way and count the number of individuals whose medical records mention a concept.
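The text-matching step can be sketched roughly as follows. The mini-lexicon `LEXICON` is a hypothetical stand-in for the AskAPatient corpus (its entries are invented here), and the matching rule, whole-phrase and case-insensitive, is an assumption rather than the exact procedure used in the paper.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (assumption): surface form -> normalized concept.
# The real corpus maps many abbreviations and misspellings to each concept.
LEXICON = {
    "headache": "headache",
    "head ache": "headache",
    "fever": "fever",
    "feverish": "fever",
    "allergic reaction": "allergic reaction",
    "alergic reaction": "allergic reaction",
}

# Longer surface forms are listed first so multi-word forms are preferred.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(form) for form in sorted(LEXICON, key=len, reverse=True))
    + r")\b"
)


def concepts_in(text):
    """Set of normalized concepts mentioned at least once in one document."""
    return {LEXICON[match] for match in PATTERN.findall(text.lower())}


def concept_frequency(documents):
    """Number of documents (tweets or VAERS records) mentioning each concept."""
    counts = Counter()
    for doc in documents:
        counts.update(concepts_in(doc))  # a set, so each doc counts once
    return counts


toy_tweets = [
    "Head ache and fever after the shot, fever all night",
    "Severe alergic reaction reported",
    "No side effects at all",
]
freq = concept_frequency(toy_tweets)
print(freq)
```

Applying the same `concept_frequency` function to both sources keeps the two frequency measures comparable, which matters because the subsequent analysis correlates them concept by concept.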


In Fig. 9, we plot the medical concept frequencies in all collected tweets (Fig. 9(a)) and in unreliable/conspiracy URLs tweets (Fig. 9(b)) against the corresponding frequencies in VAERS records. From the visualization, we can see that the concepts that are widely discussed on social media are the ones that are rarer in the medical records. More frequent effects like pain, fever, and headache are relatively rarely discussed on social media. This trend is amplified in misinformation tweets, where the rarer reported concepts such as “paralysis”, “allergic reaction”, and “malignant neoplastic disease” (cancer) are more frequently referenced.

The negative correlation between discussed concepts and VAERS reports increases the potential for vaccine hesitancy in social media users, since rare symptoms get more attention, due to their novelty in all tweets or due to cherry-picking in unreliable/conspiracy tweets. The Pearson correlation coefficient is more negative for unreliable/conspiracy URL tweets, at -0.36, compared to -0.25 on all tweets.

(a) All tweets. (Pearson’s coefficient of -0.250)
(b) Unreliable/conspiracy URLs tweets (Pearson -0.358)
Figure 9: Frequency correlation of side-effects discussed on Twitter compared with that recorded in VAERS.

5.2 Topic modeling of misinformation tweets

We use topic modeling on unreliable/conspiracy URLs tweets to identify narratives used to promote COVID-19 vaccine misinformation. The text is pre-processed by tokenization, punctuation removal, stop-word removal, and removal of URLs, hashtags, mentions, and special characters, and represented using pre-trained fastText word embeddings (bojanowski-etal-2017-enriching); the pre-trained embeddings can be downloaded from fasttext. The average of the word embeddings in the tweet text is used to represent each tweet. Embeddings pre-trained on large English corpora encode semantic information that is useful for short texts, where word co-occurrence statistics are too limited for traditional probabilistic topic models (li2016topic). The text representations are clustered (k-means) to identify topics. The number of clusters is selected from the range 3-35 using the silhouette and Davies-Bouldin measures of cluster separability. Inspecting the word distribution and top representative tweets in the topic clusters, we discarded non-English clusters and merged over-partitioned ones.
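The pipeline described above (average the word embeddings per tweet, cluster with k-means, pick the number of clusters by a separability score) can be sketched in plain Python as follows. This is an illustrative toy under stated assumptions: the 2-dimensional word vectors and data points are invented rather than real fastText embeddings, only the silhouette score is implemented (the Davies-Bouldin measure is omitted for brevity), and the function names are hypothetical.

```python
import random
from math import dist


def average_embedding(tokens, vectors):
    """Mean of the word vectors of the tokens present in `vectors` (None if none)."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return None
    return [sum(col) / len(found) for col in zip(*found)]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; initial centroids are sampled from the points."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means better separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the candidate k with the highest silhouette, plus all scores."""
    scored = {k: silhouette(points, kmeans(points, k)[0]) for k in candidates}
    return max(scored, key=scored.get), scored


# Toy 2-d "word vectors" and tweet representations (assumptions, not fastText).
toy_vectors = {"vaccine": [1.0, 0.0], "dna": [0.0, 1.0]}
tweet_vec = average_embedding(["vaccine", "dna", "unknown"], toy_vectors)
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
best_k, scores = choose_k(toy_points, [2, 3])
print(tweet_vec, best_k, scores)
```

In practice one would scan the full candidate range and cross-check the silhouette choice against a second criterion before inspecting the clusters manually, as the text describes.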

Columns: cluster index and topic; top (tf-idf) words; representative tweets (nearest distance to centroid).

Cluster 1: Scientific facts
  Top (tf-idf) words: mrna, pfizer, cells, moderna, human, via, system, new, operating, evidence, experimental, pathogenic, trial, priming, dna, virus, alarming, adults, analysis, study, hiv, correlation, older, fetal, flu
  Representative tweets:
    “@naomirwolf @naomirwolf Zuckerberg refuses Covid vaccine due to real possibility of permenent DNA alteration. Zuckerbefg:I Share Some Caution on this [Vaccine] Because We Jus Don’t Know the Long-Term Side Effects of Basically Modifying People’s DNA and RNA"
    “#Nuremberg #CrimesAgainstHumanity #Plandemic #CovidHOAX #GreatReset #Event201 #Agenda2030 #NoMasks #NoForcedVaccines #Pharmageddon Mainstream science admits COVID-19 vaccines contain mRNA “nanoparticles” that trigger severe allergic reactions"

Cluster 2: Side Effects
  Top (tf-idf) words: allergic, reaction, pfizer, fda, severe, adverse, serious, worker, health, healthcare, hospitalized, effects, moderna, news, workers, side, threatening, intubated, rate, doctor, people, boston, alaska, facial, higher, shot, suffering, paralysis, uk
  Representative tweets:
    “Cardiothoracic surgeon warns FDA, Pfizer on immunological danger of COVID vaccines in recently convalescent and asymptomatic carriers | Opinion | LifeSite"
    “SAFE AND EFFECTIVE! IS IT? TIME TO FACE THE TRUTH! FDA Investigates Allergic Reactions to Pfizer COVID Vaccine After More Healthcare Workers Hospitalized • Children’s Health Defense"

Cluster 3: Effectiveness
  Top (tf-idf) words: news, pfizer, dr, says, world, via, fauci, bill, gates, kennedy, new, jr, robert, moderna, take, people, get, doctors, uk, coronavirus, rollout, desantis, video, taking, trial, pharma, african, us, medical, lifesite, ron, tests, gov, warns, big
  Representative tweets:
    “Can this this be? Then Its Not a Vaccine: Crazy Dr. Fauci Said in October Early COVID Vaccines Will Only Prevent Symptoms and NOT Block the Infection What? #FREESPEECH #WALKAWAY #DEPLORABLE #DrainTheSwamp #FakeNews #Trump2020 #Israel #StopTheSteal"
    “This also happened with many vaccines, especially the polio. Hundreds of Israelis get infected with Covid-19 after receiving Pfizer/BioNTech vaccine – reports — RT World News"

Cluster 4: Deaths
  Top (tf-idf) words: pfizer, dies, receiving, home, nurse, nursing, days, getting, health, doctor, portuguese, worker, die, shot, taking, residents, first, old, two, miami, died, weeks, healthy, elderly
  Representative tweets:
    “46 Nursing Home Residents in Spain Die Within 1 Month of Getting Pfizer COVID Vaccine! @ScottMorrisonMP @GregHuntMP Are you still proceeding with the #Pfizer #vaccine rollout which could kill older people? #BREAKING #BreakingNews #auspol #COVID19"

Cluster 5: Vaccine Refusal
  Top (tf-idf) words: workers, pfizer, health, refuse, emergency, coronavirus, people, care, refusing, get, getting, use, hundreds, take, fda, staff, us, uk, room, says, passports, receiving, hospital, news, line, healthcare, one, report, doses
  Representative tweets:
    “As many as 60% of healthcare workers are refusing to get the #Covid vaccine. There’s an overwhelming lack of trust. Why? #COVID19 #vaccine #CovidVaccine"
    “Start a DEMONSTRATION AND BOYCOTT CAMPAIGN against the VACCINE TYRANTS! COVID vaccines disaster of Adverse reports to CDC….look it up! NYC Waitress Fired For Refusing COVID Vaccine Over Fertility Concerns"

Cluster 6: Rollout
  Top (tf-idf) words: trump, biden, joe, uns, fetus, praises, gavi, playing, owns, million, funding, male, funded, gave, gates, stopped, billion, us, plan, admin, distribution, rollout, administration, google, jill, president, speed, doses, warp
  Representative tweets:
    “Joe Biden Struggles to Read Teleprompter as He Trashes Trump Administration’s Covid-19 Vaccine Distribution Efforts @gatewaypundit"
    “We all knew all along that Trump would botch the initial vaccine rollout, and that vaccinations wouldn’t properly get underway until Biden takes office. Just one more thing for Trump to screw up on his way down."

Cluster 7: Dehumanization
  Top (tf-idf) words: takedowntheccp, yanlimeng, wipe, drlimengyan, weapon, bio, white, ccpvirus, war, made, plan, part, ccp, world
  Representative tweets:
    “@Newsweek World War V5.0: Covid virus is a bio weapon made by CCP. Vaccine is a part of plan to wipe out all white people. @DrLiMengYAN1 #DrLiMengYan1 #YanLiMeng #CCPVirus #Covid19 #TakeDownTheCCP #COVID19"
Table 2: Misinformation topic clusters along with representative tweets and word distribution with highest tf-idf scores.


The misinformation largely targeted seven forms of information manipulation about the COVID-19 vaccines, namely, manipulation of Scientific facts about the COVID-19 vaccines, misleading information about Side Effects, Deaths, Vaccine Effectiveness, and Vaccine Refusal, along with misinformation related to Vaccine Rollout, and Dehumanization/Depopulation/Great Reset/Bill Gates/Pharmageddon conspiracies. Table 2 provides examples of top representative tweets and word distributions for each identified topic. Examination of the tweets suggests the presence of the five techniques of manipulation, FLICC (lewandowsky2021covid), mentioned earlier. For vaccine safety and effectiveness, there were outright false claims about scientific facts and side-effects, but true reported side-effects were also discussed with strongly negative anti-vaccine sentiment, or with missing or misleading context. There were also cases of setting up unreliable and misleading expectations about vaccine effectiveness, by suggesting that since vaccines cannot prevent infection, they are ineffective or not useful, as seen in the Table under the topic cluster on Effectiveness.

5.3 Frequent news sources categorization

Fig. 10 presents a categorization of the top news domains in unreliable/conspiracy URLs tweets. The categorization is based on the Media Bias/Fact Check ratings of (i) degree of factual reporting, (ii) political bias, and (iii) scientific reporting. The top domains contain sources promoting both extreme pseudoscience and conspiracy, and Left/Right political propaganda. The factual reporting level of these sources is rated Low, Very Low, or Mixed on Media Bias/Fact Check.

Figure 10: Misinformation publishing pseudoscience and propaganda news sources with volume of tweets shared.

6 Related Work

Vaccine hesitancy and misinformation on social media and e-commerce platforms have gained much attention in the past few years (cossard2020falling, juneja2021auditing). cossard2020falling studied the Italian vaccine debate on Twitter in 2016, finding echo chambers of anti-vaccine and pro-vaccine groups, with asymmetrical interaction between the communities, as vaccine advocates ignore the skeptics. miyazaki2021strategy recently characterized the reply behaviour of anti-vaccine accounts in the COVID-19 vaccine discussion on Twitter, finding that anti-vaccine accounts reply most to neutral accounts, using toxic and emotional content. Unlike our work, these studies do not consider the misinformation or conspiracies promoted by anti-vaccine communities.

Misinformation and coordinated campaigns have been highly prevalent throughout the COVID-19 pandemic (sharma2020covid, memon2020characterizing, sharma2020identifying, jamison2020not). This has led to increasing concerns around COVID-19 vaccine misinformation. lewandowsky2021covid developed a communication handbook to protect against vaccine misinformation, and the CDC and other fact-checking resources have enumerated vaccine myths and facts on their websites. Several studies in social science have conducted surveys to identify how vaccine misinformation decreases pro-vaccine intents (enders2020different, jolley2014effects, singh2020first). pierri2021impact found that misinformation URLs shared on Twitter are correlated with vaccine hesitancy rates taken from survey data and with vaccinations in the U.S., and that the effect of misinformation on hesitancy is stronger in U.S. Democratic counties, although hesitancy is higher in Republican counties.

A dataset of English tweets related to COVID-19 vaccines (deverna2021covaxxy) and a longitudinal dataset of accounts promoting anti-vaccine hashtags sampled from Twitter discussions (muric2021covid) have been curated recently by the research community. Lastly, another line of work investigates the effects of platform actions on vaccine misinformation. sharevski2021misinformation examined Twitter’s soft moderation efforts against misinformation, i.e., warning labels and covers on tweets with unreliable information, and found through a randomized survey participant study that warning covers, but not labels, reduce the perceived accuracy of the content. kim2020effects looked at YouTube’s information interventions on likely misinformation videos, and observed reduced traffic or viewership on affected videos.

7 Discussion and Conclusion

This work examined coordinated campaigns, and other anti-vaccine misinformation and conspiracy communities on Twitter in the context of COVID-19 vaccine discussions. Coordinated efforts might be present to promote a “Great Reset” conspiracy narrative, and the involved accounts have strong anti-vaccine and anti-social messaging (such as no lockdowns, no masks). Furthermore, we investigated other misinformation/conspiracy communities from the diffusion network structure. The presence of anti-vaccine and far-right misinformation/conspiracy communities and their influence on right-leaning accounts can further distance right-leaning accounts (that lie in the retweet clusters of prominent Republican accounts such as Mike Pence) from mainstream and informational health news and science. We find that current vaccine uptake in different states is negatively correlated with the rate of misinformation, and is lower in right-leaning states and some swing states.

Beyond misinformation and conspiracies, vaccine hesitancy can also be impacted by other narrative distortions. For instance, CDC VAERS reports and Twitter discussions of side-effects are not always aligned (rarer side-effects are discussed more frequently in misinformation tweets, as well as in all tweets). The novelty of rarer effects, and malicious motives to promote vaccine hesitancy in misinformation tweets, can explain the observations.


Fig. 11 contains the account statistics distribution (box plots) for Followings, Followers, Total Tweets. It shows the distribution of account features in each prominent misinformation/conspiracy and information community.

Figure 11: Account statistics for communities (a) Followings, (b) Followers (c) Total tweets in account lifetime.