A Biomedically Oriented Automatically Annotated Twitter COVID-19 Dataset

The use of social media data, such as Twitter, for biomedical research has been gradually increasing over the years. During the COVID-19 pandemic, researchers turned to nontraditional sources of clinical data to characterize the disease in near real time, to study the societal implications of interventions, and to study the sequelae experienced by patients who recovered from COVID-19 (Long COVID). However, manually curated social media datasets are scarce, because manual annotation is expensive and identifying the relevant texts takes considerable effort. The datasets that do exist are usually very small, and their annotations do not generalize well over time or to larger document collections. As part of the 2021 Biomedical Linked Annotation Hackathon, we release a dataset of over 120 million automatically annotated tweets for biomedical research. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our approach by comparing several spaCy-based annotation frameworks against a manually annotated gold-standard dataset. We then used the best-performing method to automatically annotate 120 million tweets, which we release publicly for downstream use in the biomedical domain.
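The abstract does not state which metrics were used to compare the spaCy-based frameworks against the gold standard. The sketch below assumes a common choice: strict span-level precision, recall, and F1, where a predicted annotation counts as correct only if its tweet, character offsets, and label all match a gold annotation. The span tuple format and the names `prf1`, `pick_best`, `framework_a`, and `framework_b` are hypothetical, and the spaCy pipelines themselves are not shown; only their outputs are compared.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotation format (an assumption, not the dataset's schema):
# (tweet_id, start_char, end_char, label)
Span = Tuple[str, int, int, str]


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Strict-match precision, recall and F1: a predicted span is a true
    positive only if tweet id, offsets and label all equal a gold span."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def pick_best(gold: Set[Span], systems: Dict[str, Set[Span]]) -> str:
    """Return the name of the system whose output has the highest F1."""
    return max(systems, key=lambda name: prf1(gold, systems[name])[2])


if __name__ == "__main__":
    # Toy gold standard and outputs from two hypothetical annotation frameworks.
    gold = {
        ("t1", 10, 15, "SYMPTOM"),
        ("t1", 20, 25, "SYMPTOM"),
        ("t2", 0, 8, "DRUG"),
    }
    systems = {
        "framework_a": {("t1", 10, 15, "SYMPTOM"), ("t2", 0, 8, "DRUG")},
        "framework_b": {("t1", 10, 15, "SYMPTOM"), ("t1", 30, 35, "SYMPTOM")},
    }
    for name, predicted in systems.items():
        p, r, f = prf1(gold, predicted)
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
    print("best:", pick_best(gold, systems))
```

On this toy data, `framework_a` reaches F1 = 0.80 and `framework_b` reaches F1 = 0.40, so `framework_a` would be selected to annotate the full corpus. A lenient (overlap-based) match or per-label scores are equally plausible alternatives; the source does not say which was used.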
