Mega-COV: A Billion-Scale Dataset of 65 Languages For COVID-19

05/02/2020
by Muhammad Abdul-Mageed, et al.

We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covering 234 countries), longitudinal (going as far back as 2007), multilingual (comprising 65 languages), and includes a significant number of location-tagged tweets (32M tweets). We release the tweet IDs from the dataset in the hope that they will be useful for studying various phenomena related to the ongoing pandemic and for accelerating viable solutions to associated problems.

