HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

04/07/2021
by Firoj Alam, et al.

Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues, automatic classification systems have been developed using supervised modeling approaches, thanks to earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., size, duplicates) and are less suitable for supporting more advanced and data-hungry deep learning models. In this paper, we present a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. Moreover, we propose a data collection and sampling pipeline, which is important for social media data sampling for human annotation. We report multiclass classification results using classic and deep learning (fastText and transformer) based models to set the ground for future studies. The dataset and associated resources are publicly available at https://crisisnlp.qcri.org/humaid_dataset.html


research
03/31/2020

A large-scale Twitter dataset for drug safety applications mined from publicly existing resources

With the increase in popularity of deep learning models for natural lang...
research
04/14/2020

Standardizing and Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing

Time-critical analysis of social media streams is important for humanita...
research
02/23/2022

MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

Misinformation is becoming increasingly prevalent on social media and in...
research
04/29/2020

A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

The use of offensive language is a major problem in social media which h...
research
07/11/2022

TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak Supervision

Social media is often utilized as a lifeline for communication during na...
research
04/09/2017

Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

The extensive use of social media platforms, especially during disasters...
research
11/19/2020

Sentiment Classification in Bangla Textual Content: A Comparative Study

Sentiment analysis has been widely used to understand our views on socia...
