TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak Supervision

07/11/2022
by Ramya Tekumalla, et al.

Social media is often utilized as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster, and the filtered tweets are sent for human annotation. Human annotation to create labeled sets for machine learning models is laborious, time-consuming, at times inaccurate, and, more importantly, not scalable in terms of size and real-time use. In this work, we curate a silver standard dataset using weak supervision. To validate its utility, we train machine learning models on the weakly supervised data to identify three types of natural disasters: earthquakes, hurricanes, and floods. Our results demonstrate that models trained on the silver standard dataset achieve performance greater than 90% when evaluated on a manually curated, gold-standard dataset. To enable reproducible research and additional downstream utility, we release the silver standard dataset for the scientific community.
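As a rough illustration of the idea, weak supervision can be as simple as assigning labels with keyword heuristics instead of human annotators. The sketch below is not the authors' pipeline: the keyword lists, the `weak_label` and `build_silver_set` helpers, and the rule of discarding ambiguous tweets are all assumptions made for the example.

```python
import re

# Hypothetical keyword lists; the abstract does not state the actual heuristics.
KEYWORDS = {
    "earthquake": ("earthquake", "aftershock", "magnitude"),
    "hurricane": ("hurricane", "storm surge", "landfall"),
    "flood": ("flood", "flooding", "flash flood"),
}

# One case-insensitive, whole-word pattern per disaster type.
PATTERNS = {
    label: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for label, terms in KEYWORDS.items()
}


def weak_label(text):
    """Return the single matching disaster label, or None.

    Tweets matching no pattern, or more than one, are left unlabeled
    (an assumed policy to keep the silver labels less noisy).
    """
    hits = [label for label, pattern in PATTERNS.items() if pattern.search(text)]
    return hits[0] if len(hits) == 1 else None


def build_silver_set(tweets):
    """Keep only tweets that receive an unambiguous weak label."""
    labeled = []
    for tweet in tweets:
        label = weak_label(tweet)
        if label is not None:
            labeled.append((tweet, label))
    return labeled
```

A classifier trained on the output of `build_silver_set` would then be scored against a separately hand-labeled gold set, which mirrors the validation setup described above.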


