Semi-supervised Discovery of Informative Tweets During the Emerging Disasters

10/12/2016
by Shanshan Zhang, et al.

The first step toward using microblogging services such as Twitter effectively for situational awareness during emerging disasters is the discovery of disaster-related postings. Given the wide range of possible disasters, relying on a pre-selected set of disaster-related keywords for discovery is suboptimal. The alternative we focus on in this work is to train a classifier using the small set of labeled postings that becomes available as a disaster emerges. Our hypothesis is that utilizing large quantities of historical microblogs could improve classification quality compared to training a classifier on the labeled data alone. We propose to use unlabeled microblogs to group words into a limited number of clusters and to use these word clusters as features for classification. To evaluate the proposed semi-supervised approach, we used Twitter data from 6 different disasters. Our results indicate that when the number of labeled tweets is 100 or fewer, the proposed approach is superior to standard classification based on the bag-of-words feature representation. Our results also reveal that the choice of unlabeled corpus, word clustering algorithm, and hyperparameters can have a significant impact on classification accuracy.
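The idea of using word clusters as classification features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-to-cluster map below is hand-written and hypothetical (in the described approach it would be learned from a large unlabeled tweet corpus by a word clustering algorithm), and the nearest-centroid classifier is a stand-in chosen only to keep the example dependency-free. The names cluster_features, train, and predict are likewise illustrative.

```python
from collections import Counter

# Hypothetical word-to-cluster map. In practice this would be learned from
# unlabeled historical tweets; here it is hand-written for illustration.
WORD_TO_CLUSTER = {
    "flood": 0, "flooding": 0, "earthquake": 0, "quake": 0,
    "help": 1, "rescue": 1, "evacuate": 1,
    "lol": 2, "coffee": 2, "movie": 2,
}
NUM_CLUSTERS = 3


def cluster_features(tweet, word_to_cluster=WORD_TO_CLUSTER,
                     num_clusters=NUM_CLUSTERS):
    """Map a tweet to a fixed-length vector of per-cluster word counts.

    Tokenization is a plain lowercase whitespace split; words that are
    not in the cluster map are ignored.
    """
    counts = Counter(
        word_to_cluster[tok]
        for tok in tweet.lower().split()
        if tok in word_to_cluster
    )
    return [counts.get(c, 0) for c in range(num_clusters)]


def train(labeled_tweets):
    """Compute one mean feature vector (centroid) per label."""
    by_label = {}
    for tweet, label in labeled_tweets:
        by_label.setdefault(label, []).append(cluster_features(tweet))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(model, tweet):
    """Return the label whose centroid is closest in squared distance."""
    x = cluster_features(tweet)
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])),
    )


# Two labeled tweets: 1 = disaster-related, 0 = unrelated.
model = train([("flood rescue needed", 1), ("coffee and a movie lol", 0)])
print(predict(model, "earthquake evacuate now"))
```

In this toy setup, "earthquake" and "evacuate" never occur in the labeled tweets, yet the test tweet is still assigned to the disaster-related class because those words share clusters with "flood" and "rescue". A bag-of-words representation built from the same two labeled tweets would have no overlapping features for that tweet.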

