Integrating Crowdsourcing and Active Learning for Classification of Work-Life Events from Tweets

03/26/2020
by Yunpeng Zhao, et al.

Social media, especially Twitter, is increasingly used for research with predictive analytics. In social media studies, natural language processing (NLP) techniques are used in conjunction with expert-based, manual, and qualitative analyses. However, social media data are unstructured and must undergo complex manipulation before they can be used for research. Manual annotation is the most resource- and time-consuming step, because multiple expert raters have to reach consensus on every item, but it is essential for creating gold-standard datasets to train NLP-based machine learning classifiers. To reduce the burden of manual annotation while maintaining its reliability, we devised a crowdsourcing pipeline combined with active learning strategies. We demonstrated its effectiveness through a case study that identifies job loss events from individual tweets. We used the Amazon Mechanical Turk platform to recruit annotators from the Internet and designed a number of quality control measures to assure annotation accuracy. We evaluated four active learning strategies: least confident, entropy, vote entropy, and Kullback-Leibler divergence. These strategies aim to reduce the number of annotated tweets needed for the automated classifier to reach a desired level of performance. Results show that crowdsourcing is useful for creating high-quality annotations and that active learning helps reduce the number of required tweets, although there was no substantial difference among the strategies tested.
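
The four query strategies named in the abstract can be illustrated with a minimal Python sketch. It uses the common textbook definitions of each score; the abstract does not give the authors' exact formulations, so the function names, the committee setup, and the batch selection helper are all assumptions made for illustration. In every case a higher score marks a tweet as more informative to send for annotation.

```python
import math
from collections import Counter


def least_confident(probs):
    """1 minus the probability of the most likely label (single model)."""
    return 1.0 - max(probs)


def entropy(probs):
    """Shannon entropy of one model's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Entropy of the hard label votes cast by a committee of models."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def mean_kl_divergence(committee_probs):
    """Average KL divergence of each member's distribution from the
    committee's mean (consensus) distribution."""
    n = len(committee_probs)
    k = len(committee_probs[0])
    consensus = [sum(m[i] for m in committee_probs) / n for i in range(k)]
    total = 0.0
    for member in committee_probs:
        total += sum(p * math.log(p / q)
                     for p, q in zip(member, consensus) if p > 0)
    return total / n


def select_batch(scores, batch_size):
    """Hypothetical helper: indices of the highest scoring unlabeled items."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:batch_size]


# Example: binary predictions (job loss vs. not) for three unlabeled tweets.
tweet_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
scores = [entropy(p) for p in tweet_probs]
print(select_batch(scores, 1))  # index of the most uncertain tweet
```

The first two scores need only one classifier, while vote entropy and Kullback-Leibler divergence assume a committee of classifiers whose disagreement is measured. For a binary task such as job loss detection, least confident and entropy rank items identically, which fits the reported finding that the strategies performed similarly.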


