Active learning in annotating micro-blogs dealing with e-reputation

06/16/2017
by Jean-Valère Cossu, et al.

Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of available data, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper aims to develop a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms that automatically annotate tweets, using a manual annotation step to bootstrap the process. This paper focuses on key issues in active learning when building a large annotated dataset from noisy data, where the noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as anchor points, not only to improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a subsequent thorough analysis shows that reducing noise might cause the loss of crucial information.
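The abstract does not spell out how the active learning loop works. As a rough illustration only, here is a minimal pool-based sketch with least-confidence uncertainty sampling. It assumes a multinomial Naive Bayes classifier in place of the algorithms the paper reviews, and a dictionary lookup standing in for the human annotator. All names (`tokenize`, `NaiveBayes`, `least_confident`, `active_learning`) and the toy French tweets are hypothetical. Hashtags and @mentions are kept as ordinary tokens, loosely echoing the point that such Twitter features carry useful signal.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokeniser: lower-case whitespace split; hashtags and @mentions stay whole.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        v = len(self.vocab)
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / self.n)
            for tok in tokenize(text):
                lp += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            logp[c] = lp
        m = max(logp.values())  # subtract the max before exponentiating for stability
        z = sum(math.exp(lp - m) for lp in logp.values())
        return {c: math.exp(lp - m) / z for c, lp in logp.items()}


def least_confident(model, pool, k):
    # Uncertainty sampling: rank tweets by their highest class posterior, lowest first.
    return sorted(pool, key=lambda t: max(model.predict_proba(t).values()))[:k]


def active_learning(seed, pool, oracle, rounds, k):
    """Bootstrap from a manually labelled seed, then ask the oracle for k labels per round."""
    labelled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = NaiveBayes().fit([t for t, _ in labelled], [y for _, y in labelled])
        for tweet in least_confident(model, pool, k):
            labelled.append((tweet, oracle(tweet)))  # simulated human annotation
            pool.remove(tweet)
    model = NaiveBayes().fit([t for t, _ in labelled], [y for _, y in labelled])
    return model, labelled, pool


# Usage with toy data; the dict lookup plays the role of the human annotator.
seed = [("bravo @candidat excellent discours", "pos"),
        ("honte @candidat mauvais discours", "neg")]
pool = ["excellent bravo #debat", "mauvais honte #debat", "discours #debat"]
gold = {"excellent bravo #debat": "pos", "mauvais honte #debat": "neg",
        "discours #debat": "neg"}
model, labelled, remaining = active_learning(seed, pool, gold.get, rounds=2, k=1)
print([t for t, _ in labelled[2:]])  # ['discours #debat', 'excellent bravo #debat']
```

Least-confidence sampling sends the annotator the tweets the current model is least sure about, so each manual label is meant to be as informative as possible. Other query strategies (margin, entropy, committee-based) would fit into the same loop by replacing `least_confident`.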
