Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation

10/07/2020
by Valentin Barriere, et al.

Tweets are a specific kind of text data compared to general text. Although sentiment analysis over tweets has become very popular in the last decade for English, it is still difficult to find large annotated corpora for non-English languages. The recent rise of transformer models in Natural Language Processing has made it possible to achieve unparalleled performance on many tasks, but these models need a considerable quantity of text to adapt to the tweet domain. We propose using a multilingual transformer model that we pre-train on English tweets, and we apply data augmentation using automatic translation to adapt the model to non-English languages. Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of transformers on small corpora of tweets in a non-English language.
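The abstract does not spell out the augmentation pipeline. A minimal sketch, assuming that each labelled English tweet is machine-translated into every target language and keeps its original sentiment label, could look like the following. The names `augment_with_translation`, `toy_translate` and the lookup table are hypothetical; `toy_translate` is only a stand-in for a real machine-translation system, and the tiny corpora are illustrative, not data from the paper.

```python
from typing import Callable, Dict, List, Tuple

# A labelled example: (tweet text, sentiment label).
Example = Tuple[str, str]


def augment_with_translation(
    english_data: List[Example],
    target_langs: List[str],
    translate: Callable[[str, str], str],
) -> Dict[str, List[Example]]:
    """Translate each English tweet into every target language, keeping its label.

    Assumption: sentiment is preserved by translation, so the English label
    is reused unchanged for the translated text.
    """
    augmented: Dict[str, List[Example]] = {lang: [] for lang in target_langs}
    for text, label in english_data:
        for lang in target_langs:
            augmented[lang].append((translate(text, lang), label))
    return augmented


# Hypothetical stand-in for a machine-translation system (toy lookup table).
_TOY_MT = {
    ("I love this!", "fr"): "J'adore ça !",
    ("I love this!", "es"): "¡Me encanta esto!",
    ("Worst day ever.", "fr"): "Pire journée de ma vie.",
    ("Worst day ever.", "es"): "El peor día de mi vida.",
}


def toy_translate(text: str, lang: str) -> str:
    # Fall back to the source text when the toy table has no entry.
    return _TOY_MT.get((text, lang), text)


# Usage: translate a small English set, then merge it with a (small)
# native-language corpus before fine-tuning a multilingual model on it.
english = [("I love this!", "positive"), ("Worst day ever.", "negative")]
aug = augment_with_translation(english, ["fr", "es"], toy_translate)
native_fr = [("Quel plaisir de vous voir", "positive")]
train_fr = native_fr + aug["fr"]
print(len(train_fr))  # 3
```

Keeping the translator as a plain function argument makes it easy to swap the toy table for an actual translation service without touching the augmentation logic.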



research
10/15/2020

NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment Analysis of Code-Mixed Dravidian text using XLNet

Social media has penetrated into multilingual societies, however most of...
research
02/19/2021

Multilingual Augmenter: The Model Chooses

Natural Language Processing (NLP) relies heavily on training data. Trans...
research
09/19/2021

Unified and Multilingual Author Profiling for Detecting Haters

This paper presents a unified user profiling framework to identify hate ...
research
05/31/2023

FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms

Transformers have been shown to work well for the task of English euphem...
research
09/25/2019

Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis

In this article we describe our participation in TASS 2019, a shared tas...
research
04/07/2022

BERTuit: Understanding Spanish language in Twitter through a native transformer

The appearance of complex attention-based language models such as BERT, ...
research
03/26/2019

A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus

This paper describes a new system for semi-automatically building, exten...
