Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

03/07/2017
by Jan Deriu, et al.

This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task because the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly supervised data in various languages to train a multi-layer convolutional network, and we demonstrate the importance of pre-training such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse, but still acceptable, performance compared to the single-language model, while benefiting from better generalization properties across languages.
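As a rough illustration of what "weakly supervised" data can look like for short texts, the sketch below assigns noisy sentiment labels from emoticons. This is an assumption made for illustration only: the abstract does not state how the weak labels are obtained, and the emoticon lists and the `weak_label` helper are hypothetical.

```python
# Hypothetical emoticon lists; the abstract does not specify the weak-label source.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", ":'(")


def weak_label(text):
    """Return (cleaned_text, label), or None when the signal is missing or mixed."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    # Skip texts with no emoticon, or with both polarities at once.
    if has_pos == has_neg:
        return None
    # Remove the emoticons so a classifier cannot simply memorize them.
    cleaned = text
    for e in POSITIVE + NEGATIVE:
        cleaned = cleaned.replace(e, "")
    cleaned = " ".join(cleaned.split())
    return cleaned, ("positive" if has_pos else "negative")


if __name__ == "__main__":
    samples = ["Great movie :)", "Das war schlecht :-(", "no signal here"]
    for s in samples:
        print(s, "->", weak_label(s))
```

Because the rule is language-independent, the same function could be applied to texts in any language, which is one reason weak labels of this kind are attractive when annotated data is scarce.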


