NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification

by Iyanuoluwa Shode, et al.

Africa has over 2000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets. In recent years, there has been progress in developing labeled corpora for African languages. However, these corpora are often available in a single domain and may not generalize to other domains. In this paper, we focus on the task of sentiment classification for cross-domain adaptation. We create a new dataset, NollySenti, based on Nollywood movie reviews for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba). We provide an extensive empirical evaluation using classical machine learning methods and pre-trained language models. Leveraging transfer learning, we compare the performance of cross-domain adaptation from the Twitter domain and cross-lingual adaptation from English. Our evaluation shows that transfer from English in the same target domain leads to more than 5% improvement in accuracy compared to transfer from Twitter in the same language. To further mitigate the domain difference, we leverage machine translation (MT) from English to other Nigerian languages, which leads to a further improvement of 7% over cross-lingual evaluation. While machine translation to low-resource languages is often of low quality, through human evaluation, we show that most of the translated sentences preserve the sentiment of the original English reviews.




