Benchmarking Multi-Task Learning for Sentiment Analysis and Offensive Language Identification in Under-Resourced Dravidian Languages

by Adeep Hande, et al.

Obtaining extensive annotated data for under-resourced languages is challenging, so in this research we investigate whether it is beneficial to train models using multi-task learning. Sentiment analysis and offensive language identification share similar discourse properties, and the selection of these tasks is motivated by the lack of large labelled datasets for user-generated code-mixed text. This paper works with code-mixed YouTube comments in Tamil, Malayalam, and Kannada. Our framework is applicable to other sequence classification problems irrespective of dataset size. Experiments show that our multi-task learning model achieves results competitive with single-task learning while reducing the time and space required to train models on the individual tasks. Analysis of the fine-tuned models indicates a preference for multi-task learning over single-task learning, yielding a higher weighted F1-score on all three languages. We apply two multi-task learning approaches to three Dravidian languages: Kannada, Malayalam, and Tamil. The best scores on Kannada and Malayalam were achieved by mBERT trained with cross-entropy loss under hard parameter sharing, while the best scores on Tamil were achieved by DistilBERT trained with cross-entropy loss under soft parameter sharing. For the tasks of sentiment analysis and offensive language identification, the best-performing models scored weighted F1-scores of (66.8\% and 90.5\%), (59\% and 70\%), and (62.1\% and 75.3\%) for Kannada, Malayalam, and Tamil, respectively. The data and approaches discussed in this paper are published on GitHub\footnote{\href{}{Dravidian-MTL-Benchmarking}}.
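The hard-parameter-sharing setup described above can be illustrated with a minimal, framework-free sketch: a single shared encoder feeds two task-specific classification heads (one for sentiment analysis, one for offensive language identification), and the joint training objective is the sum of the per-task cross-entropy losses. All names, dimensions, and weights below are hypothetical toy values; the paper's actual models use mBERT/DistilBERT transformer encoders rather than the tiny linear encoder shown here.

```python
import math
import random

random.seed(0)

# Hypothetical toy dimensions: 4 input features, a 3-dim shared encoding,
# 3 sentiment classes, and 2 offensive-language classes.
DIM_IN, DIM_SHARED = 4, 3
N_SENT, N_OFF = 3, 2

def linear(x, w, b):
    """Affine map: each output is a dot product of x with one weight row."""
    return [sum(xi * wij for xi, wij in zip(x, row)) + bj
            for row, bj in zip(w, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Shared encoder parameters: in hard parameter sharing these receive
# gradient updates from BOTH tasks during joint training.
W_shared, b_shared = rand_matrix(DIM_SHARED, DIM_IN), [0.0] * DIM_SHARED

# Task-specific heads: each is updated only by its own task's loss.
W_sent, b_sent = rand_matrix(N_SENT, DIM_SHARED), [0.0] * N_SENT
W_off, b_off = rand_matrix(N_OFF, DIM_SHARED), [0.0] * N_OFF

def forward(x):
    # One shared encoding, consumed by both task heads.
    h = [max(0.0, v) for v in linear(x, W_shared, b_shared)]
    p_sent = softmax(linear(h, W_sent, b_sent))
    p_off = softmax(linear(h, W_off, b_off))
    return p_sent, p_off

def multitask_loss(p_sent, y_sent, p_off, y_off):
    # Joint objective: sum of the two per-task cross-entropy losses.
    return -math.log(p_sent[y_sent]) - math.log(p_off[y_off])

x = [0.5, -1.2, 0.3, 0.8]          # one toy code-mixed comment, featurized
p_sent, p_off = forward(x)
loss = multitask_loss(p_sent, y_sent=1, p_off=p_off, y_off=0)
print(round(loss, 3))
```

Under soft parameter sharing, by contrast, each task would keep its own encoder, with a regularization term encouraging the two encoders' parameters to stay close rather than forcing them to be identical.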




UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis

We describe our contribution to the SemEval 2023 AfriSenti-SemEval share...

Polarity and Intensity: the Two Aspects of Sentiment Analysis

Current multimodal sentiment analysis frames sentiment score prediction ...

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)

We present the first Africentric SemEval Shared task, Sentiment Analysis...

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

The phenomenon of compounding is ubiquitous in Sanskrit. It serves for a...

GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters

This report describes GMU's sentiment analysis system for the SemEval-20...

Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems

Precision weed management offers a promising solution for sustainable cr...

Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification

The majority of work in targeted sentiment analysis has concentrated on ...
