L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models

06/24/2023
by Aabha Pingle, et al.

The exploration of sentiment analysis in low-resource languages, such as Marathi, has been limited by the lack of suitable datasets. In this work, we present L3Cube-MahaSent-MD, a multi-domain Marathi sentiment analysis dataset covering four domains: movie reviews, general tweets, TV show subtitles, and political tweets. The dataset consists of around 60,000 manually tagged samples covering three sentiment classes: positive, negative, and neutral. We create a sub-dataset for each domain, each comprising 15,000 samples. MahaSent-MD is the first comprehensive multi-domain sentiment analysis dataset within the Indic sentiment landscape. We fine-tune different monolingual and multilingual BERT models on these datasets and report the best accuracy with the MahaBERT model. We also present an extensive in-domain and cross-domain analysis, thereby highlighting the need for low-resource multi-domain datasets. The data and models are available at https://github.com/l3cube-pune/MarathiNLP .
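
The dataset composition and the in-domain versus cross-domain evaluation described above can be illustrated with a small sketch. It only enumerates the train/test domain pairings and defines an accuracy helper; it does not load the released data or any model. The domain identifiers, constant names, and function names are hypothetical and are not taken from the authors' repository.

```python
from itertools import product

# Domains and labels as listed in the abstract; the short identifiers
# below are assumptions made for illustration only.
DOMAINS = ["movie_reviews", "general_tweets", "tv_subtitles", "political_tweets"]
LABELS = ["positive", "negative", "neutral"]
SAMPLES_PER_DOMAIN = 15_000  # "15k samples" per sub-dataset


def evaluation_pairs(domains):
    """Split all (train, test) domain pairs into in-domain and cross-domain."""
    in_domain, cross_domain = [], []
    for train, test in product(domains, repeat=2):
        if train == test:
            in_domain.append((train, test))
        else:
            cross_domain.append((train, test))
    return in_domain, cross_domain


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


in_domain, cross_domain = evaluation_pairs(DOMAINS)
total_samples = SAMPLES_PER_DOMAIN * len(DOMAINS)

print(total_samples)                      # 60000
print(len(in_domain), len(cross_domain))  # 4 12
```

With four domains, a model trained on each domain is evaluated once on its own domain and three times on the others, giving 4 in-domain and 12 cross-domain settings in this sketch.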

