BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews

05/11/2023
by Mohsinul Kabir et al.

The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight into the quality of a product. While sentiment analysis has been widely studied in many popular languages, relatively little attention has been given to Bangla, mostly due to a lack of relevant data and of cross-domain adaptability. To address this limitation, we present BanglaBook, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and establish baselines with a range of machine learning models, including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the need for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our code and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.
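The abstract mentions an error analysis based on sentiment unigrams. The sketch below is not the authors' code; it assumes a hypothetical list of (review text, gold label, predicted label) triples and plain whitespace tokenization, and counts the most frequent unigrams among misclassified reviews for each (gold, predicted) confusion pair. The names `tokenize`, `error_unigrams`, and `PUNCT` are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Iterable

# The three BanglaBook sentiment categories named in the abstract.
LABELS = ("positive", "negative", "neutral")

# Hypothetical punctuation set: the Bangla danda (।) plus common ASCII marks.
PUNCT = "।,.!?;:\"'()"


def tokenize(text: str) -> list[str]:
    """Assumed whitespace tokenizer that strips surrounding punctuation."""
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [tok for tok in tokens if tok]


def error_unigrams(
    samples: Iterable[tuple[str, str, str]], top_k: int = 10
) -> dict[tuple[str, str], list[tuple[str, int]]]:
    """Return the top_k unigrams per (gold, predicted) pair, misclassified samples only."""
    counts: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for text, gold, pred in samples:
        if gold not in LABELS or pred not in LABELS:
            raise ValueError(f"unknown label: {gold!r} / {pred!r}")
        if gold != pred:  # correctly classified reviews are skipped
            counts[(gold, pred)].update(tokenize(text))
    return {pair: c.most_common(top_k) for pair, c in counts.items()}
```

Grouping by confusion pair rather than by gold label alone separates, for example, positive reviews predicted as negative from positive reviews predicted as neutral, which is where frequent but misleading unigrams tend to show up.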

Related research
06/24/2023

L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models

The exploration of sentiment analysis in low-resource languages, such as...
04/20/2022

YOSM: A new Yoruba sentiment corpus for movie reviews

A movie that is thoroughly enjoyed and recommended by an individual migh...
11/25/2014

LABR: A Large Scale Arabic Sentiment Analysis Benchmark

We introduce LABR, the largest sentiment analysis dataset to-date for th...
10/28/2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

This paper investigates the effectiveness and implementation of modality...
12/03/2020

Sentiment analysis in Bengali via transfer learning using multi-lingual BERT

Sentiment analysis (SA) in Bengali is challenging due to this Indo-Aryan...
07/19/2022

Urdu Speech and Text Based Sentiment Analyzer

Discovering what other people think has always been a key aspect of our ...
04/05/2017

Learning to Generate Reviews and Discovering Sentiment

We explore the properties of byte-level recurrent language models. When ...
