Anubhuti – An annotated dataset for emotional analysis of Bengali short stories

10/06/2020
by   Aditya Pal, et al.

Thousands of short stories and articles are written today in many languages around the world. Bengali, or Bangla, is the second most widely spoken language in India after Hindi and is the national language of Bangladesh. This work reports in detail the creation of Anubhuti, the first and largest text corpus for analyzing emotions expressed by writers of Bengali short stories. We explain the data collection methods, the manual annotation process, and the resulting high inter-annotator agreement, which we attribute to the linguistic expertise of the annotators and the clear labelling methodology they followed. We also address some of the challenges faced in collecting raw data and annotating a low-resource language like Bengali. We have evaluated the dataset with baseline machine learning models as well as a deep learning model for emotion classification and found that these standard models achieve high accuracy and select relevant features on Anubhuti. In addition, we explain how this dataset can be of interest to linguists and data analysts who study the flow of emotions expressed by writers of Bengali literature.
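The abstract does not say which statistic was used to measure inter-annotator agreement. As a minimal sketch only, assuming two annotators and Cohen's kappa (one common choice, not necessarily the authors'), agreement on emotion labels could be computed as follows. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from Anubhuti.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six sentences (illustrative, not real data).
annotator_1 = ["joy", "sadness", "anger", "joy", "neutral", "sadness"]
annotator_2 = ["joy", "sadness", "joy", "joy", "neutral", "sadness"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it is usually lower than the simple fraction of matching labels. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the analogous choice.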


