SentiPers: A Sentiment Analysis Corpus for Persian

01/23/2018
by Pedram Hosseini, et al.

Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics, and information retrieval. Interest in SA has grown steadily in both academia and industry in recent years. There is also an increasing need for appropriate resources and datasets, particularly for low-resource languages such as Persian. These datasets play an important role in designing and developing opinion mining platforms that use supervised, semi-supervised, or unsupervised methods. In this paper, we outline the entire process of developing a manually annotated sentiment corpus, SentiPers, which covers formal and informal written contemporary Persian. To the best of our knowledge, SentiPers is unique among Persian sentiment corpora in offering such rich annotation at three levels: document level, sentence level, and entity/aspect level. The corpus contains more than 26,000 sentences of user opinions from the digital product domain. One of its special characteristics is that it quantifies how positive or negative an opinion is by assigning each sentence a number within a specific range. Furthermore, we present statistics on various components of the corpus and study the inter-annotator agreement. Finally, we discuss some of the challenges we faced during the annotation process.
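As a rough illustration of the three annotation levels and of an agreement check, the sketch below models a document, its sentences, and their entity/aspect targets, and computes Cohen's kappa between two annotators. The class names, the field names, and the integer scale from -2 to +2 are assumptions made for this example; the abstract only says that a number "within a specific range" is assigned. The abstract also does not name the agreement measure used, so Cohen's kappa stands in here as one common choice.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class AspectAnnotation:
    """Entity/aspect-level annotation (hypothetical schema)."""
    target: str    # entity or aspect mentioned in the sentence
    polarity: int  # assumed scale: -2 (very negative) .. +2 (very positive)


@dataclass
class SentenceAnnotation:
    """Sentence-level annotation (hypothetical schema)."""
    text: str
    polarity: int
    aspects: List[AspectAnnotation] = field(default_factory=list)


@dataclass
class DocumentAnnotation:
    """Document-level annotation (hypothetical schema)."""
    doc_id: str
    polarity: int
    sentences: List[SentenceAnnotation] = field(default_factory=list)


def cohen_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy sentence-level scores from two hypothetical annotators.
    annotator_1 = [2, 1, 0, -1, -2, 0]
    annotator_2 = [2, 1, 0, -1, -1, 0]
    print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa treats the scores as unordered categories, so a disagreement between -2 and -1 counts the same as one between -2 and +2. For an ordinal scale like the one assumed here, a weighted variant may be more appropriate.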


research
11/28/2022

Sentiment analysis and opinion mining on E-commerce site

Sentiment analysis or opinion mining help to illustrate the phrase NLP (...
research
02/08/2023

Sentiment analysis and opinion mining on educational data: A survey

Sentiment analysis AKA opinion mining is one of the most widely used NLP...
research
04/21/2019

UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages

In this paper, we introduce UniSent a universal sentiment lexica for 100...
research
09/18/2015

Building a Pilot Software Quality-in-Use Benchmark Dataset

Prepared domain specific datasets plays an important role to supervised ...
research
04/26/2020

PTPARL-D: Annotated Corpus of 44 years of Portuguese Parliament debates

In a representative democracy, some decide in the name of the rest, and ...
research
06/20/2016

MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

People are sharing their opinions, stories and reviews through online vi...
research
12/24/2017

Building a Sentiment Corpus of Tweets in Brazilian Portuguese

The large amount of data available in social media, forums and websites ...
