The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia

06/02/2022
by Michal Mochtak et al.

Expression of sentiment in parliamentary debates is deemed to be significantly different from that on social media or in product reviews. This paper adds to an emerging body of research on parliamentary debates with a dataset of sentences annotated for the detection of sentiment polarity in political discourse. We sample the sentences for annotation from the proceedings of three Southeast European parliaments: Croatia, Bosnia-Herzegovina, and Serbia. A six-level schema is applied to the data with the aim of training a classification model for the detection of sentiment in parliamentary proceedings. Krippendorff's alpha, which measures inter-annotator agreement, ranges from 0.6 for the six-level annotation schema to 0.75 for the three-level schema and 0.83 for the two-level schema. Our initial experiments on the dataset show that transformer models perform significantly better than those using a simpler architecture. Furthermore, despite the similarity of the three languages, we observe differences in performance across them. Parliament-specific training and evaluation suggest that the main reason for the differing performance between parliaments is the differing complexity of the automatic classification task, a difference that is not observable in annotator performance. Language distance does not seem to play any role in either annotator or automatic classification performance. We release the dataset and the best-performing model under permissive licences.
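
The agreement figures above come from Krippendorff's alpha computed at three granularities of the same annotations. The following is a minimal sketch of that idea, not the authors' code: it assumes the nominal variant of alpha (the abstract does not say which distance metric was used), encodes the six levels as the integers 0 to 5, and uses a hypothetical `SIX_TO_THREE` mapping that merges adjacent levels. The function name and the toy annotations are likewise illustrative.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha.

    `units` is a list of lists; each inner list holds the labels that
    different annotators gave to one sentence.
    """
    # Coincidence matrix: every ordered pair of labels from two different
    # annotators of the same unit contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label value and overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label value occurs, so no disagreement is possible
    return 1.0 - observed / expected


# Hypothetical collapse of six levels (0..5) into three classes (assumption).
SIX_TO_THREE = {0: "neg", 1: "neg", 2: "neu", 3: "neu", 4: "pos", 5: "pos"}

# Toy annotations by two annotators, invented for illustration only.
six_level = [[0, 1], [2, 3], [5, 5], [0, 0]]
three_level = [[SIX_TO_THREE[x] for x in unit] for unit in six_level]

alpha_six = krippendorff_alpha_nominal(six_level)
alpha_three = krippendorff_alpha_nominal(three_level)
```

In this toy data the annotators disagree only between neighbouring levels, so collapsing the schema removes the disagreement and alpha rises, which mirrors the direction of the reported increase from the six-level to the three-level and two-level schemas.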


research
05/14/2023

Croatian Film Review Dataset (Cro-FiReDa): A Sentiment Annotated Dataset of Film Reviews

This paper introduces Cro-FiReDa, a sentiment-annotated dataset for Croa...
research
09/21/2018

Towards Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection

In an effort to assist factcheckers in the process of factchecking, we t...
research
08/02/2018

Cyberbullying Detection -- Technical Report 2/2018, Department of Computer Science AGH, University of Science and Technology

The research described in this paper concerns automatic cyberbullying de...
research
09/18/2023

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sentiments inherently drive politics. How we receive and process informa...
research
06/05/2019

The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English

In this paper we present datasets of Facebook comment threads to mainstr...
research
03/16/2022

Morphological Reinflection with Multiple Arguments: An Extended Annotation schema and a Georgian Case Study

In recent years, a flurry of morphological datasets had emerged, most no...
research
07/28/2020

DSC IIT-ISM at SemEval-2020 Task 8: Bi-Fusion Techniques for Deep Meme Emotion Analysis

Memes have become an ubiquitous social media entity and the processing a...
