The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

09/18/2023
by   Michal Mochtak, et al.

Sentiments inherently drive politics. How we receive and process information plays an essential role in political decision-making, shaping our judgment with strategic consequences at the level of both legislators and the masses. If sentiment plays such an important role in politics, how can we study and measure it systematically? The paper presents a new dataset of sentiment-annotated sentences, which are used in a series of experiments focused on training a robust sentiment classifier for parliamentary proceedings. The paper also introduces the first domain-specific LLM for political science applications, additionally pre-trained on 1.72 billion domain-specific words from the proceedings of 27 European parliaments. We present experiments demonstrating how additional pre-training of an LLM on parliamentary data can significantly improve the model's downstream performance on domain-specific tasks, in our case sentiment detection in parliamentary proceedings. We further show that multilingual models perform very well on unseen languages and that additional data from other languages significantly improves the target parliament's results. The paper makes an important contribution to multiple domains of the social sciences and bridges them with computer science and computational linguistics. Lastly, it sets up a more robust approach to sentiment analysis of political texts in general, which allows scholars to study political sentiment from a comparative perspective using standardized tools and techniques.


research
08/30/2022

A Spanish dataset for Targeted Sentiment Analysis of political headlines

Subjective texts have been studied by several works as they can induce c...
research
06/02/2022

The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia

Expression of sentiment in parliamentary debates is deemed to be signifi...
research
07/03/2023

ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

The computational analysis of poetry is limited by the scarcity of tools...
research
09/18/2015

Building a Pilot Software Quality-in-Use Benchmark Dataset

Prepared domain specific datasets plays an important role to supervised ...
research
12/16/2020

Building domain specific lexicon based on TikTok comment dataset

In the sentiment analysis task, predicting the sentiment tendency of a s...
research
06/09/2016

Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora

A word's sentiment depends on the domain in which it is used. Computatio...
research
08/14/2023

Votemandering: Strategies and Fairness in Political Redistricting

Gerrymandering, the deliberate manipulation of electoral district bounda...
