The development of natural language resources and technologies opens new possibilities for social and political sciences (rheault2016measuring). After decades of analyzing individual political speeches and transcripts, natural language processing (NLP) allows orders of magnitude larger studies. Parliamentary corpora are available for many parliamentary democracies and include draft bills, amendments to bills, adopted legislation, committee reports, and transcripts of floor debates. Processing these heterogeneous records is challenging. However, the recent ParlaMint project has produced unified corpora of parliamentary debates in 17 European parliaments, making them widely accessible (parlamint2022). This broadens the possible scope of analysis from individual countries to joint issues and differences. Using modern monolingual and cross-lingual NLP techniques on these corpora can provide new insight into the language used, expression of speakers, as well as similarities and differences in topics, emotions and sentiment in different parliaments.
National parliaments have developed their own codes of behavior and speech, which makes comparison between them difficult. We propose a novel technological approach, combining monolingual and cross-lingual prediction models with machine translation to analyze political, sociological, and linguistic phenomena. Recent research has shown that such approaches are possible for the analysis of social media, but they have not yet been applied to parliamentary speech corpora. We make such an attempt and analyze parliamentary debates from six national parliaments: Bulgarian, Czech, French, Slovene, Spanish, and United Kingdom (UK), in the period from 2017 to 2020.
We first analyze and interpret topics discussed in these parliaments and determine the sentiment and emotions appearing in the debates. We exploit the metadata present for each speaker and predict the age, gender, and political wing of each speaker. This allows us to detect differences in the language used by speakers with these characteristics.
In our work, we used modern NLP methods to investigate both similarities and differences among several parliamentary discourses using a multilingual and cross-lingual framework. The main contributions can be summarized as follows:
a methodological framework for a comprehensive comparison of parliamentary speeches;
parallel topic analysis for six European parliaments;
comparison of linguistic effects based on age, gender, and political position of speakers;
comparison of sentiment and emotions extracted with prediction models.
The paper is structured into five further sections. In Section 2, we present background and related work on analyzing parliamentary speech, split into EU and national parliaments, age, gender, sentiment, and emotions. To make the results understandable, we include a short outline of recent political events in each analyzed country. Section 3 describes our ParlaMint datasets. In Section 4, we present our methodology, split into topic modelling and the prediction of metadata, sentiment, and emotions. The results are covered in Section 5, while we draw conclusions and present ideas for further work in Section 6.
2 Background and Related Work
In this section, we first present previous work on analyzing parliamentary debates using NLP methods. In Section 2.1, we cover the European Parliament, and in Section 2.2, we give an overview of national parliaments. Several aspects we analyzed were also investigated outside the parliamentary context. Sections 2.3, 2.4, 2.5, and 2.6 cover related work on age, gender, sentiment, and emotions in political discourse. To make our results easier to understand, we end in Section 2.7 with a brief background on the political situation in the six analyzed countries during the time covered by our data.
2.1 European Parliament
The European Parliament (EP) is a rich source of corpora for multilingual parliament discourse analysis (bendazzoli2005approach; hajlaoui2014dcep). Currently, the spoken accounts appear in 24 languages, but the size and quality of written records vary (hollink2017bias). Members of the Parliament (MEPs) are free to speak in any of the official languages. Speeches are sometimes translated into (some) other languages, depending on the prioritization within the EP, specific translation requests of the members, and budget constraints. This makes altogether 24 subcorpora of varying size, where each speech appears in the original and some translated forms.
The first studies of EP were based on statistics such as the word frequency and they were used to study political positions of parties (benoit2003estimating; laver2003extracting; proksch2010position), role of gender (back2014takes), and personal emphases of MPs (baumann2016constituency; pardos2016political). proksch2010position reported a modest language effect in their study of party positions in the EP, which they ascribed to translation rather than actual differences in positions between the three analyzed countries. hollink2017bias show that subcorpora in different languages may lead to different conclusions about the political landscape. We aim to use a uniform multilingual and cross-lingual methodology to gain insights into this issue.
2.2 National Parliaments
Several studies analyze the discourse in national parliaments. We present a short overview of these works.
adapted affective computing methods to study political discourse, developing domain-specific polarity lexicons for the analysis of the UK House of Commons. According to the study, to understand the decisions of elected politicians, one must tap into their emotions. Most other studies on emotion and politics were conducted on individuals outside democratic institutions. Thus, parliamentary debates are emotionally still not well understood.
examines the reliability of detecting latent concepts such as political ideology. They show that neural networks predicting such concepts can be enhanced with metadata from parliamentary corpora. This method was validated on publicly available corpora from the UK, Canada, and the United States (US). Using the Structural Topic Model (STM) and the word-counting tool Wordfish, cho2021language analyzed the testimony statements given by non-government organizations (NGOs) and companies in US congressional hearings for foreign aid and floor speeches of left- and right-leaning legislators in the congressional record. Constituent groups appear to have an impact on the aid positions of individual legislators, which may then affect the aid decisions of the US.
lewandowsky2022new employed a quantitative text analysis with the Wordfish tool, showing that populists in parliament do not necessarily lead to increased polarization regarding specific issues in German Bundestag debates. navarretta2020identifying analyze the words used in the Danish parliament to determine if speakers of the four parties can be detected using machine learning models.
wei2020analyzing analyze foreign relations based on parliamentary texts. First, topic words are extracted from parliamentary texts, and then a co-word network is constructed to represent the correlation structure of topic words. To detect characteristics and connotations of foreign relations, the authors apply basic statistics, calculate network indicators, detect communities, and visualize network maps.
gennaro2021emotion analyzed emotions and reason in the language of US Congress members by producing a new measure of emotive speech, which combines dictionary methods with word embeddings to look at the relative use of affective and cognitive language. The authors analyzed how that measure evolves over time, across individuals, and in response to electoral and media pressures.
In contrast to the studies mentioned above of individual parliaments, we cover six national parliaments and perform a range of different analyses to gain a comprehensive overview of their similarities and differences.
2.3 Language and Age
Age as a factor of language variation is one of the most salient and productive objects of research in the field of sociolinguistics (murphy2010corpus). The description of differences between young and older generations focuses on (in)formality (labov1972sociolinguistic; stenstrom2009youngspeak). In the research of group membership through speech (ghafournia2015language), sociolinguists describe two types of prestige, overt and covert prestige. Overt prestige is related to standard and more formal linguistic features, which are normally associated with those who hold more power and status. Covert prestige, on the other hand, is the non-standard variety employed in a scenario that encourages cooperation, communality, communication ease, and engagement (trudgill1972sex). Based upon these considerations, some linguistic differences age may explain are adults’ preference for syntactic complexity (frizelle2018growth), swearing (jay2013child), lexical conservativism (kerswill1996children), usage of positive politeness strategies (emara2017gender), teenagers’ tendency towards language change (milroy1985linguistic), the use of slang (rodriguez1994youth) or abruptness (de2012youth).
Several authors covered the prediction of age and other personal traits such as gender or political affiliation, e.g., (dahllof2012automatic), who analyzed the wording of political speeches in Swedish. The results show that it is possible to classify politicians according to their age, ideology, and gender to some degree. We analyze six parliaments at once, which opens a broader perspective and gives more general conclusions.
2.4 Language and Gender
Since 1922, a number of studies have addressed the role of gender in language expression – for an overview, see (tenorio2016genderlect). For example, the debate ranges on whether gender is a social construct, whether there exist different genderlects with different characteristics, and whether a so-called “women’s language” is the result of culture or power relations (coates2015women; lakoff1973language). The linguistic features claimed to characterize females range from articulatory phonetics and grammar to pure pragmatics, e.g., the tendency for hypercorrection, conservativism, self-disclosure and attentiveness; abundance of intensifiers and restricted vocabulary associated with domesticity; preference for simple syntax, minimal responses, emotion(al) language, expressive speech acts, diminutives and terms of endearment; usage of rising intonation, questions and epistemic modality to mark their lack of confidence; and, finally, neither swearing nor turn-taking control, interruption or topic selection in conversation.
In our work, we predict the gender of speakers, which is available as metadata. In this way, we establish the level of difference between speeches given by MPs of different genders. In gender detection, several studies successfully apply machine learning and/or sentiment analysis (argamon2003gender; park2019gender; menendez2020damegender; kowsari2020gender). An important consideration in the prediction of speakers’ gender is grammatical gender. In four of the six languages we cover (Bulgarian, Czech, Slovenian, and Spanish), there are three grammatical genders (masculine, feminine, and neuter); in French there are two genders (masculine and feminine), while English has no grammatical gender. Grammatical gender can generally be inferred from the ending of nouns, adjectives, determiners or past participles. In some cases (not all), this means that the gender of a speaker can be determined. Next, we give some examples of these phenomena for the analyzed languages.
BG: Az sam sigurna (I am sure: feminine), Az sam siguren (I am sure: masculine). In Bulgarian, there are synthetic and analytic tenses/moods. The former contains no indication of gender. Here is an example of a synthetic form: Az kazax (I said - feminine and masculine). When an analytic form is used, the gender is indicated by the past participle: Az bix predlozhila (I would suggest: feminine) vs. Az bih predlozhil (I would suggest: masculine).
CZ: Já bych řekl (I would say: masculine), Já bych řekla (I would say: feminine). Again, if a synthetic form is used, then there is no indication of a specific gender: Já si myslím (I think: feminine and masculine).
ES: Estoy harta del populismo (I’m sick of populism: feminine); Estoy harto del populismo (I’m sick of populism: masculine).
FR: Je suis prête (I am ready: feminine), Je suis prêt (I am ready: masculine). Again, there are many cases where the gender is not revealed, e.g., Comme j’ai dit (As I said: feminine and masculine).
SI: The gender is revealed when using the first person singular in the past and future tense, e.g., Rekla sem (I said: feminine), Rekel sem (I said: masculine). The gender is not revealed in the present tense, e.g., Mislim (I think: both masculine and feminine).
In English, gender is not a grammatical category but a lexico-semantic feature that can be inferred from the personal and relative pronouns used (the person who arrived; he is nice) and a few morphemes (actor vs actress; policeman vs policewoman); adjectives, determiners or past participles do not show it (this happy man vs this happy woman; she was kissed vs he was kissed). These features do not reveal the speaker’s gender.
2.5 Sentiment in Politics
The problem of computational sentiment analysis for parliament discourses has been tackled extensively but with relatively little cross-country comparison. In most cases, sentiment analysis involves document, sentence, and aspect-level analysis.
dziecikatko2018application and rheault2016measuring apply sentiment analysis to entire corpora at the highest granularity. Their analysis of the Polish and UK parliaments aggregates sentiment scores of all speeches. honkela2014five explore the overall sentiment of EU Parliament transcripts on the dataset level, whereas (sakamoto2017cross) consider the polarity of US and Japanese datasets.
While in NLP sentiment analysis is often fine-grained (such as at the level of speech, speech segment, paragraph, sentence, or phrase), in political science, the unit of analysis is primarily an actor (an individual politician whose contributions are pooled together). This is the focus of most works on position scaling, a task very much associated with that field. This appears to confirm, to some extent, the assertion of (hopkins2010method) that while computer scientists are interested in finding the needle in the haystack, social scientists are more interested in characterizing the haystack. The exceptions come from works in the social and political sciences (iliev2019political) and (hopkins2010method) that propose ways to optimize speech-level classification for social science purposes, and from computer science (glavavs2017unsupervised), which also considers the position scaling issue.
Sentiment detection has advanced considerably in the last few years with the advent of large pretrained language models such as BERT (devlin-etal-2019-bert). This has allowed applications to social media, stock market predictions, user stance detection in reviews, hate-speech detection, etc. However, parliamentary discourse is hard to analyze with established techniques due to its specific formal speech and its linguistic differences from existing training datasets (rheault2016measuring). rudkowskysupervised study several machine learning approaches based on word embeddings for Austrian parliamentary speeches. Similarly, abercrombie2020parlvote and elkink2021predicting investigate predicting votes based on parliamentary speeches.
In this paper, we follow political sciences and predict the sentiments of speakers based on their speeches. The results are cross-lingual for six different parliaments, which, to our knowledge, has not been done before.
2.6 Emotions in Politics
alba2018emotion states that whatever we say, write, hear, and read is produced and processed through the filter of affect. Cognition and emotion are, therefore, two mutually interconnected systems (barrett2020seven). In this regard, van1985handbook argues that one of the most distinguishing features of manipulation lies in shaping and framing messages in such a way that they accord with their recipients’ negative emotions, usually deriving from feelings of powerlessness and injustice. In the current political landscape, which is imbued with populism, this idea is of utmost importance, especially at a moment when the emotional is preferred to the intellectual.
Research shows how resentment, anxiety, panic, anger, and disgust can help populist politicians seduce their voters (betz1993new; olson2020love; paschen2019investigating). They may use the same discursive strategies to attack and bring their rivals into disrepute. For instance, they can spread unreliable news about their opponents and other sensationalist information with bombastic but simple expressions; plentiful negative ethical and aesthetic evaluative terms; swear words and colloquialisms; and adversarial vocabulary echoing 20th-century propaganda (https://www.thebritishacademy.ac.uk/blog/how-language-fake-news-echoes-20th-century-propaganda/). Despite the similarities, however, the discourses of right- and left-wing populist leaders are quite diverse. Open opposition to capitalist elites drives left-wing populists to show their hatred of big corporations, financial, and governmental institutions (de1997populism). On the other hand, due to their fear of losing their status because of the alleged privileges granted to minority groups in a multicultural society, right-wing populists cannot conceal their antagonism and hostility towards such communities (salmela2017emotional).
Our analysis is unique in detecting and comparing emotions in six national parliaments at once. This reveals some similarities but also surprising differences.
2.7 Recent Political Outline
Over the past years, the world status quo has been shaken dramatically; this turmoil has been triggered by several events. Terrorism has proved to be a real threat in Europe. Extremist nationalist movements have increased in popularity in many countries, and in the last decade, populists have risen to power. To allow better interpretation and comprehension of the obtained results, we briefly describe the main political outlines of the six countries involved in the covered period.
2.7.1 UK Politics
In 2016, the UK held a referendum on its membership of the EU, and 51.9% of voters (mainly from England and Wales) decided to leave. This led to a period of insecurity regarding the subsequently deteriorating relationship between Europe and the UK. The year 2017 started with a women’s march protesting Donald Trump’s inauguration as the 45th US president. During those twelve months, the EU Bill was backed by most MPs. A jihadist terrorist killing six people in London and a Manchester bomber causing 19 casualties led to islamophobia. In 2019, after the disclosure of internal disagreement about how to proceed with withdrawing from the EU, the country was led to elections, and Boris Johnson from the Conservative party became the prime minister. In 2020 and 2021, Covid-related news flooded the media and affected the normal functioning of the Parliament, not to mention the daily life of everybody on the planet.
2.7.2 Spanish Politics
In 2017, Catalonia’s leaders declared the region’s independence from Spain after the so-called Catalan Referendum. Following violent street unrest, some Catalan leaders fled, and others were arrested, prosecuted and sentenced. Other events also shaped Spaniards’ lives during this period. Islamic extremists killed many people in Catalonia in 2017. King Felipe’s brother-in-law was jailed for tax fraud, and his father, King Juan Carlos, had to leave Spain after abdicating and having fallen into disgrace. The Spanish Parliament also talked about vaccination, the lockdown, the increase in domestic violence, and the rise of the far-right and populism.
2.7.3 Bulgarian Politics
The Bulgarian Parliament is unicameral. In 2017, the 44th parliament started to work and was dominated by the right-wing-to-centre pro-European party GERB in a coalition with other supporting parties. The government was involved in severe conflicts with the president, who ran as an independent candidate but was supported by the biggest opposition party, the Bulgarian Socialist Party. In the summer of 2020, big protests took place against the prime minister from GERB, Boyko Borisov, and the state’s chief attorney Ivan Geshev, whom the protesters accused of allowing corruption to prevail in both the executive and justice systems.
2.7.4 Czech Politics
The parliament of the Czech Republic consists of two chambers: the Lower House (Chamber of Deputies) and the Upper House (Senate). The ParlaMint-CZ corpus contains the stenographic protocols of the Chamber of Deputies. During the period covered by the data, elections for the Chamber of Deputies were held in 2017. Thus, in 2019 and 2020, the leading party was the populist centre/centre-right ANO 2011 in a coalition with the centre-left ČSSD (Czech Social Democratic Party). In 2019, there were huge protests against the prime minister from ANO, Andrej Babiš, for alleged fraud.
2.7.5 French Politics
The politics of France takes place within the framework of a semi-presidential system and two houses of parliament, the main National Assembly and the less influential Senate. In 2017, Emmanuel Macron was surprisingly elected president, and his centrist party La République En Marche! (LREM) won the majority in the National Assembly. This disrupted the previous bipartisan (socialists vs republicans) political landscape. The main events that shaped the politics were the terrorist attacks, the demonstrations of the yellow vests and the football World Cup victory in 2018, and the terrorist murder of teacher Samuel Paty in 2020.
2.7.6 Slovene Politics
The Slovenian parliament has two chambers, of which the main legislative chamber, Državni zbor, was last elected in 2018. The second chamber, Državni svet, is elected by different interest groups and can veto legislation, forcing a re-vote with an absolute majority in the first chamber. The centre-left coalition government resigned in 2020 due to internal conflicts, and a centre-right coalition formed the government, which immediately had to face the COVID-19 outbreak. The conflicting nature of the prime minister Janez Janša and the protests against the epidemic measures shaped the political landscape in 2020 and 2021.
3 The Data
In this section, we describe the datasets used in our analysis. Section 3.1 describes the ParlaMint project, which collected and preprocessed the data, while in Section 3.2 we provide information on the datasets we actually used.
3.1 ParlaMint Project Background
The ParlaMint project (http://www.clarin.eu/parlamint) aims to enhance the development and usage of national parliamentary corpora. The data has been harmonized to the same TEI format and time span. It can be exploited for linguistic, social, and political research in cross-lingual and cross-parliament settings.
We use the multilingual comparable corpora of parliamentary debates ParlaMint 2.1, containing parliamentary debates mostly starting in 2015 and extending to mid-2020. The national corpora vary considerably in size, with the Hungarian corpus being the smallest and the UK corpus the largest. The sessions in the corpora are marked as belonging to the COVID-19 period (after November 1st 2019), or being "reference" (before that date). The data is freely available through the CLARIN.SI repository (http://hdl.handle.net/11356/1432).
The corpora contain extensive metadata, including many aspects of the speakers (name, gender, MP status, party affiliation, party coalition/opposition). The data are structured into time-stamped terms, sessions, and meetings. Speeches are marked by the speakers and their roles (e.g., the chair or regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. More information about the creation of the corpora, the common standard, and specifics of each national corpus can be found in (parlamint2022).
3.2 The Datasets
At the time of writing this paper, the ParlaMint project released data for 16 languages: Bulgarian, Croatian, Czech, Danish, Dutch, English, French, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Polish, Slovenian, Spanish, and Turkish. In total there are utterances and words. The quality of the textual corpora and metadata varies across the languages. In our experiments, we studied parliaments in six countries: Bulgaria (BG), Czech Republic (CZ), France (FR), Slovenia (SI), Spain (ES), and the United Kingdom (UK). The criteria for this selection were mainly the quality of the provided corpora and that we, as authors, understand the languages and the political situation in these countries. The available data for the specific year varies across the parliaments, as shown in Table 1. We decided to analyze data from 2017 to 2020, expecting this selection to be the most informative and provide the most interesting insights.
The data per parliament and year are organized as parliament session documents. Every text document with the talks is paired with a document containing the session metadata such as title, time, term, and number of session and meeting. This supplementary document also includes the speaker information (speaker type, speaker party, party parliament status, speaker name, speaker gender, and speaker birth). The number of parliament session documents per country and year is presented in Table 2. The number of sessions varies per parliament. For the Czech parliament, the number is the largest, while the Slovene and Spanish parliaments exhibit lower numbers of sessions.
4 Methodology
We selected several modern NLP approaches to analyze the speeches in the chosen languages. First, in Section 4.1 we describe our topic modeling approach, followed by several classification tasks (gender, age, sentiment, and emotion prediction) in Section 4.2.
4.1 Topic Modeling
Due to its interpretability and availability of visualization tools, Latent Dirichlet Allocation (LDA) (blei2003latent) is still one of the most popular approaches to topic modelling. LDA builds a probabilistic model of topics appearing in a document collection. Without describing mathematical details, we can understand LDA as guided by two principles.
Every document is a mixture of topics. We can imagine that each document contains words from several topics in different proportions. For example, in a two-topic model, we could say, “Document 1 is 90% topic A and 10% topic B, Document 2 is 30% topic A and 70% topic B, etc.”
Every topic is a mixture of words. For example, in a two-topic model of US news, with one topic “politics” and another “entertainment”, the most common words in the politics topic might be “President”, “Congress”, and “government”. In contrast, the entertainment topic could be made up of words “movies”, “television”, and “actor”. Notably, words can be shared between topics; a word like “budget” might appear in both topics with different probabilities.
LDA takes the collection of documents and a number of topics as input. It computes the most likely topic distribution within the documents and word distribution within the topics. The output of the algorithm is the desired number of topics described with the most probable or most representative words. Observing the typical words of a topic, we can interpret the contents of the topic.
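The two principles above can be illustrated with a minimal sketch. It does not fit an LDA model; it only shows how the two kinds of distributions that LDA estimates combine. All probabilities, document names, and function names are assumed for illustration, using the two-topic example from the text.

```python
# Illustrative numbers only: two topics with a tiny vocabulary (assumed values).
# Each topic is a mixture of words; "budget" is shared with different weights.
topic_word = {
    "politics": {"president": 0.4, "congress": 0.3, "government": 0.2, "budget": 0.1},
    "entertainment": {"movies": 0.4, "television": 0.3, "actor": 0.25, "budget": 0.05},
}

# Each document is a mixture of topics (the 90/10 and 30/70 example from the text).
doc_topic = {
    "doc1": {"politics": 0.9, "entertainment": 0.1},
    "doc2": {"politics": 0.3, "entertainment": 0.7},
}


def word_probability(doc, word):
    """p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    return sum(
        share * topic_word[topic].get(word, 0.0)
        for topic, share in doc_topic[doc].items()
    )


def top_words(topic, n=3):
    """The n most probable words of a topic, used to interpret its content."""
    ranked = sorted(topic_word[topic].items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]
```

In a fitted model, both tables would be estimated from the corpus rather than written by hand; the interpretation step then amounts to reading off `top_words` for each topic.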
Our procedure consists of three steps. Before applying the LDA approach, we first preprocessed and cleaned the data. Second, after applying the LDA, the computed probabilities of topics and words were visualized. Third, we used the visualizations to interpret the topics using the characteristic words. These steps are described below.
4.1.1 Cleaning and Preprocessing
As we were interested in recently appearing topics, we took only the text of parliament sessions between 2017 and 2020. All texts from one parliament constituted a corpus that was cleaned and further prepared for the topic modeling process. Only speeches of full parliament members were considered (i.e. Speaker type =’MP’, Speaker role=’Regular’). The method of the further corpus preparation can be described as follows:
First, we performed a standard text cleaning which contains punctuation removal, lowercase conversion, and stop words removal.
Further, we removed all of the words that are commonly used in parliamentary discourse and do not bring any insight into the topics of interest. For instance, we removed verbs such as "say", "put", "make", "leave", "come", "see", "speak", "pay", "deal", "have", "give", "take", "do" and "get". We also removed frequently repeated nouns such as "time", "friend", "lord", "year", "people" and "government".
Finally, we performed part of speech tagging and kept only nouns, adjectives, and verbs in the corpora.
On the obtained corpora, we applied the LDA method, separately for each language.
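The cleaning steps listed above can be sketched as follows for English. This is a simplified illustration, not the pipeline used in the study: the general stop word list is a short assumed sample, the metadata keys `speaker_type` and `speaker_role` are hypothetical names, words are matched by surface form rather than lemma, and the part-of-speech filtering step is omitted because it requires a language-specific tagger.

```python
import string

# Assumed, abbreviated general stop word list (a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "we", "on", "for", "this"}

# Parliament-specific words named in the text as uninformative for topics.
PARLIAMENT_WORDS = {
    "say", "put", "make", "leave", "come", "see", "speak", "pay", "deal",
    "have", "give", "take", "do", "get",
    "time", "friend", "lord", "year", "people", "government",
}


def keep_regular_mps(speeches):
    """Keep only speeches by full members speaking in a regular role.

    The dictionary keys are hypothetical; actual metadata field names may differ.
    """
    return [
        s for s in speeches
        if s["speaker_type"] == "MP" and s["speaker_role"] == "Regular"
    ]


def clean_speech(text):
    """Lowercase, strip punctuation, and drop general and domain stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [
        token for token in no_punct.split()
        if token not in STOP_WORDS and token not in PARLIAMENT_WORDS
    ]
```

Because the sketch matches surface forms, an inflected form such as "said" would survive; a lemmatization step before the filter would be needed to remove it.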
4.1.2 Processing and Visualization
To interpret LDA results, we used the LDAvis tool (sievert2014ldavis) which produces one of the most indicative visualizations for the topic modeling. Apart from visualizing relations between topics and topic overlap, the tool also computes the relevance measure, assessing the importance of terms for the selected topic. As depicted in Figures 1 – 6, the visualization consists of two panels:
The left-hand panel maps the topic distribution to a 2-dimensional space. The topics are visualized in the form of bubbles, where the size of a bubble indicates how strongly the topic is represented in the documents. An ideal LDA topic model will produce large, scattered bubbles that do not overlap. Distances between the topics approximate the semantic relationship between the topics. Hence, the topics that share common words overlap or are positioned close together.
The right-hand panel is a bar graph showing the frequency distribution of the words in the documents (blue colour) and in the selected topic (red colour). By choosing a topic (by clicking on a topic bubble), the panel will display the top 10 words (with the red-shaded area). Hovering over a specific word in the right-hand panel highlights the topics containing that word in the left-hand panel. The size of the bubble in this scenario describes the weight of the word in that topic, i.e. the larger the weight of the selected word, the larger the bubble. The bar graph ranks words in the right-hand panel based on their frequency, but that can be changed by varying the parameter $\lambda$. By decreasing $\lambda$, one increases the weight of the ratio of the frequency of a word given the topic divided by the overall frequency of the word in the documents, so that words important for the given topic move upwards in the ranking. The relevance of term $w$ in topic $k$ given the user-specified weight parameter $\lambda$ is defined as:
$$r(w, k \mid \lambda) = \lambda \log(\phi_{kw}) + (1 - \lambda) \log\left(\frac{\phi_{kw}}{p_w}\right)$$
Here $\phi_{kw}$ is the probability of term $w$ in topic $k$, $p_w$ is the probability of term $w$ in the whole corpus, and $\lambda \in [0, 1]$ determines the weight given to the probability of term $w$ in topic $k$ relative to its lift $\phi_{kw}/p_w$, both expressed on the logarithmic scale. Thus, if we set $\lambda$ to 1, terms are ranked solely by their probability within the specific topic. On the other hand, as $\lambda$ decreases towards 0, the probability of the term within this topic relative to its overall expected probability (i.e. its lift) gains prominence in the ordering. The authors of LDAvis suggest using $\lambda = 0.6$ based on their practical experience.
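To make the effect of the weight parameter concrete, the following minimal sketch computes the LDAvis relevance (sievert2014ldavis), i.e. the weighted sum of the log probability of a term in a topic and the log of its lift, for two terms. The term probabilities are assumed toy values, not estimates from our corpora, and the function names are illustrative.

```python
import math


def relevance(p_term_given_topic, p_term, lam):
    """lam * log p(term | topic) + (1 - lam) * log(p(term | topic) / p(term))."""
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)


# Assumed toy values: (p(term | topic), p(term) in the whole corpus).
terms = {
    "government": (0.05, 0.04),  # frequent everywhere, so its lift is low
    "vaccine": (0.02, 0.002),    # rarer overall, but concentrated in this topic
}


def rank_terms(lam):
    """Order the toy terms by decreasing relevance for a given weight."""
    return sorted(terms, key=lambda t: relevance(*terms[t], lam), reverse=True)
```

With these values, a weight of 1 ranks the generally frequent term first, whereas weights of 0 and 0.6 both rank the topic-specific term first, which is the behaviour that makes lower weights useful for interpreting topics.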
The main adjustable parameter in LDA is the number of topics. We aimed to select this number for all six included countries in an equivalent manner. The criterion we followed was separated topics (i.e. well separated bubbles in LDAvis), allowing for unambiguous interpretation. For each of the six countries, we, therefore, created topic models for 5 to 12 topics and selected the number that best fitted the set criterion. The selected number for each parliament is presented in Section 5. Numbers less than five created topic models where each topic was a mixture of many themes; for numbers larger than 12, LDA produced topic models that have a few large and many small topics that are difficult to interpret.
The interpretation of topics is then based on the most frequent and relevant words in each topic (as explained above). As the typical representatives of topics, we selected characteristic words (the ones used more exclusively for a given topic).
To validate the obtained topics, we performed a manual topic analysis for one of the languages (Spanish), which gave very similar results at the cost of considerably more work.
4.2 Supervised Classification
In our analysis, we use supervised text classification for sentiment and emotion analysis, as well as to predict the age, gender, and political wing of a speaker. Since 2019, a standard approach to text classification has been fine-tuning one of the large pretrained language models such as BERT (devlin-etal-2019-bert) to the specific task. We followed this approach and used multilingual BERT (pretrained on 104 languages) to predict desired variables. The trained models were used to predict speakers’ meta-information (gender, age, political wing), their sentiments and emotions based on the language and contextual information in speeches. The details are presented below.
4.2.1 Meta Data Prediction
Datasets from the ParlaMint project contain information about the speakers’ age, gender, and political party. Thus, we fine-tune the multilingual BERT model to predict each of the three metadata variables from individual speeches of parliament members. The prediction accuracy of the models reveals the amount of information about the metadata stored in the parliament speeches. The variables that we predict are:
Age of the speaker. The original corpora contained the birth year of the speaker, from which we computed the speaker’s age and dichotomized it into two groups using the cut-point of 45 years so that the class variable contains two labels:
young (label=0): speakers aged 45 years or younger;
old (label=1): speakers over 45 years.
To obtain reliable results but limit the computational cost, we randomly selected a sample of speeches from each of the six parliaments, drawn from both speakers aged 45 or under and speakers over 45 years.
Gender of the speaker. The original datasets contain a metadata variable with the value ’F’ for female and ’M’ for male speakers, which we converted into 0 for females and 1 for males. We randomly selected speeches by males and speeches by females.
Political wing of the speaker. We analyzed two settings of this variable: one to distinguish between centrally positioned parties and another to distinguish between extreme political parties. Thus, for all six parliaments, we labeled speeches of speakers from center-left and center-right with 0 and 1, respectively. The same procedure was done for speakers of extreme parties, marking with 0 extreme-left and with 1 extreme-right parties. For both center and extreme comparisons, we selected speeches from the left and speeches from the right-wing of the political spectrum.
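The label encoding of the three metadata variables can be sketched as follows. This is only an illustration under stated assumptions: the age is taken as the difference between the year of the speech and the birth year (the exact reference date is not specified above), and the orientation strings and function names are hypothetical.

```python
def age_label(birth_year, speech_year, cut_point=45):
    """0 = young (age <= cut_point), 1 = old (age > cut_point).

    Assumption: age is approximated as speech_year - birth_year.
    """
    age = speech_year - birth_year
    return 0 if age <= cut_point else 1


# Gender codes as stored in the corpus metadata: 'F' -> 0, 'M' -> 1.
GENDER_LABELS = {"F": 0, "M": 1}

# Two separate settings for the political wing (orientation names are hypothetical).
CENTER_LABELS = {"center-left": 0, "center-right": 1}
EXTREME_LABELS = {"extreme-left": 0, "extreme-right": 1}


def wing_label(orientation, setting):
    """Return 0/1 for the chosen setting, or None if the party is outside it."""
    mapping = CENTER_LABELS if setting == "center" else EXTREME_LABELS
    return mapping.get(orientation)


example = {
    "age": age_label(birth_year=1975, speech_year=2020),
    "gender": GENDER_LABELS["F"],
    "wing_center": wing_label("center-right", "center"),
}
```

Speeches whose party does not belong to the chosen setting receive no label and would simply be left out of that dataset.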
Each of the created datasets (altogether 24 datasets were created, i.e. four tasks for each of the six countries) was split into a training (80% of instances) and a testing set (20% of instances). The training set was used to fine-tune the multilingual BERT (mBERT) model (https://huggingface.co/bert-base-multilingual-cased). BERT is a text representation model based on the transformer neural network architecture (Vaswani2017), pretrained on the masked language modeling task using a large corpus of data. mBERT is pretrained on 104 languages. In our work, we used the pretrained models available on the HuggingFace platform and fine-tuned them separately for each task.
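The dataset bookkeeping described above (four tasks for each of six countries, each split 80/20) can be sketched with the standard library alone. The task names, the shuffling procedure, and the seed are assumptions made for illustration; the fine-tuning itself is not shown.

```python
import random
from itertools import product

TASKS = ["age", "gender", "wing_center", "wing_extreme"]  # hypothetical task names
COUNTRIES = ["Bulgaria", "Czech Republic", "France", "Slovenia", "Spain", "UK"]

# One dataset per (task, country) pair.
DATASETS = list(product(TASKS, COUNTRIES))


def train_test_split(instances, train_fraction=0.8, seed=0):
    """Shuffle a copy of the instances and cut it into training and testing parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the labeled speeches of one dataset.
toy_instances = [(f"speech {i}", i % 2) for i in range(100)]
train_set, test_set = train_test_split(toy_instances)
```

Fixing the seed keeps the split reproducible, so that every model is evaluated on speeches it has not seen during fine-tuning.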
4.2.2 Sentiment Prediction
For automatic classification of sentiment in text data, various approaches have been developed, the most successful being machine learning classifiers trained on human-annotated corpora. The main challenge of these approaches is that they tend to be domain-specific: they work best when trained with labeled data from the target domain but are less effective in other domains. However, as producing labeled datasets is expensive, researchers often apply the trained models across domains and languages. Cross-lingual transfer is possible either by machine translation from a language without a suitable dataset to a language where such a dataset exists or by using a pretrained multilingual language model such as mBERT.
As there are no specific parliamentary language sentiment datasets, our cross-lingual and cross-domain approach relies on a collection of sentiment datasets from different languages in the domain of news and media. The reason to choose the news sentiment datasets is that the language and the context used are relatively similar to the parliament discourse. We use two-class sentiment prediction with the negative sentiment labelled 0 and positive with 1. The datasets we combined into our training dataset are the following.
Slovenian SentiNews dataset (buvcar2018annotated) is a manually labeled sentiment dataset containing documents that were annotated at the document, paragraph, and sentence level. For our task, we used the document-level annotation and selected the news labeled as negative or positive. The selected instances thus consist of negative and positive Slovene news.
English news headlines datasets, consisting of two sources:
The financial news headlines dataset (malo2014good) was labelled with the sentiment from the perspective of a retail investor and constructed based on the human-annotated finance phrase bank. The data contained negative and positive headlines.
The SEN dataset (baraniak2021dataset) is a recent human-labelled dataset for entity-level sentiment analysis of political news headlines. The dataset consists of human-labelled political news headlines from several major online media outlets in English and Polish. Each record contains a news headline, a named entity mentioned in the headline, and a human-annotated label (positive, neutral, or negative). The original SEN dataset package consists of two parts: SEN-en (English headlines, split into SEN-en-R and SEN-en-AMT) and SEN-pl (Polish headlines). The names of the English datasets come from the way the annotation was done. For SEN-en-R, each headline-entity pair was annotated with the open-source annotation tool doccano (https://github.com/doccano/doccano) by at least 3 volunteer researchers, while for SEN-en-AMT, the Amazon Mechanical Turk service was used. For our task, we selected from the two English datasets only the instances annotated as negative or positive.
Russian news dataset obtained from Kaggle (https://www.kaggle.com/competitions/sentiment-analysis-in-russian/data), which contains sentiment-annotated news in the Russian language. From these, we selected the instances labeled as negative or positive.
By combining all the above datasets, we obtained our final training dataset of labeled negative and positive instances.
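Merging heterogeneous sentiment sources into one two-class training set can be sketched as below. The label strings, the record layout, and the toy examples are assumptions; the actual datasets use their own file formats and label names, which would have to be mapped accordingly.

```python
from collections import Counter

# Two-class scheme used in this work: negative -> 0, positive -> 1.
LABEL_MAP = {"negative": 0, "positive": 1}


def to_binary(records):
    """Keep only negative/positive records and convert their labels to 0/1.

    records: iterable of (text, label_string); any other label (e.g. neutral)
    is dropped.
    """
    converted = []
    for text, label in records:
        key = label.strip().lower()
        if key in LABEL_MAP:
            converted.append((text, LABEL_MAP[key]))
    return converted


def merge_sources(*sources):
    """Concatenate all sources and report the resulting class balance."""
    merged = []
    for source in sources:
        merged.extend(to_binary(source))
    return merged, Counter(label for _, label in merged)


# Toy (made-up) records standing in for two of the source datasets.
slovene_news = [("novica A", "negative"), ("novica B", "neutral")]
english_headlines = [("headline A", "Positive"), ("headline B", "negative")]

training_set, class_counts = merge_sources(slovene_news, english_headlines)
```

Reporting the class counts after merging makes it easy to check how balanced the combined training set is before fine-tuning.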
4.2.3 Emotion Detection
Similarly to the sentiment analysis, we use the mBERT model to detect emotions in the parliamentary speech. Our preliminary investigation showed that precise detection of many emotions is not possible in the multilingual setting, so we only categorized emotions into positive and negative. We fine-tune the mBERT model with the following four emotion-labelled datasets.
The Kaggle Twitter dataset (https://www.kaggle.com/datasets/pashupatigupta/emotion-detection-from-text) contains records labeled with 13 different emotions. We selected happiness, love, hate, and anger tweets. The instances were grouped into negative emotions (hate and anger) and positive emotions (happiness and love).
The HuggingFace (https://huggingface.co/datasets/emotion) Twitter dataset (saravia2018carer) contains annotated tweets. From these, we selected fear (labeled as negative) and love instances (labeled as positive).
GoEmotions (https://ai.googleblog.com/2021/10/goemotions-dataset-for-fine-grained.html) dataset (demszky2020goemotions) is a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. Some comments have multiple emotion labels, but we selected only instances with a single labeled emotion. The extracted negative emotions are anger and disgust, while the positive emotions are love and optimism.
XED (https://github.com/Helsinki-NLP/XED) emotion dataset (ohman2020xed) contains human-annotated Finnish and English sentences. From the English dataset, we selected anger and disgust sentences as negative emotions, and joy sentences as positive emotions.
Our final emotion detection dataset contains instances labeled as expressing either negative or positive emotions.
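The grouping of fine-grained emotions into two classes can be sketched as follows. The sketch pools the emotion names mentioned above into one mapping, although a different subset was selected from each source dataset; the numeric codes (0 for negative, 1 for positive) and the function name are assumptions.

```python
# Emotions named above, grouped by polarity (pooled over all four sources).
NEGATIVE_EMOTIONS = {"hate", "anger", "fear", "disgust"}
POSITIVE_EMOTIONS = {"happiness", "love", "optimism", "joy"}


def emotion_polarity(emotions):
    """Map a list of emotion labels to 0 (negative), 1 (positive), or None.

    Instances with several labels are skipped, mirroring the single-label
    selection applied to multi-label data; unused emotions are skipped too.
    """
    if len(emotions) != 1:
        return None
    emotion = emotions[0].lower()
    if emotion in NEGATIVE_EMOTIONS:
        return 0
    if emotion in POSITIVE_EMOTIONS:
        return 1
    return None


# Toy (made-up) annotated comments.
annotated = [("text A", ["anger"]), ("text B", ["love", "joy"]), ("text C", ["Joy"])]
binary = [(t, emotion_polarity(e)) for t, e in annotated if emotion_polarity(e) is not None]
```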
In this section, we report and interpret the obtained results. We present topic modelling results in Section 5.1, prediction of metadata (age, gender, and political wing) in Section 5.2, and sentiment and emotions analysis in Section 5.3.
5.1 Topic Modeling
We analyzed the topics present in each of the six parliaments, using the methodology based on LDA and LDAvis, as presented in Section 4.1. The obtained topics overlap across parliaments to a surprising degree, and we can observe at least three joint topics: health, budget, and parliamentary procedure. In the observed time (2017-2020), a strong topic in all parliaments was health-related due to the COVID-19 pandemic. The budget and economy is an annual topic, and we detected it in all national parliaments. Due to the nature of the corpora, the third joint topic is related to parliamentary procedure and legislative process, e.g., law, voting, amendments, articles, parties, etc.
Other topics seem more specific for individual parliaments, e.g., Catalan independence in Spain, trade and Brexit in the UK, financial monitoring in Bulgaria, Sunday work in Slovenia, and fiscal pact and elections in the Czech Republic. Below, we present detailed findings for each of the parliaments.
5.1.1 Topics in Bulgaria
We get the most separated themes using 5 or 9 topics. However, using more than 5 topics does not seem to contribute much to finding new relevant themes; therefore, we used 5 topics as shown in Figure 1. Although the names and stop words were filtered during the processing, some irrelevant words still appear. The frequently appearing proper names are informative, e.g., the name of an educational institution refers to the issues of education, the name of the head of the budget committee is related to the budget issues, and a specific party or a party leader is related to specific issues related to or raised by that party and/or leader.
Topic 1 is related to financing the public sector, including radio, culture, tourism, waters, sport, and municipalities in relation to financial and law terms like taxes, law, expenses, and concession.
Topic 2 concerns the parliamentary debates between opposition and ruling parties. It is characterized by words such as resignation, elections, citizens, and society.
Topic 3 is related to corruption and includes words gambling, financial monitoring, corruption, Menda Stoyanova (the head of the budget committee). The topic expresses the problems with the gambling bosses not paying their taxes and the practices of corruption.
Topic 4 covers education and health with typical words education, schools, children, social policy, labour, defense, insurances, health, patients, and health fund.
Topic 5 concerns the road infrastructure and media, indicating the long-lasting problems with bad roads in Bulgaria.
5.1.2 Topics in the Czech Republic
In the Czech Republic, five topics produce the most clearly separated themes. The visualization is provided in Figure 2.
Topic 1 covers the organization of parliament sessions such as meetings, interruptions, proposals, vocatives, formulas of protocol, procedure, etc.
Topic 2 covers education and social services with characteristic words schools, children, teachers, parents but also agriculture, pensions, wages, and investments.
Topic 3 is dedicated to economic issues, with typical words being EU, real estate, taxation, businesses, banking, and inflation.
Topic 4 covers health and includes words patients, healthcare, medicine, and vaccination.
Topic 5 includes the parliament organization and procedure.
5.1.3 Topics in France
The most interpretable number of topics for the French parliament is six. The visualization is provided in Figure 3.
Topic 1 covers the economy and social rights. The most relevant words are company, Euro, taxes, pension, fiscal, salary, and social security.
Topic 2 is dedicated to families and education. The most relevant and exclusively used words are person, child, woman, professional, young, education, victim, men, school, family, and parent.
Topic 3 deals with the budget and investment projects. The most relevant words are project, area, finance, millions of Euros, budget, financing, credit, funds, and investments.
Topic 4 is about the legislative process. The most relevant words are amendment, law, article, committee, right, and arrangement.
Topic 5 is related to Topic 4 but is oriented more toward parliamentary debate. The most frequent and relevant words are group, bench, minister, debate, republic, and LaREM (French ruling party).
Topic 6 covers several topics related to different laws passed in the parliament. The most relevant words are country, health, agricultural, European, and hospital.
5.1.4 Topics in Slovenia
In the case of the Slovene parliament, several numbers of topics give similarly interpretable results, but using six topics produces well-separated themes with a clear interpretation, as presented in Figure 4.
Topic 1 contains budget discussions. The most relevant words are millions, budget, investments, year, and economy.
Topic 2 is dedicated to the debate around social rights and the Sunday work of shops. The most relevant and exclusively used words are help, children, left, security, shops, workers, employed, and Sundays.
Topic 3 deals with the legislative process. The most relevant words are law, proposal, amendment, party group, and article.
Topic 4 is about the COVID-19 epidemic and the purchase of medical equipment. It contains the words epidemics, health, institution, COVID, goods reserves, health services, protective equipment, and retirement homes.
Topic 5 contains discussions with the government president and ministers. The most frequent and relevant words are government, mister, president, DUTB (bank loans agency), session, parliament, and ventilators.
Topic 6 contains words frequently appearing in parliamentary discussions, such as addresses, opening sessions, etc. It covers no specific contents. The most relevant words are later, now, here, I think, next, said, a little, have, nothing, you know.
5.1.5 Topics in Spain
The best separated and most interpretable number of topics in the Spanish parliament is 5, and even for this number of topics, two themes (numbered 1 and 5) overlap, as Figure 5 shows.
Topic 1 is politics-related, including words such as ruling, government, various political leaders, the far-right, Spain, and Catalonia.
Topic 2 is health-related and includes words nurses, virus, (primary) care, rights, masks, disease, family, hospitals, disability, equality, doctors, and army.
Topic 3 is cost-related with characteristic words: budget, taxes, income, debt, deficit, austerity, public services, pensions, salaries, gross domestic product, (un)employment, and self-employed workers.
Topic 4 is mobility-related with typical words environment, climate change, bills encouraging the use of new vehicles, and enterprises supporting the paradigm change.
Topic 5 is law-related and includes words judges, the penal code, jurisprudence, reform, amendment, the patient’s rights, and justice.
5.1.6 Topics in UK
The best separated and interpretable number of topics in the UK parliament is five. The visualization is presented in Figure 6. This split produces the following themes:
Topic 1 is finance-related and includes words employment, workers, pensions, funding, taxes, banks, investment, economic and health crises, National Health System, construction, and transport.
Topic 2 is trade-related and is characterized by exportation and importation, (Welsh) food, fishing, farming, tourism, university education, and aviation.
Topic 3 is family-related with words children, women, the young, parents, abuse, vulnerability, education, and health.
Topic 4 is security-related and typical words are crime, bills, amendments and regulation, law enforcement, offenders, prisoners, and lawyers.
Topic 5 is politics-related and includes words democracy, parliament, speech, leader, opposition, election, revolution, referendum, and peace.
5.2 Metadata Prediction: Age, Gender, and Political Wing
As described in Section 4.2, we fine-tuned the multilingual BERT language model to predict speakers’ metadata such as age, gender, and political position. The mBERT model was fine-tuned for each of the four tasks and six countries separately, and we present the predictive performance measured on the testing datasets. Being able to predict any of these variables indicates big differences in the language used by specific groups of parliamentary speakers. The differences and similarities between different countries are discussed below.
5.2.1 Predicting the age of speakers
Table 3 shows that age is a relatively well-predicted characteristic of speakers in Spain, Bulgaria, and Slovenia, a bit less so in the Czech Republic, while in France, there are very few language differences between speakers of different ages. The higher the prediction performance, the easier it is to distinguish between speakers’ age groups, and the more pronounced the generation gap.
Below we try to explain the two extreme cases, Spain with the largest gap between age groups and France with the smallest.
flaherty1987langue presents a historical development of French political discourse, which is directed toward uniformity in discursive strategies and may explain their similarities. A similar conclusion was drawn by lehti2014style who show that the language used in French politicians’ blogs is relatively standard.
The Spanish case, with the largest differences between younger and older parliamentary speakers, may be explained by the fact that after the end of the two-party system, new parties with younger leaders wanted to contrast with more senior and more socially privileged individuals (cameron201110).
5.2.2 Predicting the gender of speakers
As discussed in Section 2.4, the information about speakers’ gender may be detected from the grammatical structures used in their speech for all the analyzed languages but English if speakers use phrases related to their personal beliefs and feelings. Another possibility to detect gender is if speakers of different gender indeed use different language.
As Table 4 shows, gender is detectable to some degree in all analyzed countries. Slavic language (Slovenian, Czech, and Bulgarian) speakers express their gender the most explicitly, followed by Spanish, English and French speakers. The last two (English and French) are surprising for different reasons. In French, where gender may be expressed with the language, there is little evidence that speakers express it. Similarly to the age, we hypothesize that the case of French could be explained by the tendency toward language uniformity in French political discourse. Contrary to that, in English, where gender expression is not part of the grammar, the speakers’ gender can be detected nevertheless, indicating differences in expression between male and female MPs.
5.2.3 Predicting the political orientation of speakers
This section investigates the speech differences between parliament members with different political orientations. Our approach is again based on prediction models that predict the metadata (party membership) available for speakers. A successful prediction would testify that speakers of different political orientations use different language, while low success in prediction would indicate that the compared parties use similar discourse. We investigate two scenarios of different difficulty:
Predicting the left/right positioning of speakers from firmly (extreme) left and right political parties. This problem should not be very difficult, as we expect significant differences in the political stance between these parties, which we assume will be expressed in different content and possibly other linguistic features. The results are presented in Table 5.
Predicting the left/right positioning of speakers from the center-left and center-right political parties. This should be a more complex problem, as we try to distinguish between speakers from relatively similar parties. The results are presented in Table 6.
As expected, the differences in speech between extreme left- and right-wing parties are relatively well predictable for all countries, indicating big differences in the discourse of these parties. The classification accuracy between countries ranges from 88% (Czech Republic) to 74% (France).
Surprisingly, the differences are still large between center-left and center-right parties, ranging from 87% (Slovenia) to 54% (France). France is an exception with its low predictability (again, likely due to the tendency toward uniform political discourse); the predictability is much higher in the other countries. For two countries, Slovenia and Bulgaria, the differences between central parties are larger than between extreme parties, which may indicate strong political competition between the central parties.
5.3 Sentiment and Emotions Detection
This section presents the results obtained from the sentiment and emotion detection experiments. For these experiments, we fine-tuned multilingual BERT on the training datasets described in Section 4.2. First, we try to establish the quality of the trained models. For that purpose, during the fine-tuning process, a small part of the training data (10% of all training instances) was used for validation after each training epoch. This classification accuracy is shown in Table 7 for both the sentiment and emotion detection tasks. The results show that sentiment and emotions can be relatively well predicted, which is a positive indication of the reported results’ reliability. Slightly better results in predicting positive and negative emotions are expected, as the emotion datasets are all in English, all collected from social media, and therefore relatively homogeneous. The sentiment detection datasets are multilingual and collected in different domains; thus, the training accuracy reported in Table 7 is expected to be lower. However, this does not mean that the obtained sentiment model would generalize worse to our out-of-domain parliamentary data.
To further assess the quality of the produced sentiment prediction model, we selected the 20 speeches with the highest probability of negative sentiment for each of the parliaments and manually validated whether the predictions were correct. The results are presented in Table 8. Based on the results, we consider the model’s accuracy good enough (and comparable to other sentiment prediction models in the literature) to provide a reliable picture of the sentiment in our study. The fine-tuned models were used on our six parliamentary speech datasets. To obtain reliable and comparable statistics, we randomly selected speeches of regular parliament members from 2020 that have more than 30 characters for each of the six parliaments. For each speech, the trained mBERT model returned a sentiment score between 0 and 1 (0 indicating negative and 1 indicating positive sentiment).
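Selecting the speeches for manual validation can be sketched as below, assuming each speech already carries a model score between 0 and 1, where a low score means a high probability of negative sentiment. The function names and the toy scores are hypothetical, and the manual judgments are represented as simple booleans.

```python
import heapq


def most_negative(scored_speeches, k=20):
    """Return the k speeches with the lowest score (most likely negative).

    scored_speeches: iterable of (speech_id, score) with score in [0, 1].
    """
    return heapq.nsmallest(k, scored_speeches, key=lambda pair: pair[1])


def manual_accuracy(judgments):
    """Share of checked predictions that a human annotator judged correct."""
    return sum(judgments) / len(judgments)


# Toy (made-up) scores for one parliament.
scored = [(f"speech-{i}", i / 100) for i in range(100)]
to_validate = most_negative(scored)

# Hypothetical outcome of the manual check: 18 of 20 predictions confirmed.
accuracy = manual_accuracy([True] * 18 + [False] * 2)
```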
Predictions are summarized in Figure 7 and Table 9, from which we can compare the parliamentary sentiment across the countries. Figure 7 shows the histogram of sentiment distribution in each country. The Czech, Spanish and United Kingdom parliaments seem to express less negative sentiment than positive; in the Bulgarian and French parliaments, there seems to be a relatively balanced situation, while the Slovenian parliament shows the least positive sentiment. We attribute the results for Slovenia to the poisonous exchanges between the pro-government and opposition parties at the observed time when the previous opposition took over the government in the middle of the mandate. To get a numeric overview of the sentiment, we set the decision threshold for negative sentiment at 0.2 and for positive sentiment at 0.8 and counted the number of negative and positive speeches. The results are presented in Table 9. As before, we conclude that the parliament with the highest percentage of negative sentiment is the Slovenian one, while the UK parliament speeches contain the highest rate of positive sentiment.
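The thresholding used for the numeric overview can be sketched as follows. Whether the boundary values 0.2 and 0.8 themselves count as negative and positive is an assumption here, as are the function name and the toy scores.

```python
def sentiment_shares(scores, neg_threshold=0.2, pos_threshold=0.8):
    """Percentage of speeches counted as negative, positive, or undecided.

    A score at or below neg_threshold counts as negative and a score at or
    above pos_threshold as positive (inclusive boundaries are an assumption).
    """
    total = len(scores)
    negative = sum(1 for s in scores if s <= neg_threshold)
    positive = sum(1 for s in scores if s >= pos_threshold)
    undecided = total - negative - positive
    return {
        "negative": 100 * negative / total,
        "positive": 100 * positive / total,
        "undecided": 100 * undecided / total,
    }


# Toy (made-up) model scores for ten speeches.
toy_scores = [0.05, 0.1, 0.5, 0.9, 0.95, 0.99, 0.3, 0.7, 0.85, 0.15]
shares = sentiment_shares(toy_scores)
```

Speeches with scores between the two thresholds are counted in neither class, so the negative and positive percentages need not sum to 100.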
Similarly to sentiment, we process emotions. To validate how good the emotions detection models are, we selected 20 speeches predicted to be the most negative for each parliament and manually checked if predictions were correct for them. The results are presented in Table 10. We can observe significantly lower accuracy in all countries compared to the sentiment (shown in Table 8). While this is not surprising as the emotion prediction is considered harder compared to sentiment, this makes the results and interpretations presented below less reliable compared to the sentiment.
We show the results for the emotion detection in Figure 8 (distribution of emotion predictions) and Table 11 (the percentage of positive and negative emotions taking 0.2 and 0.8 as the decision threshold values). As the results show, positive emotions are strongly dominant in all countries except France and the UK, where positive and negative emotions are almost balanced.
6 Conclusions and Further Work
We presented a mono- and cross-lingual methodology for the analysis of parliamentary speeches that can be applied in a uniform way to the parliaments in the ParlaMint corpora collection (and other parliamentary datasets with similar information). Our methodology covers topic modelling, analysis of sentiment and emotions, as well as prediction of metadata such as age, gender, and political orientation of the speakers. The source code of the developed methods and evaluation scenarios is publicly available (https://github.com/KristianMiok/Parliamentary-Discourse). We demonstrate the presented methodology on six national parliaments, showing similarities and some surprising differences between them.
Interpreting the results required interdisciplinary collaboration and an understanding of the language and political situation in the countries tackled. Our topic analysis showed considerable overlap between the observed countries. The joint topics are health, budget, and parliamentary procedure. Each of the analyzed parliaments also exhibits some specific topics. We discovered that in all countries except France, the age and gender of speakers are strong factors in the political discourse. Further, we found a big difference in the discourse between extreme left- and right-wing parties in all analyzed countries. Surprisingly, there is also a considerable difference between center-left and center-right parties in all countries except France. The sentiment analysis shows considerable differences between parliaments. The Czech, Spanish and United Kingdom parliaments express less negative than positive sentiment, the Bulgarian and French parliaments have a balanced distribution, and in the Slovenian parliament, the negative sentiment dominates. The situation is different with the emotions, where positive emotions are strongly dominant in all countries except France and the UK, where positive and negative emotions are almost balanced.
There are many open avenues for further work. A larger analysis of all 16 parliaments in the ParlaMint collection would require a much larger research team but would produce a very interesting comparison between the parliaments. The proposed methodology could be extended with other topic models and better training datasets for sentiment and emotions when they become available. We could analyze a broader spectrum of emotions, but the currently existing datasets are inadequate for our purpose due to differences in the covered domains.
This work is based upon the collaboration in the COST Action CA18209 – NexusLinguarum “European network for Web-centred linguistic data science”, supported by COST (European Cooperation in Science and Technology). Marko Robnik-Šikonja received financial support from the Slovenian Research Agency through core research programme P6-0411 and projects J6-2581 and J7-3159. Encarnación Hidalgo Tenorio was financially supported by the European Social Fund, the Andalusian Government, and the University of Granada (Project References: A-HUM-250-UGR18 & P18-FR-5020). Petya Osenova was partially supported by CLaDA-BG, the Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, and partially through the EU infrastructures CLARIN and DARIAH, Grant number DO01- 377/18.12.2020.
Compliance with Ethical Standards
The authors declare that they have complied with the ethical standards in their research.
Conflict of Interest The authors declare that they have no conflict of interest.
Ethical Approval This article does not contain any studies with human participants or animals performed by authors.
Informed Consent Informed consent was not required as no humans or animals were involved.