
Political Text Scaling Meets Computational Semantics

During the last fifteen years, text scaling approaches have become a central element for the text-as-data community. However, they are based on the assumption that latent positions can be captured just by modeling word-frequency information from the different documents under study. We challenge this by presenting a new semantically aware unsupervised scaling algorithm, SemScale, which relies upon distributional representations of the documents under study. We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislations, in order to understand whether a) an approach that is aware of semantics would better capture known underlying political dimensions compared to a frequency-based scaling method, b) such positioning correlates in particular with a specific subset of linguistic traits, compared to the use of the entire text, and c) these findings hold across different languages. To support further research on this new branch of text scaling approaches, we release the employed dataset and evaluation setting, an easy-to-use online demo, and a Python implementation of SemScale.





1 Introduction

The development of a variety of models for inferring policy position of actors directly from textual evidence these actors produce has expanded the scope and focus of analyses in political science and has sustained the growth of the text-as-data community (Laver et al., 2003; Slapin and Proksch, 2008; Lowe et al., 2011, inter alia). The so-called text scaling approaches, such as the widely popular Wordscores (Laver et al., 2003) and Wordfish (Slapin and Proksch, 2008) algorithms, offer the possibility of identifying latent positions of, for instance, parties directly from the electoral manifestos they present or the speeches their members deliver. Such positions, depending on the type of data and the context of the study, have been interpreted as being able to capture a left-right scale or different attitudes towards the European integration process (see, for instance, Laver et al. (2003) and Proksch and Slapin (2010)).

While these text scaling methods show potential in considering textual content directly as a form of political data that can be automatically analyzed, it is important to notice that they suffer from a major limitation. Namely, they treat textual data in a symbolic fashion, i.e., they represent documents simply as bags of words and assign them (explicitly or implicitly) position scores depending on the words they contain. This means that the amount of lexical overlap between two texts directly determines the extent of their positional (dis)agreement. This gives rise to two types of errors in position estimation that methods based on lexical overlap are prone to:

  1. Texts conveying similar meaning and expressing a similar political position, but overlapping in very few words (e.g., “…homophobic outbursts should have no place in modern German society.” and “…anti-gay propaganda needs to be prevented.”) will end up being assigned very different position scores.

  2. Texts conveying different or opposing political positions, but having a significant word overlap (e.g., “Migrants are responsible for the increased crime rates.” vs. “Migrants are responsible for fewer crimes than domicile population.”) will end up being assigned similar position scores.
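The two error types can be made concrete with a small sketch that measures lexical overlap on the example sentences above. Jaccard overlap over bags of words is used here only as an illustrative stand-in for "lexical overlap"; it is an assumption of this sketch and not the statistic used by Wordscores or Wordfish, and the helper names are hypothetical.

```python
import re

def bag_of_words(text):
    """Lowercase the text and return its set of word tokens (hyphenated words kept whole)."""
    return set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))

def jaccard(text_a, text_b):
    """Share of distinct words that the two texts have in common."""
    words_a, words_b = bag_of_words(text_a), bag_of_words(text_b)
    return len(words_a & words_b) / len(words_a | words_b)

# Similar position, almost no shared vocabulary (error type 1).
similar_1 = "homophobic outbursts should have no place in modern German society"
similar_2 = "anti-gay propaganda needs to be prevented"

# Opposing positions, substantial shared vocabulary (error type 2).
opposed_1 = "Migrants are responsible for the increased crime rates"
opposed_2 = "Migrants are responsible for fewer crimes than domicile population"

overlap_similar = jaccard(similar_1, similar_2)
overlap_opposed = jaccard(opposed_1, opposed_2)
print(overlap_similar, overlap_opposed)
```

On these sentences the like-minded pair shares no words at all, while the opposing pair shares four of its thirteen distinct words, so a purely overlap-driven method would rank the two pairs the wrong way round.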

In other words, given the virtually unlimited expressivity of natural language, similar political positions may be lexicalized very differently, but also different political positions may be lexicalized similarly. In this work, we propose a scaling approach that remedies the above-mentioned limitations of existing scaling methods by considering semantic representations of the words in the text. In general, semantic word representations are computational representations (e.g., vectors) that have the following property: words with similar meaning (e.g., “gay” and “homosexual”) have similar representations; conversely, words with a distant meaning (e.g., “propaganda” and “cheese”) should have dissimilar computational representations. Our semantic scaling algorithm, dubbed SemScale, leverages recent developments from the area of computational linguistics where methods for inducing robust algebraic representations of word meaning have been proposed (Mikolov et al., 2013; Pennington et al., 2014; Bojanowski et al., 2017, inter alia). By relying on semantic rather than symbolic representations of text, SemScale recognizes different words and phrases with similar or related meaning (e.g., that “homophobic outbursts” has a similar meaning to “anti-gay propaganda”) and uses such semantic similarities to produce the scaling scores. Additionally, SemScale is a fully deterministic algorithm, which contributes to addressing the issue of consistency and reproducibility of results obtained via text mining approaches.

Another often criticized aspect of existing unsupervised scaling methods is their inability to decipher which (if any) underlying policy dimension is captured by the produced position scores (cf. for instance, the critiques raised by Budge and Pennings (2007a, b) concerning Wordscores or the recommendation made by Proksch and Slapin (2010) with respect to filtering ideological from non-ideological statements prior to applying Wordfish). In addition to the above criticism, Denny and Spirling (2018) have recently questioned the robustness of Wordfish, demonstrating that it is very sensitive to the smallest changes in the input text, such as the removal of punctuation or stopwords, which should have no effect on the overall political message (i.e., position).

To address this issue, in our work we examine the robustness and stability of text scaling methods by investigating the role that different lexical and semantic information extracted from the content plays when used as input to the algorithms. We are particularly interested in better understanding a) the extent to which known policy positions are captured by specific linguistic traits, in contrast to the usage of the entire texts, and b) whether this is further emphasized by our newly proposed scaling approach, which is aware of the meaning of the words under study (and not just of their frequency). To this effect, we extensively empirically compare two different unsupervised text scaling algorithms: (1) Wordfish, which is widely adopted among political scientists, operates on the basis of word frequencies, and is unaware of word meaning, and (2) SemScale, our new semantically-aware scaling method, which exploits distributional semantic representations of the texts under study. In order to make an empirical comparison across different languages (English, German, French, Italian, and Spanish), we have obtained a textual dataset for scaling from the European Parliament website. It comprises all speeches given in one of the above five languages by members of the European Parliament (MEPs) and their official translations to all of the other languages during the 5th and 6th legislative terms. For each legislative term and language, we concatenated all speeches of all members of the same national party in a single textual document, aiming to discover the overall political position of the party during that term. Our dataset creation builds on the work of Proksch and Slapin (2010) and can be seen as an extension of the dataset used in their work, which covered only the speeches produced in or translated into English, German, and French and only during the 5th legislative term.

The contribution of this study is threefold. First, we propose a novel algorithm for unsupervised text scaling based on semantic text representations and demonstrate empirically that it outperforms the widely adopted Wordfish algorithm in terms of identifying party positions on European integration. Second, we show that specific lexical and semantic information allows us to consistently capture underlying political dimensions of interest from textual data; the significance of our results is emphasized by the fact that our findings consistently hold across all five major European languages. Finally, we present an online appendix (Nanni et al., 2019) with the multi-language dataset employed in this study (in its original form and after each pre-processing step) and all the results obtained during our work, a demo that offers the possibility of directly testing SemScale through an easy-to-use graphic interface, and a Python implementation of SemScale (usable as a command-line tool) for encouraging further research and collaborations on semantically-aware text scaling methods.

The structure of the article is as follows. In the next section, we present an overview of lexical and semantic features that can be extracted from text in the languages under study; we analyze the impact of each of them on the scaling process in our empirical analysis. Section 3 provides a detailed description of SemScale, our newly proposed scaling algorithm that exploits semantic representations of words and texts. We highlight differences and commonalities with respect to the widely adopted WordFish algorithm. In Section 4, we describe the process of compiling a dataset of European Parliament speeches, covering five languages and two legislative terms, which we use for evaluating the effectiveness of the analyzed unsupervised scaling methods. In Section 5 we display the results of our empirical scaling evaluation, emphasizing how lexical and semantic features crucially impact the scaling performance and that our findings are consistent across all included languages. We conclude by discussing our findings and possible implications they might have in fostering further research on semantically-aware analyses of political texts.

2 Linguistic and Semantic Features

The overall purpose of using natural language processing (NLP) techniques in our study is to identify specific syntactic or semantic traits in a string of text that can be further employed by a scaling model as features. In the rest of the section, we describe some of the most common NLP text pre-processing approaches, which we adopt in this research. Before concluding, we additionally give an overview of the tools we have used in our study, which have been chosen to make the analysis as comparable as possible across the five different languages we consider in this work.

Parts of Speech.

The computational linguistics community has put a large effort into developing systems capable of modeling the use of parts of speech for words in text (e.g. articles, nouns, verbs). While older generations of part-of-speech (POS) tagging models have been based on traditional sequence labeling algorithms such as Hidden Markov Models (which assume the sequence of words to be generated by a Markov process having parts of speech as latent states that need to be uncovered) or Conditional Random Fields (Lafferty et al., 2001), state-of-the-art POS-tagging models are commonly built upon deep neural networks (for instance, bidirectional recurrent neural networks or residual convolutional networks; cf. Goodfellow et al. (2016) for a comprehensive overview of deep learning architectures).

In this work, in order to test the effects that different parts of speech have on the scaling procedure, we have used taggers to filter NOUNS, VERBS, and ADJECTIVES in texts (for each of the five considered languages).
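As a minimal sketch of this filtering step, assume the tokens have already been tagged. The tags below are hand-assigned Universal POS labels chosen for illustration, and `keep_pos` is a hypothetical helper rather than part of any tagger's API.

```python
def keep_pos(tagged_tokens, wanted):
    """Keep only the tokens whose part-of-speech tag is in `wanted`."""
    return [token for token, pos in tagged_tokens if pos in wanted]

# Hand-tagged toy sentence (tags assigned manually, not by a real tagger).
tagged = [
    ("The", "DET"), ("parties", "NOUN"), ("rejected", "VERB"),
    ("the", "DET"), ("new", "ADJ"), ("proposal", "NOUN"),
]

nouns_only = keep_pos(tagged, {"NOUN"})
content_words = keep_pos(tagged, {"NOUN", "VERB", "ADJ"})
print(nouns_only)
print(content_words)
```

The filtered token lists are then what a scaling algorithm would receive in place of the full text.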

Lemmas. A common practice in computational text analysis is to make the texts under study uniform and reduce the overall vocabulary via morphological normalization, a process of reducing all different morpho-syntactic forms of the same word (e.g., “house”, “houses”, “housing”) to some common form (e.g., “house” or “hous”). The most common types of morphological normalization are (1) stemming (e.g. Porter, 1980), which strips suffixes from the word based on a series of heuristics and pre-defined rules, and (2) lemmatization, which reduces different forms of the word to its canonical form (e.g., cases of nouns to singular nominative or different conjugations of a verb to its infinitive form). Stemming has already been shown to have a negative impact on automatic text scaling (Greene et al., 2016; Denny and Spirling, 2018) as it may add lexical ambiguity (e.g., “party” may be stemmed to “part”, which holds a different meaning).

Lemmatization, instead, has the goal of normalizing inflected word forms to their canonical forms called lemmas (e.g., “parties” to “party” or “voted” to “vote”). Lemmatization is often performed by a look-up in a dictionary of inflected forms using the inflected word and its part-of-speech as look-up keys (e.g., the POS information helps to transform “meeting” to “meet” when used as a verb and leave it unchanged when used as a noun). While lemmatization does not seem to have a significant effect on scaling accuracy (we obtained similar correlations with expert positions when using lemmatized and when using non-lemmatized texts), it helps us in reducing the overall word vocabulary and consequently speeds up the automatic scaling.
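The dictionary look-up described above can be sketched as follows. The table is a tiny hand-written stand-in for a real dictionary of inflected forms, and `lemmatize` is a hypothetical helper; unknown forms are simply returned lowercased, which is an assumption of this sketch.

```python
# (inflected form, part of speech) -> lemma; a toy stand-in for a real lexicon.
LEMMA_TABLE = {
    ("parties", "NOUN"): "party",
    ("voted", "VERB"): "vote",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}

def lemmatize(word, pos):
    """Look up the lemma using the word form and its POS tag as the key."""
    key = (word.lower(), pos)
    return LEMMA_TABLE.get(key, word.lower())

print(lemmatize("meeting", "VERB"))  # the verb reading is reduced to "meet"
print(lemmatize("meeting", "NOUN"))  # the noun reading is left unchanged
```

Using the POS tag as part of the key is what lets the same surface form map to different lemmas.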

Named Entities.

When dealing with large amounts of text documents, a useful strategy for finding relevant pieces of information is to highlight the named entities that are mentioned in them. Similar to part-of-speech tagging, named entity recognition (NER) is a sequence labeling task, which means that a label indicating whether the word is part of a named entity (e.g., of type PERSON, ORGANIZATION, or LOCATION) needs to be assigned to every word in the textual sequence. As for POS tagging, previous generations of named entity recognition systems were also based on machine learning models such as Hidden Markov Models and Conditional Random Fields, whereas the most recent NER approaches are again typically deep neural networks (see, for instance, Ganea and Hofmann, 2017). In any case, a large corpus of text manually annotated with named entities is required to train a reliable NER model.

Disambiguated Named Entities. While a named entity recognizer is able to identify mentions of people, organizations, and locations in text, it does not resolve the identity of the entities behind these mentions. For example, a named entity recognizer will extract mentions, such as “President Obama”, “Barack Obama” or “Obama”, but will have no understanding that these mentions correspond to the same real-world entity (i.e., the 44-th president of the United States). Resolving the identity of extracted mentions is the task known in natural language processing as entity linking (EL). Thanks to the efforts of the Semantic Web community in building large-scale knowledge bases such as DBpedia (Auer et al., 2007) or BabelNet (Navigli and Ponzetto, 2012), these resources are now commonly used to ground the mentions of entities from text. In that case, “President Obama” and “Barack Obama” would be linked to the same DBpedia entry dbr:Barack_Obama when they are used for referring to the former US President. Current EL models exploit the lexical and semantic (i.e., other already disambiguated entities) context surrounding the mention of an entity in order to perform disambiguation (e.g., whether a mention “Obama” refers to dbr:Barack_Obama or dbr:Michelle_Obama).

While the potential of resources such as DBpedia or BabelNet is evident (Semantic Web resources are also becoming components of political science data sets; see, for instance, the use of DBpedia for disambiguating leaders’ names in Archigos), it is also important to keep in mind that they are built upon the strong assumption that the information contained in Wikipedia is reliable. Moreover, one must be aware that despite their immense size existing knowledge bases still have limited coverage of the real world and not all entities that will be mentioned in the texts have corresponding entries in the knowledge base. Acknowledging such current limitations, which we will further address when examining the results, in this research we are nevertheless interested in quantifying the impact that disambiguated named entities might have as features for unsupervised scaling algorithms.

Distributional Semantics. While existing scaling algorithms (e.g., Wordfish) model words simply by considering their weighted frequency, modern research in computational linguistics primarily represents words as numeric vectors from a multidimensional vector space. The research area of distributional (lexical) semantics, in particular, builds upon the assumption that the meaning of a word can be derived by looking at the contexts in which it is used or, as Firth (1957) put it, that “a word is characterized by the company it keeps”. For instance, if we consider the sentence “The members of +Europa voted against the proposal”, even if we don’t know what +Europa is, we can induce from the context that it is probably a political entity.

The ability to efficiently and precisely capture the meaning of words by representing them as points in a multi-dimensional semantic vector space, i.e., by representing words with the so-called word embeddings (Mikolov et al., 2013), is arguably one of the most relevant achievements of computational linguistics in the last few decades. Among other things, word embeddings can be used to detect particular semantic relations that hold between words (e.g., hypernymy, synonymy, antonymy, or meronymy) (Glavaš and Ponzetto, 2017; Glavaš and Vulić, 2018) or between entities (e.g., “being capital of”, “being president of”) (Nickel et al., 2016; Joulin et al., 2017). What is more, following Frege (1953)’s context principle of compositionality, namely that the meaning of a complex expression (e.g., a sentence, paragraph, or a document) is determined by the meanings of its elements (the words), it has been shown possible to efficiently represent the meaning of longer units of text (paragraphs and documents) by aggregating (e.g., by computing the weighted average) the embeddings of the words that the text contains (Shen et al., 2018).

For these reasons, in this work we also examine the potential of distributional semantics for obtaining semantic vector representations of political texts. Word embeddings have already been studied and employed in political science analyses, for instance by Gurciullo and Mikhaylov (2017) and by Spirling and Rodriguez (2019). While distributional representations of political texts cannot be incorporated as features into the Wordfish algorithm, as Wordfish requires symbolic text representations (i.e., words) as input, we use them in our newly proposed SemScale scaling algorithm that we describe in detail in the following section. In our empirical analysis, we have considered both general-purpose word embeddings, obtained by using the entire Wikipedia plus other online resources as training corpus, and domain-specific word embeddings learned directly from all European Parliament speeches. In both cases we used 300-dimensional word embeddings. While embeddings trained on Wikipedia capture representations of named entities more precisely, the ones generated on in-domain materials are generally more robust across different languages. For this reason, in the main experiments we report results using the in-domain embeddings. In the appendix we also offer the results obtained using embeddings generated from Wikipedia. In the experiments in which we use lemmas instead of the words themselves, we obtain lemma embeddings by first (1) lemmatizing all European Parliament speeches (i.e., we create the lemmatized in-domain text corpus) and then (2) running a word embedding algorithm on the previously lemmatized corpus.

Tools Adopted. One of the main goals of our empirical methodology was to use computational approaches that are as comparable as possible across different languages; for this reason, whenever possible, we adopted the same infrastructure, models, and tools of linguistic analysis for each of the five involved languages. For part-of-speech tagging, lemmatization, and named entity recognition we employed spaCy, a recently released Python library that offers robust pre-trained models for all five languages under study. We initially considered using Stanford CoreNLP by Manning et al. (2014), a more widely adopted natural language toolkit. However, we found models for all required tasks (POS-tagging, lemmatization, and NER) only for English, Spanish, and German. For entity linking we employ DBpedia Spotlight by Mendes et al. (2011), the only openly available entity linking tool offering pre-trained entity linking models for all five languages under study. For computing word embeddings we used the FastText word embedding tool (Bojanowski et al., 2017). We make all the resources and pre-processed datasets available for further research in the online appendix.

Before we move to the next section, we would like to remark that while the above-mentioned tools are widely adopted by both the academic and industrial computational linguistics communities, their performance, especially for more complex tasks (NER, EL), is far from optimal, as also documented by spaCy itself. Nevertheless, with the aim of opening the discussion and motivating further research efforts on using semantic properties of text for political text scaling, we have still employed these models with the awareness of their current limitations. By demonstrating that even with their current, sub-optimal performance these models can significantly contribute to the scaling quality, we deliver a promise of further improvements to political text scaling with the future advances in computational linguistics.

3 Scaling Models

To examine the role of lexical and semantic information in the unsupervised scaling process we consider two conceptually different scaling approaches: the first is the popular Wordfish model of Slapin and Proksch (2008), which exploits only symbolic text representations, whereas the second is SemScale, our newly proposed algorithm, which is able to exploit semantic representations of texts (i.e., embeddings). We first briefly describe the key aspects of Wordfish, after which we describe SemScale in detail. Finally, we summarize the preprocessing steps we employ before feeding the texts to the two scaling algorithms.

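Wordfish. It is a variant of a Poisson ideal point model in which, for each word $j$ in a document $i$, the count $W_{ij}$ is drawn from a Poisson distribution with rate $\lambda_{ij}$, which is modeled considering the document length ($\alpha_i$), the token frequency ($\psi_j$), the level to which a token identifies the direction of the underlying ideological space ($\beta_j$), and the underlying position of the document ($\theta_i$):

$$W_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad \lambda_{ij} = \exp(\alpha_i + \psi_j + \beta_j \theta_i)$$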

This is a completely symbolic approach to text scaling that relies only on token frequency information for determining the position of documents on a single dimension. The method is directly applicable in any language precisely because it does not explicitly model semantics but adopts token rates as an (often very successful) proxy. In this work, we have used the Quanteda implementation of the algorithm.
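To illustrate how the four quantities combine, the sketch below evaluates a Poisson rate of the form exp(alpha_i + psi_j + beta_j * theta_i), the usual Wordfish parameterization with a document effect alpha_i, a token effect psi_j, a token weight beta_j, and a document position theta_i. All parameter values are made up for illustration, the function names are hypothetical, and this is not the Quanteda estimation procedure, which fits these parameters from data.

```python
import math

def wordfish_rate(alpha_i, psi_j, beta_j, theta_i):
    """Expected count of token j in document i under the Poisson model."""
    return math.exp(alpha_i + psi_j + beta_j * theta_i)

def poisson_log_pmf(count, rate):
    """Log-probability of observing `count` under a Poisson with the given rate."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

# Made-up parameters: one discriminating token, two documents at opposite positions.
alpha, psi, beta = 0.0, 1.0, 1.0
rate_left = wordfish_rate(alpha, psi, beta, theta_i=-1.0)
rate_right = wordfish_rate(alpha, psi, beta, theta_i=1.0)

# A positive beta makes the token more frequent in documents with higher theta.
print(rate_left, rate_right, rate_right / rate_left)
```

With beta = 1, moving a document from position -1 to +1 multiplies the expected count of the token by exp(2); a token with beta close to zero would be uninformative about position.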

SemScale. This is our new scaling algorithm that exploits distributional semantic representations of political texts. An earlier version of the algorithm, with further technical details and an extension for cross-lingual text scaling, is described in Glavaš et al. (2017). We start by representing each document under study with its respective distributional semantic vector, built by aggregating the embeddings of the words the document contains as follows: let $T$ be the bag of words of a political speech, i.e., the set of all words that appear in that text, and let $\mathbf{e}(w)$ be the word embedding of some word $w \in T$. We then compute the embedding vector of the whole text, $\mathbf{e}(T)$, by computing the weighted average of the embeddings of all words in $T$:

$$\mathbf{e}(T) = \frac{\sum_{w \in T} \text{tf-idf}(w, T) \cdot \mathbf{e}(w)}{\sum_{w \in T} \text{tf-idf}(w, T)}$$

where $\text{tf-idf}(w, T)$, standing for the term frequency-inverse document frequency score for the word $w$ and document $T$, is the weight with which we multiply the embedding vector $\mathbf{e}(w)$ of the word $w$. The tf-idf score of the word $w$ for the text $T$ is the product of two scores: the term frequency score (TF), which captures how often the word appears in the document, and the inverse document frequency score (IDF), which is inversely proportional to the number of other texts in the collection that contain the word $w$. The intuition behind the tf-idf weighting scheme is that the word contributes more to the overall meaning of the text the more frequently it appears in the document (TF component) and the less common it is, i.e., the lower the number of other texts that contain that same word is (IDF component). Precisely, we compute the TF score for a word $w$ and text document $T$ as follows:

$$\text{tf}(w, T) = \frac{\text{freq}(w, T)}{\max_{w' \in T} \text{freq}(w', T)}$$

where $\text{freq}(w, T)$ is the raw frequency of occurrence of $w$ in $T$, normalized by the maximal frequency with which any other word ($w'$) appears in $T$. We compute the IDF for each word $w$ as follows:

$$\text{idf}(w) = \log \frac{|D|}{|\{T' \in D : w \in T'\}|}$$

where $D$ is the collection of textual documents (and $|D|$ is the number of documents in the collection) and $\{T' \in D : w \in T'\}$ is the subset of the documents in the collection that contain the word $w$.
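The aggregation step can be sketched in plain Python as follows. This is an illustrative reading of the description above and not the released SemScale implementation: TF is the raw count divided by the count of the most frequent word in the text, IDF is the log of the collection size over the number of texts containing the word, and the weighted sum is divided by the sum of the weights (an assumption of this sketch). The two-dimensional embeddings and the three toy "speeches" are made up.

```python
import math
from collections import Counter

def idf(word, collection):
    """Log of (number of texts) / (number of texts containing the word)."""
    containing = sum(1 for tokens in collection if word in tokens)
    return math.log(len(collection) / containing)

def text_embedding(tokens, collection, embeddings):
    """tf-idf weighted average of the embeddings of the words in one text."""
    counts = Counter(tokens)
    max_count = max(counts.values())
    words = sorted(w for w in counts if w in embeddings)
    weights = {w: (counts[w] / max_count) * idf(w, collection) for w in words}
    total = sum(weights.values())
    dim = len(next(iter(embeddings.values())))
    if total == 0:  # e.g. every word occurs in every text
        return [0.0] * dim
    return [sum(weights[w] * embeddings[w][k] for w in words) / total
            for k in range(dim)]

# Toy data (hypothetical): three tokenized texts and 2-d word vectors.
collection = [["tax", "tax", "europe"], ["border", "europe"], ["tax", "border", "border"]]
embeddings = {"tax": [1.0, 0.0], "border": [0.0, 1.0], "europe": [1.0, 1.0]}

first_text_vector = text_embedding(collection[0], collection, embeddings)
print(first_text_vector)
```

Because the texts are later compared by cosine similarity, the choice of normalizer only rescales each vector and does not change the resulting similarities.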

Then, let $T_1$, $T_2$, $\dots$, $T_N$ be the collection of political texts which we want to scale, with their corresponding distributional semantic vectors $\mathbf{e}(T_1)$, $\mathbf{e}(T_2)$, $\dots$, $\mathbf{e}(T_N)$, computed from word embeddings as described above. We can then, for any two of these texts $T_i$ and $T_j$, measure the semantic similarity between them by comparing their respective embeddings, i.e., by comparing $\mathbf{e}(T_i)$ with $\mathbf{e}(T_j)$. Following common practice with respect to vector-space text representations, we measure the semantic similarity between two texts $T_i$ and $T_j$ as the cosine of the angle that their respective embedding vectors enclose:

$$\text{sim}(T_i, T_j) = \frac{\mathbf{e}(T_i) \cdot \mathbf{e}(T_j)}{\lVert \mathbf{e}(T_i) \rVert \, \lVert \mathbf{e}(T_j) \rVert}$$

where $\mathbf{e}(T_i) \cdot \mathbf{e}(T_j)$ is the dot product between vectors $\mathbf{e}(T_i)$ and $\mathbf{e}(T_j)$ and $\lVert \mathbf{x} \rVert$ denotes the Euclidean norm of the vector $\mathbf{x}$. By computing the above similarity for every possible pair of texts in our collection (in a collection of $N$ texts there are $N(N-1)/2$ different text pairs, i.e., we need to compute $N(N-1)/2$ similarity scores), we give rise to a fully-connected weighted graph (a graph in which there is an edge between every two vertices and there is a numeric weight assigned to each edge), which we call the similarity graph. The vertices in the similarity graph denote individual texts of our text collection (i.e., vertex $v_i$ corresponds to the text $T_i$), whereas the weights of the edges denote how semantically similar the two texts are (i.e., the weight of the edge between vertices $v_i$ and $v_j$ is $\text{sim}(T_i, T_j)$).
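A small sketch of building the similarity graph from text vectors is given below. The four two-dimensional vectors are made up, the helper names are hypothetical, and the graph is stored as a dictionary keyed by vertex pairs purely for readability.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_graph(vectors):
    """Edge weights for every unordered pair of texts: N(N-1)/2 entries."""
    return {(i, j): cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

# Toy text vectors (hypothetical values).
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.2, 1.0]]
graph = similarity_graph(vectors)

# The least similar pair of texts in the toy collection.
least_similar_pair = min(graph, key=graph.get)
print(len(graph), least_similar_pair)
```

With four texts the graph has six edges, and the pair with the lowest weight is the one whose vectors are orthogonal.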

The scaling algorithm we describe next aims to assign a position score to each vertex in the graph, by taking into account the weights of the edges that connect that vertex with all other vertices, that is, by considering the semantic similarity of the corresponding text with all other texts in the text collection $D$. We start from an intuitive assumption that a pair of least semantically similar (i.e., most dissimilar) texts corresponds to extreme positions in the position spectrum. In other words, among all possible pairs of texts ($T_i$, $T_j$), we identify those two that have the lowest mutual semantic similarity (i.e., lowest $\text{sim}(T_i, T_j)$) and assume that one of them is at one end of the position spectrum, whereas the other is at the opposite end; positions of all other texts are assumed to lie somewhere in between these two extremes. We name these two most dissimilar texts pivot texts and assign an initial position score of $-1$ to one of them and $1$ to the other. We next propagate the position scores assigned to the pivot texts to all the other texts (which are still without a position score), using the structure and the weights in the similarity graph as the backbone for score propagation. Namely, we employ the so-called harmonic function label propagation (HFLP) algorithm, proposed by Zhu et al. (2003), a commonly used algorithm for graph-based semi-supervised learning, to propagate position scores from the two pivot texts to other, non-pivot texts. Let $G = (V, E)$ be our similarity graph and $\mathbf{W}$ its weighted adjacency matrix. Let $\mathbf{D}$ be the diagonal matrix with weighted degrees of the graph’s vertices as diagonal elements, i.e., $D_{ii} = \sum_{j} w_{ij}$, where $w_{ij}$ is the weight of the edge between vertices $v_i$ and $v_j$. Then $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the unnormalized Laplacian of the graph $G$, a matrix representation of the graph which can be used to detect many useful properties of $G$. Assuming that the labeled vertices (the vertices to which we have assigned a position score, i.e., the two vertices corresponding to pivot texts) are ordered before the unlabeled ones (vertices corresponding to all other texts in our collection), the Laplacian matrix $\mathbf{L}$ of the similarity graph can be partitioned as follows:

$$\mathbf{L} = \begin{bmatrix} \mathbf{L}_{ll} & \mathbf{L}_{lu} \\ \mathbf{L}_{ul} & \mathbf{L}_{uu} \end{bmatrix}$$

The vector containing the scores of the unlabeled vertices (which are vertices corresponding to all but the two pivot texts), capturing the position scores of the non-pivot texts, is then computed as:

$$\mathbf{f}_u = -\mathbf{L}_{uu}^{-1} \mathbf{L}_{ul} \mathbf{y}_l$$

where $\mathbf{y}_l$ is the vector of scores of labeled vertices, in our case the vector with the scores of pivot vertices, $\mathbf{y}_l = [-1, 1]^\top$. This way, by propagating the position scores from pivot vertices to all other vertices through exploitation of the structure of the similarity graph $G$, we obtain the position scores for all texts in our text collection.
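The propagation step can be sketched without external libraries. The code below is an illustrative reading of harmonic function label propagation on the unnormalized Laplacian L = D - W and not the released SemScale code: it fixes the two pivot vertices at -1 and 1 and solves L_uu f_u = -L_ul y_l for the remaining vertices. The 4x4 similarity matrix is made up, and `solve` is a small hypothetical Gaussian-elimination helper standing in for a linear algebra library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def propagate(weights, pivots, pivot_scores=(-1.0, 1.0)):
    """Position scores for all vertices, given the two pivot vertices."""
    n = len(weights)
    labeled = list(pivots)
    unlabeled = [v for v in range(n) if v not in labeled]
    degree = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]

    def laplacian(i, j):
        return degree[i] if i == j else -weights[i][j]

    l_uu = [[laplacian(i, j) for j in unlabeled] for i in unlabeled]
    # Right-hand side is -L_ul y_l, one entry per unlabeled vertex.
    rhs = [-sum(laplacian(i, p) * s for p, s in zip(labeled, pivot_scores))
           for i in unlabeled]
    scores = dict(zip(labeled, pivot_scores))
    scores.update(zip(unlabeled, solve(l_uu, rhs)))
    return [scores[v] for v in range(n)]

# Toy symmetric similarity matrix; vertices 0 and 1 are the least similar pair.
weights = [
    [0.0, 0.1, 0.9, 0.5],
    [0.1, 0.0, 0.2, 0.5],
    [0.9, 0.2, 0.0, 0.6],
    [0.5, 0.5, 0.6, 0.0],
]
scores = propagate(weights, pivots=(0, 1))
print(scores)
```

In this toy graph, vertex 2 is much more similar to the pivot scored -1 than to the pivot scored 1, so it receives a clearly negative score; vertex 3 is equally similar to both pivots and is pulled only mildly negative through its edge to vertex 2.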

It is worth mentioning that, like Wordfish, SemScale produces a spectrum of position scores, but cannot tell the orientation of the scale. For example, given the left-to-right ideological scaling, we do not know whether the leftmost point on the scale produced by SemScale corresponds to the political party which is most to the left in the political spectrum or to the political party which is most to the right.

SemScale is a fully deterministic algorithm, assuming a fixed collection of pre-trained word embeddings. In other words, if using the same pre-trained word embeddings, SemScale will always produce the same output (i.e., same positions for texts) given the same input (the same collection of texts). In contrast, the various Wordfish implementations all obtain the model’s parameters via stochastic optimization methods, which may lead to somewhat different results across multiple runs on the same input data.

Text Preprocessing. Recently, Denny and Spirling (2018) have highlighted how virtually any pre-processing step has a major impact over the scaling process based on Wordfish. For this reason, we have decided to first evaluate both WordFish and SemScale on original texts, without any pre-processing (i.e., without any removal of punctuation or certain words). Thus, we feed texts as they originally appear to both algorithms. While retaining the original texts without any filtering might not be the optimal setting for either of the algorithms, it allows us to compare the capabilities of the two scaling methods in isolation, avoiding the risk of attributing performance differences stemming from some (rather arbitrary) text pre-processing steps as advantages or shortcomings of either of the algorithms. Also, in all other experimental settings, in which we retain only some subset of the original texts (e.g., only NOUNS or only named entities), we explicitly make sure that both scaling algorithms receive exactly the same textual input.

4 Dataset: European Parliament Speeches

Legislation       # Parties   Min. Length   Mean Length   Max. Length
5th (1999–2004)   25          16k           157k          543k
6th (2004–2009)   25          11k           100k          319k
Table 1: Statistics of the adopted datasets in terms of number of words (computed on English data).

In our work, we follow the experimental design adopted by Proksch and Slapin (2010) when testing the Wordfish algorithm in different languages (English, French and German). As in their work, we collect speeches from the European Parliament website. We decided to extend the resource and the experimental setting used in this previous work, in order to check the validity of our findings across more languages (we add Italian and Spanish) and legislations (5th and 6th). To do so, we first crawled all individual speeches of all European Parliament representatives for the legislations under study from the official website of the European Parliament; these speeches cover 10 years of European politics (1999–2009). These are the only two legislations for which the transcripts of the speeches are available online and the majority of them have been consistently translated (for details, see the European Parliament decision of 20 November 2012 on amendment of Rule 181 of Parliament’s Rules of Procedure concerning verbatim reports of proceedings and Rule 182 concerning the audiovisual record of proceedings). As opposed to Proksch and Slapin (2010), who considered all speeches from all MEPs in the English, French and German translations, in our work we only keep speeches that were originally delivered in one of the five languages under study and translated into all the others. This permits us to build a perfectly comparable setting across the five languages, avoiding the issue of not always having translations available in all languages. (This, however, produces a dataset which differs from the one used by Proksch and Slapin (2010) and from the one used in our original work (Glavaš et al., 2017), where we considered all speeches available in English and in the original language.) Next, as done by Proksch and Slapin, we concatenated all speeches of all representatives of the same national party into a single party-level document for each language.
This dataset (see statistics in Table 1), which we share together with this paper, represents a new resource for testing scaling algorithms and for examining their robustness across contexts and languages. However, it is also important to note that the difference in size between the two legislation corpora may have an impact on the tested algorithms.
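The construction of the party-level documents described above can be sketched as follows. This is a simplified illustration, not the pipeline used to build the released dataset: the record fields (`speech_id`, `party`, `language`, `text`) and the language codes are hypothetical, and the filter only checks that a speech is available in every target language.

```python
from collections import defaultdict

LANGUAGES = {"en", "fr", "de", "it", "es"}  # hypothetical language codes


def build_party_documents(speeches, languages=LANGUAGES):
    """Concatenate speeches into one document per (party, language) pair.

    speeches: iterable of dicts with the hypothetical keys
    'speech_id', 'party', 'language' and 'text'.
    Only speeches available in every target language are kept.
    """
    speeches = list(speeches)
    languages = set(languages)

    # Collect the languages in which each speech is available.
    available = defaultdict(set)
    for speech in speeches:
        available[speech["speech_id"]].add(speech["language"])
    complete = {sid for sid, langs in available.items() if langs >= languages}

    # Group the remaining speeches by party and language, keeping input order.
    grouped = defaultdict(list)
    for speech in speeches:
        if speech["speech_id"] in complete and speech["language"] in languages:
            grouped[(speech["party"], speech["language"])].append(speech["text"])
    return {key: "\n".join(texts) for key, texts in grouped.items()}
```

Dropping speeches that lack a translation in any target language keeps the party-level documents comparable across languages, at the cost of a smaller corpus.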

5 Results

Each unsupervised scaling technique assumes the existence of an underlying position/policy dimension across the documents under study. When processing transcripts of speeches from the European Parliament, Proksch and Slapin (2010) have shown that the dimension determined by Wordfish corresponds better to the parties’ positions towards deeper EU integration than to their traditional left-to-right ideological positions. In this work, we extend this analysis in order to understand whether a) a scaling approach aware of semantics would better capture this dimension, b) EU integration positioning correlates in particular with a specific subset of linguistic traits, c) the left-to-right ideological dimension could also emerge once we isolate certain textual features, and d) these findings hold across different languages.

As done by Proksch and Slapin (2010), we consider the positions of the parties under study derived from the Chapel Hill expert survey (years 2002 and 2006 for the 5th and 6th legislation, respectively) regarding the European integration process and a broad left-right ideology. The authors also conducted a more extensive analysis of the correlation of Wordfish, additionally considering national party positions and roll-call votes. We decided to focus only on EU integration and left-right ideology, as the authors found that the scaling produced by Wordfish correlated highly with the first, but not with the second, of these dimensions.

To study how strongly the scaling correlates with the known positions, we compute the pairwise accuracy (PA), i.e., the percentage of party pairs that appear in the same order in both rankings, as well as Pearson and Spearman correlation. While PA and Spearman correlation estimate the correctness of the ranking order, Pearson correlation also captures the extent to which the distances between party positions are reflected. In the tables presented, we report the average of each measure across the five languages under study. This highlights how much the scaling correlates with the known positions of parties; a breakdown of the results for each language is available in the online appendix. Additionally, we present visual representations of the robustness of the inferred party positions across different languages.
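The three measures can be written down in a few lines. The following is a minimal pure-Python sketch rather than the evaluation code released with the paper; the function names are illustrative, and counting tied pairs as not being in the same order is an assumption.

```python
from itertools import combinations
from math import sqrt


def pairwise_accuracy(predicted, gold):
    """Share of item pairs ordered the same way in both score lists.

    Assumption: pairs tied in either list count as not in the same order.
    """
    pairs = list(combinations(range(len(gold)), 2))
    same_order = sum(
        (predicted[i] - predicted[j]) * (gold[i] - gold[j]) > 0 for i, j in pairs
    )
    return same_order / len(pairs)


def pearson(x, y):
    """Pearson correlation: sensitive to distances between positions."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: four parties, one adjacent pair swapped in the predicted scaling.
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [1.0, 2.0, 3.0, 4.0]
pa = pairwise_accuracy(predicted, gold)
rho = spearman(predicted, gold)
```

In the toy example, one of the six party pairs is in the wrong order, so PA is 5/6, and the Spearman correlation is 0.8.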

5.1 European Integration Dimension

In Table 2, we present the averaged quality of the correlation between the produced scalings and the European integration positioning for the two legislations under study. The results reconfirm what was already remarked by Proksch and Slapin (2010), namely that the scaling produced by Wordfish on the entire text correlates better with the parties’ positions on European integration than with their left-right ideological positions (compare with Table 4). Moreover, they highlight how this effect is even more prominent when adopting a scaling algorithm that is aware of the semantics of the texts under study, such as SemScale. These findings are further emphasized in Figures 1 and 2, which reveal the consistency of SemScale across languages (further results are available in the online appendix).

          5th Leg                   6th Leg
Wordfish  0.55    0.17    0.13      0.55    0.19    0.13
          (0.01)  (0.04)  (0.03)    (0.03)  (0.06)  (0.08)
SemScale  0.62    0.43    0.33      0.59    0.30    0.29
          (0.01)  (0.01)  (0.03)    (0.02)  (0.03)  (0.07)
Table 2: Averaged correlation of positioning using the entire text with European integration positions, across different languages. Standard errors are in brackets.

Figure 1: Correlation of Wordfish results using the entire text (5th legislation) with European Integration positioning.
Figure 2: Correlation of SemScale results using the entire text (5th legislation) with European Integration positioning.

Breaking this analysis down into its main linguistic components, we notice that the integration dimension also emerges when considering only a subset of lexical features, for instance, when examining only nouns (see Table 3). Additionally, using only nouns or only verbs permits SemScale to produce scalings that are very robust across languages, as shown in Figure 3, where we also report the variation in the position estimated for each party across the different languages.

          5th Leg                   6th Leg
Wordfish  0.55    0.17    0.13      0.55    0.21    0.11
          (0.01)  (0.05)  (0.02)    (0.02)  (0.03)  (0.07)
SemScale  0.56    0.29    0.19      0.58    0.29    0.23
          (0.01)  (0.02)  (0.01)    (0.01)  (0.02)  (0.03)
Table 3: Averaged correlation of positioning using only nouns with European integration positions, across different languages. Standard errors are in brackets.

Figure 3: Correlation of results with European Integration positioning across five languages (5th leg.), using only nouns (above) and only verbs (below).

To get a better understanding of this performance, we examined which specific verbs play a determinant role in the scaling generated by SemScale on documents from the 5th legislation; to do so, we extracted all verbs mentioned in the dataset which are semantically close to one extreme and at the same time very distant from the other (in terms of cosine similarity between the word embedding vector representation of the verb and of the documents on the extremes of the scale).

We noticed that the verbs positioned on the two extremes of the scale seem to capture an underlying division between parties in power and parties in opposition: on the one hand, we find terms such as “redistribute, refocus, rebuild, alleviate, investigate, participate” and, on the other hand, terms like “demand, invoke, criticize, support, wish, overlook”. While this is an initial attempt, we argue that having more control over the vocabulary under study could support researchers in interpreting the results by providing a more fine-grained understanding of the scaling.
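The selection of verbs that lie close to one extreme of the scale and far from the other can be sketched as follows. The paper only states that cosine similarity between the verb embedding and the representations of the extreme documents is used; ranking verbs by the difference between the two similarities, as well as all names below, are assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def polarised_words(word_vectors, extreme_a, extreme_b, top_k=5):
    """Return the words most characteristic of each extreme of the scale.

    word_vectors: dict {word: embedding}.
    extreme_a, extreme_b: vector representations of the two extreme documents.
    Assumption: a word is scored by cos(word, a) - cos(word, b).
    """
    margin = {
        word: cosine(vec, extreme_a) - cosine(vec, extreme_b)
        for word, vec in word_vectors.items()
    }
    ranked = sorted(margin, key=margin.get, reverse=True)
    near_a = ranked[:top_k]
    near_b = ranked[::-1][:top_k]
    return near_a, near_b


# Toy two-dimensional embeddings (made-up values, for illustration only).
toy_vectors = {
    "rebuild": [1.0, 0.1],
    "demand": [0.1, 1.0],
    "say": [1.0, 1.0],
}
near_a, near_b = polarised_words(toy_vectors, [1.0, 0.0], [0.0, 1.0], top_k=1)
```

Words that are equally similar to both extremes, such as "say" in the toy data, receive a margin of zero and therefore do not surface on either side.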

When moving to more “semantic” features, we noticed that using SemScale on proper nouns only, and even more specifically on mentions of people only, seems to produce a scaling that correlates highly with the European integration dimension (in certain cases even better than when employing the entire text) and at the same time is robust across languages. By examining the recognized names, we can extract interactions between political parties, in which MEPs from different groups may be praised or criticized during speeches, highlighting an additional political dimension of the speeches.

Figure 4: Correlation of results with European Integration positioning across five languages (5th leg.), using only mentioned people (above) and organizations (below).

On the other hand, while organizations highlight the topics on which the parliament has focused its discussions (from “railways” to “Palestine” to “Euratom”, up to “Parmalat” and “PKK”), the produced scalings are less consistent across different languages. This is due to several factors; by looking at the results we can notice, for instance, the current limitations of NER systems, which often misidentify locations and organizations and produce very different results depending on the language under study. Moreover, as far as SemScale is concerned, this could also be due to the training of word embeddings with too many dimensions over a corpus with too few instances (e.g., mentions of organizations).

To conclude, as previously remarked by Proksch and Slapin (2010), we also recognize the presence of a strong underlying European integration dimension when examining transcripts of speeches from the European Parliament. This dimension is captured particularly well through the use of distributional semantics in SemScale. When moving to specific linguistic traits, we notice that these are able to capture fine-grained dimensions, such as power vs. opposition, as well as interactions between different political groups, while in certain cases also producing a scaling which is very consistent across the five languages under study.

          5th Leg                   6th Leg
Wordfish  0.54    0.03    0.11      0.53    0.08    0.08
          (0.01)  (0.01)  (0.01)    (0.01)  (0.02)  (0.01)
SemScale  0.56    0.19    0.19      0.54    0.06    0.09
          (0.01)  (0.02)  (0.03)    (0.01)  (0.01)  (0.03)
Table 4: Averaged correlation of positioning using the entire text, with left-right positions across different languages. Standard errors are in brackets.

5.2 Left-Right Dimension

When moving to the analysis of the correlation with left-right ideological positions, we observe that, as already described by Proksch and Slapin (2010), this dimension is captured less well by the tested text scaling methods, as can be seen in Table 4. The same finding emerges even more clearly when studying the role of lexical features, such as nouns, verbs, and adjectives. An interesting exception is the use of proper nouns which, by capturing in particular the names of countries and cities, yield a dimension which sometimes correlates better with the ideological dimension than the one obtained from the entire text (results available in the appendix). Along the same lines, it is very interesting that the use of knowledge base entities yields a scaling which appears to be in line with the ideological dimension and is also often consistent across languages (see Table 5 and Figure 5). By looking at the ways entities are mentioned on the two sides of the spectrum, we notice references to previous events, such as the Siege of Sarajevo, or to international projects, such as the Life Long Leonardo da Vinci Program, up to discussions surrounding the use of embryonic stem cells or bio-fuel.

          5th Leg                   6th Leg
Wordfish  0.56    0.12    0.18      0.57    0.20    0.19
          (0.01)  (0.01)  (0.01)    (0.01)  (0.04)  (0.04)
SemScale  0.60    0.16    0.29      0.57    0.14    0.19
          (0.01)  (0.02)  (0.02)    (0.01)  (0.04)  (0.01)
Table 5: Averaged correlation of positioning using only knowledge-base entities, with left-right positions across different languages. Standard errors are in brackets.

Figure 5: Correlation of results with Left-Right positioning across five languages (5th leg. above and 6th below), using only mentioned entities.

The possibility of handling mentions of political events, projects and topics might be key to uncovering previously unexplored dimensions in text scaling. Especially in the context of European Parliament speeches, such mentions have the potential to reveal an ideological vocabulary which might otherwise get lost when treating texts as bags of words. However, when examining the (possible future) advantages of employing disambiguated named entities, it is also important to remember that entity linking tools such as the one employed in this study are still in their very early days, which means that they are often prone to mistakes. For instance, mentions of the Leonardo Program are systematically linked to the entry of the artist instead of dbr:Leonardo_da_Vinci_programme. Additionally, the continuous changes to and extensions of the reference knowledge base (in this case Wikipedia/DBpedia in five different languages) make reproducing such findings quite complex.

Nevertheless, we are very positively impressed by the robustness of the scaling produced by using knowledge base entities, which proves to be comparable, if not superior, to the one produced using the entire text. We are hopeful that this will motivate others to investigate the role that entities such as events, locations, treaties, and agreements play in determining ideological positions from textual data.

6 Conclusion

Years of research in text scaling have highlighted the fact that bag-of-words representations of documents, such as the ones employed by Wordfish, are able to capture an underlying dimension across the collection under study, which often correlates with ideological positioning or attitudes towards a relevant topic, for instance, the European integration process. However, while such a scaling approach has supported a large number of different studies, it is inherently limited by the fact that it works at the word-frequency level and does not consider any semantic representation of the text. In contrast, in this work we present SemScale, a new semantically aware scaling method that exploits distributional semantic representations of the texts under study. We have provided empirical evidence that, by employing semantic information, the algorithm is able to better capture the European integration dimension underlying speeches from the European Parliament. Moreover, we have shown how having more control over the lexical and semantic information that a scaling algorithm adopts could make its output more robust, while at the same time simplifying the interpretation by reducing the vocabulary under study (for instance, when considering only nouns or verbs instead of all tokens) and by attempting to remove ambiguity (with disambiguated entities).

Nevertheless, while the results presented in this paper are promising, it is essential to approach these findings only as the first steps in a new direction for text scaling, keeping in mind the current limitations of such features and configurations remarked above. For these reasons, we release together with this paper the entire evaluation setting employed in our work, an online demo that makes it possible to test our scaling algorithm directly through an easy-to-use graphical interface, and a Python implementation of SemScale (usable as a command-line tool), to encourage further research and improvements from the community.


  • Auer et al. (2007) Auer, S., C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives (2007). Dbpedia: A nucleus for a web of open data. Lecture Notes in Computer Science 4825, 722–735.
  • Bojanowski et al. (2017) Bojanowski, P., E. Grave, A. Joulin, and T. Mikolov (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5, 135–146.
  • Budge and Pennings (2007a) Budge, I. and P. Pennings (2007a). Do They Work? Validating Computerised Word Frequency Estimates Against Policy Series. Electoral Studies 26(1), 121–129.
  • Budge and Pennings (2007b) Budge, I. and P. Pennings (2007b). Missing the Message and Shooting the Messenger: Benoit and Laver’s ’Response’. Electoral Studies 26(1), 136–141.
  • Denny and Spirling (2018) Denny, M. J. and A. Spirling (2018). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. Political Analysis 26(2), 168–189.
  • Firth (1957) Firth, J. (1957). A Synopsis of Linguistic Theory 1930-1955. In Studies in Linguistic Analysis. Philological Society, Oxford.
  • Frege (1953) Frege, G. (1953). The Foundations of Arithmetic a Logico-Mathematical Enquiry Into the Concept of Number. Oxford Basil Blackwell.
  • Ganea and Hofmann (2017) Ganea, O.-E. and T. Hofmann (2017). Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2619–2629.
  • Glavaš et al. (2017) Glavaš, G., F. Nanni, and S. P. Ponzetto (2017). Unsupervised Cross-Lingual Scaling of Political Texts. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 688–693.
  • Glavaš and Ponzetto (2017) Glavaš, G. and S. P. Ponzetto (2017). Dual tensor model for detecting asymmetric lexico-semantic relations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1757–1767.
  • Glavaš and Vulić (2018) Glavaš, G. and I. Vulić (2018). Discriminating between lexico-semantic relations with the specialization tensor model. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 181–187.
  • Goodfellow et al. (2016) Goodfellow, I., Y. Bengio, and A. Courville (2016). Deep learning. MIT Press, Cambridge.
  • Greene et al. (2016) Greene, Z., A. Ceron, G. Schumacher, and Z. Fazekas (2016). The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries. Working paper.
  • Gurciullo and Mikhaylov (2017) Gurciullo, S. and S. J. Mikhaylov (2017). Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings. Working Paper.
  • Joulin et al. (2017) Joulin, A., E. Grave, P. Bojanowski, M. Nickel, and T. Mikolov (2017). Fast linear model for knowledge graph embeddings. In Proceedings of the 6th Workshop on Automated Knowledge Base Construction (AKBC).
  • Lafferty et al. (2001) Lafferty, J. D., A. McCallum, and F. C. N. Pereira (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289.
  • Laver et al. (2003) Laver, M., K. Benoit, and J. Garry (2003). Extracting policy positions from political texts using words as data. American Political Science Review 97(02), 311–331.
  • Lowe et al. (2011) Lowe, W., K. Benoit, S. Mikhaylov, and M. Laver (2011, 2). Scaling Policy Preferences from Coded Political Texts. Legislative Studies Quarterly 36(1), 123–155.
  • Manning et al. (2014) Manning, C., M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky (2014, June). The stanford corenlp natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics, pp. 55–60.
  • Mendes et al. (2011) Mendes, P. N., M. Jakob, A. García-Silva, and C. Bizer (2011). DBpedia spotlight. In Proceedings of the 7th International Conference on Semantic Systems - I-Semantics ’11.
  • Mikolov et al. (2013) Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119.
  • Nanni et al. (2019) Nanni, F., G. Glavaš, S. P. Ponzetto, and H. Stuckenschmidt (2019). Online Appendix: Political Text Scaling Meets Computational Semantics. Online Appendix.
  • Navigli and Ponzetto (2012) Navigli, R. and S. P. Ponzetto (2012). Babelnet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193, 217–250.
  • Nickel et al. (2016) Nickel, M., K. Murphy, V. Tresp, and E. Gabrilovich (2016). A review of relational machine learning for knowledge graphs. Proceedings of the IEEE 104(1), 11–33.
  • Pennington et al. (2014) Pennington, J., R. Socher, and C. Manning (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing, pp. 1532–1543.
  • Porter (1980) Porter, M. (1980, 3). An algorithm for suffix stripping. Program 14(3), 130–137.
  • Proksch and Slapin (2010) Proksch, S.-O. and J. B. Slapin (2010). Position Taking in European Parliament Speeches. British Journal of Political Science 40(3), 587–611.
  • Shen et al. (2018) Shen, D., G. Wang, W. Wang, M. R. Min, Q. Su, Y. Zhang, C. Li, R. Henao, and L. Carin (2018). Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms. pp. 440–450.
  • Slapin and Proksch (2008) Slapin, J. B. and S.-O. Proksch (2008). A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52, 705–722.
  • Spirling and Rodriguez (2019) Spirling, A. and P. Rodriguez (2019). Word Embeddings: What works, what doesn’t, and how to tell the difference for applied research. Working paper.
  • Zhu et al. (2003) Zhu, X., Z. Ghahramani, and J. D. Lafferty (2003). Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning, pp. 912–919.