Classifying multilingual party manifestos: Domain transfer across country, time, and genre

07/31/2023
by   Matthias Aßenmacher, et al.
0

Annotating costs of large corpora are still one of the main bottlenecks in empirical social science research. On the one hand, making use of the capabilities of domain transfer allows re-using annotated data sets and trained models. On the other hand, it is not clear how well domain transfer works and how reliable the results are for transfer across different dimensions. We explore the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos. First, we show the strong within-domain classification performance of fine-tuned transformer models. Second, we vary the genre of the test set across the aforementioned dimensions to test for the fine-tuned models' robustness and transferability. For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians while for the other three dimensions, custom splits of the Manifesto database are used. While BERT achieves the best scores in the initial experiments across modalities, DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country. The results of the additional analysis show that (Distil)BERT can be applied to future data with similar performance. Moreover, we observe (partly) notable differences between the political manifestos of different countries of origin, even if these countries share a language or a cultural background.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/15/2019

Multilingual is not enough: BERT for Finnish

Deep learning-based language models pretrained on large unannotated text...
research
05/06/2021

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

For many (minority) languages, the resources needed to train large model...
research
03/07/2023

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

ChatGPT has shown strong capabilities in natural language generation tas...
research
09/29/2020

Gender prediction using limited Twitter Data

Transformer models have shown impressive performance on a variety of NLP...
research
08/15/2021

Maps Search Misspelling Detection Leveraging Domain-Augmented Contextual Representations

Building an independent misspelling detector and serve it before correct...
research
05/26/2023

Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging

Massively multilingual language models have displayed strong performance...
research
05/23/2023

Is a Prestigious Job the same as a Prestigious Country? A Case Study on Multilingual Sentence Embeddings and European Countries

We study how multilingual sentence representations capture European coun...

Please sign up or login with your details

Forgot password? Click here to reset