German Parliamentary Corpus (GerParCor)

04/21/2022
by Giuseppe Abrami, et al.

Parliamentary debates are a large and only partly exploited trove of publicly accessible texts. For the German-speaking area, there is a shortage of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliamentary Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state- and federal-level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols printed in Fraktur, which were converted via an OCR process based on Tesseract. All protocols were preprocessed with the NLP pipeline of spaCy 3 and automatically annotated with metadata on their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can serve as a large corpus of historical texts in the field of political communication for a variety of NLP tasks.
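As an illustration of what annotating a protocol with its session date might involve, here is a minimal sketch in Python. It is not the authors' method, which the abstract does not describe. The function name `extract_session_date`, the sample header line, and the assumption that dates appear written out in the form "21. April 2022" are all hypothetical.

```python
import re
from datetime import date

# German month names mapped to month numbers. Assumption: dates are written
# out as "21. April 2022"; real protocols (especially OCR output) vary a lot.
MONTHS = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4, "Mai": 5, "Juni": 6,
    "Juli": 7, "August": 8, "September": 9, "Oktober": 10, "November": 11,
    "Dezember": 12,
}

# Day number, a dot, a month name, and a four-digit year.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\.\s*(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
)


def extract_session_date(text):
    """Return the first valid written-out German date in text, or None."""
    for match in DATE_PATTERN.finditer(text):
        day, month_name, year = match.groups()
        try:
            return date(int(year), MONTHS[month_name], int(day))
        except ValueError:
            continue  # skip impossible dates such as "31. Februar 1900"
    return None


# Hypothetical header line, invented for this example.
header = "Deutscher Bundestag, 12. Sitzung, Berlin, Donnerstag, den 21. April 2022"
print(extract_session_date(header))  # prints 2022-04-21
```

The ordinal "12. Sitzung" is not mistaken for a date because the pattern requires a month name after the dot. A real pipeline would also need to handle OCR errors, abbreviated months, and numeric date formats.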

research
07/13/2020

GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

The lack of publicly available text corpora is a major obstacle for prog...
research
03/21/2021

SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

Swiss German is a dialect continuum whose natively acquired dialects sig...
research
05/01/2018

An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols

We describe an effort to annotate a corpus of natural language instructi...
research
01/06/2020

Identifying Historical Travelogues in Large Text Corpora Using Machine Learning

Travelogues represent an important and intensively studied source for sc...
research
04/19/2022

I still have Time(s): Extending HeidelTime for German Texts

HeidelTime is one of the most widespread and successful tools for detect...
research
05/27/2022

Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates

This paper investigates the use of first person plural pronouns as a rhe...
research
08/19/2022

Pseudo-Labels Are All You Need

Automatically estimating the complexity of texts for readers has a varie...
