Tracking, exploring and analyzing recent developments in German-language online press in the face of the coronavirus crisis: cOWIDplus Analysis and cOWIDplus Viewer

by Sascha Wolfer, et al.

The coronavirus pandemic may be the largest crisis the world has faced since World War II. It is not surprising that it is also affecting language, our primary communication tool. We present three interconnected resources designed to capture and illustrate these effects on a subset of the German language: an RSS corpus of German-language newsfeeds (with freely available, untruncated unigram frequency lists), a static but continuously updated HTML page that tracks the diversity of the vocabulary used, and a web application that enables other researchers and the broader public to explore these effects with little or no knowledge of corpus representation, corpus exploration, or statistical analysis.
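To make the terms "unigram frequency list" and "vocabulary diversity" concrete, here is a minimal Python sketch. It is not the authors' pipeline: the tokenizer (a simple `\w+` regular expression), the type-token ratio as a diversity measure, the function names, and the two sample headlines are all assumptions made for illustration only. The abstract does not say which diversity measure the HTML page actually reports.

```python
import re
from collections import Counter


def unigram_frequencies(texts):
    """Count lowercased word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Assumption: a plain \w+ tokenizer; a real corpus pipeline
        # would likely use a proper German tokenizer.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def type_token_ratio(counts):
    """Distinct words divided by total words (one simple diversity measure)."""
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


# Hypothetical headlines, invented for this example.
headlines = [
    "Corona: Neue Regeln ab Montag",
    "Corona und die Wirtschaft",
]

freqs = unigram_frequencies(headlines)
print(freqs.most_common(3))
print(round(type_token_ratio(freqs), 3))
```

A lower ratio on a given day would mean that fewer distinct words account for more of the text, which is one way a single dominant topic could show up in newsfeed vocabulary.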



Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German

This paper presents SwissCrawl, the largest Swiss German text corpus to ...

GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

The lack of publicly available text corpora is a major obstacle for prog...

SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

Swiss German is a dialect continuum whose natively acquired dialects sig...

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in G...

TuGeBiC: A Turkish German Bilingual Code-Switching Corpus

In this paper we describe the process of collection, transcription, and ...

Distribution-based Prediction of the Degree of Grammaticalization for German Prepositions

We test the hypothesis that the degree of grammaticalization of German p...

Effects of Layer Freezing when Transferring DeepSpeech to New Languages

In this paper, we train Mozilla's DeepSpeech architecture on German and ...