A Corpus for Automatic Readability Assessment and Text Simplification of German

09/19/2019
by Alessia Battisti, et al.

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification of German. The corpus is compiled from web sources and consists of approximately 211,000 sentences. As a novel contribution, it contains information on text structure, typography, and images, which can be exploited as part of machine learning approaches to readability assessment and text simplification. The focus of this publication is on representing such information as an extension to an existing corpus standard.


