A German Corpus for Text Similarity Detection Tasks

Text similarity detection aims to measure the degree of similarity between a pair of texts. The corpora available for text similarity detection are designed to evaluate algorithms that assess the degree of paraphrase between documents. In this paper we present a German text corpus for similarity detection. The purpose of this corpus is to support the automatic assessment of similarity between a pair of texts and the evaluation of different similarity measures, both for whole documents and for individual sentences. To this end, we have calculated several simple measures on our corpus using a library of similarity functions.
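
As an illustration of what a "simple measure" between a pair of texts can look like, the sketch below computes Jaccard and cosine similarity over word tokens. It is a minimal example under stated assumptions: the abstract does not name the measures or the library the authors used, so the function names (`tokenize`, `jaccard_similarity`, `cosine_similarity`) and the two German example sentences are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and extract word tokens; in Python 3, \w also matches
    # German umlauts and ß. This tokenizer is an assumption, not the
    # corpus authors' preprocessing.
    return re.findall(r"\w+", text.lower())


def jaccard_similarity(text_a, text_b):
    # Size of the shared vocabulary divided by the size of the joint vocabulary.
    set_a, set_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(text_a, text_b):
    # Cosine of the angle between the two term-frequency vectors.
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no tokens in at least one text
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sentence pair, not taken from the corpus.
    s1 = "Der Hund schläft"
    s2 = "Der Hund bellt"
    print(jaccard_similarity(s1, s2))
    print(cosine_similarity(s1, s2))
```

The same functions apply to whole documents or to individual sentences, since both are passed in as plain strings; only the length of the input changes.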


