Czech Text Document Corpus v 2.0

10/06/2017
by Pavel Král, et al.

This paper introduces "Czech Text Document Corpus v 2.0", a collection of text documents for automatic document classification in the Czech language. It is composed of 11,955 text documents provided by the Czech News Agency and is freely available for research purposes at http://home.zcu.cz/ pkral/sw. The corpus was created to facilitate a straightforward comparison of document classification approaches on Czech data. It is particularly intended for the evaluation of multi-label document classification approaches, because a document is usually assigned more than one label. Besides the information about document classes, the corpus is annotated at the morphological layer. The paper also reports the results of selected state-of-the-art methods on this corpus, so that new approaches can be compared with them easily.
