Language and Dialect Identification of Cuneiform Texts

03/05/2019
by Tommi Jauhiainen, et al.

This article introduces a corpus of cuneiform texts and reports preliminary language identification experiments conducted on it. The dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived from this corpus; we describe the CLI dataset, explain how it was derived, and provide baseline language identification results for it. To the best of our knowledge, the experiments detailed here are the first application of automatic language identification methods to cuneiform data.

research
08/27/2020

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus

This article introduces the Wanca 2017 corpus of texts crawled from the ...
research
02/24/2021

Automatic Meter Classification of Kurdish Poems

Most of the classic texts in Kurdish literature are poems. Knowing the m...
research
10/09/2019

Towards De-identification of Legal Texts

In many countries, personal information that can be published or shared ...
research
03/26/2019

Language Model Adaptation for Language and Dialect Identification of Text

This article describes an unsupervised language model adaptation approac...
research
07/10/2016

A New Bengali Readability Score

In this paper we have proposed methods to analyze the readability of Ben...
research
07/01/2020

So What's the Plan? Mining Strategic Planning Documents

In this paper we present a corpus of Russian strategic planning document...
