Language Model Adaptation for Language and Dialect Identification of Text

03/26/2019
by Tommi Jauhiainen, et al.

This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, which is now called HeLI 2.0. We describe the HeLI 2.0 method in detail. The resulting system is evaluated using the datasets from the German dialect identification and Indo-Aryan language identification shared tasks of the VarDial workshops 2017 and 2018. The new approach with language model adaptation provides considerably higher F1-scores than the previous HeLI method or the other systems that participated in the shared tasks. The results indicate that unsupervised language model adaptation should be considered as an option in all language identification tasks, especially in those where encountering out-of-domain data is likely.


