Comparing Approaches to Dravidian Language Identification

03/09/2021
by Tommi Jauhiainen, et al.

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at the VarDial 2021 workshop. The DLI training set includes 16,674 YouTube comments written in Roman script, containing code-mixed text in English and one of three South Dravidian languages: Kannada, Malayalam, and Tamil. We submitted results generated using two models: a Naive Bayes classifier with adaptive language models, which has been shown to obtain competitive performance in many language and dialect identification tasks, and a transformer-based model, which is widely regarded as the state of the art in a number of NLP tasks. Our first submission was sent in the closed submission track, using only the training set provided by the shared task organisers, whereas the second submission is considered open, as it used a pretrained model trained with external data. Our team attained a shared second position in the shared task with the submission based on Naive Bayes. Our results reinforce the idea that deep learning methods are not as competitive in language identification tasks as they are in many other text classification tasks.
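To make the first approach more concrete, the following is a minimal sketch of a character n-gram Naive Bayes language identifier. It is not the authors' system: the n-gram range (1 to 3), the add-one smoothing, the class and function names, and the synthetic training strings with labels `lang_a` and `lang_b` are all assumptions made for illustration, and the language model adaptation step mentioned in the abstract is omitted.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (lowercased, space-padded)."""
    padded = f" {text.lower()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative, hypothetical class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                   # additive smoothing constant (assumed value)
        self.counts = defaultdict(Counter)   # label -> n-gram frequencies
        self.docs = Counter()                # label -> number of training texts
        self.vocab = set()                   # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return the log prior plus summed log likelihoods for each label."""
        grams = char_ngrams(text)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + self.alpha * vocab_size
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log((counter[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Synthetic placeholder data, not real comments from the DLI training set.
train_texts = ["abab baba", "aabb abba", "xyzz zyxx", "zzxy yxzx"]
train_labels = ["lang_a", "lang_a", "lang_b", "lang_b"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("baab"))
```

Character n-grams are a common choice for short, noisy, romanized text because they do not depend on a fixed word vocabulary or consistent spelling. An adaptive variant would additionally update the per-language counts with confidently classified test texts, which this sketch does not attempt.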


research
02/14/2021

indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages

The paper presents the submission of the team indicnlp@kgp to the EACL 2...
research
09/22/2020

Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts

Identification of the languages written using cuneiform symbols is a dif...
research
11/01/2017

Improved Text Language Identification for the South African Languages

Virtual assistants and text chatbots have recently been gaining populari...
research
11/18/2019

Short Text Language Identification for Under Resourced Languages

The paper presents a hierarchical naive Bayesian and lexicon based class...
research
03/26/2019

Language Model Adaptation for Language and Dialect Identification of Text

This article describes an unsupervised language model adaptation approac...
research
01/29/2021

NLPBK at VLSP-2020 shared task: Compose transformer pretrained models for Reliable Intelligence Identification on Social network

This paper describes our method for tuning a transformer-based pretraine...
research
05/20/2022

Modernizing Open-Set Speech Language Identification

While most modern speech Language Identification methods are closed-set,...
