Multilingual Text Classification for Dravidian Languages

12/03/2021
by Xiaotian Lin, et al.

As the fourth largest language family in the world, the Dravidian languages have become a research hotspot in natural language processing (NLP). Although the family contains a large number of languages, relatively few publicly available resources exist for them. In addition, text classification is a basic NLP task, yet applying it jointly across multiple Dravidian languages remains a major difficulty. To address these problems, we propose a multilingual text classification framework for the Dravidian languages. First, the framework uses the LaBSE pre-trained model as its base model. To address the problem of text information bias in multi-task learning, we use the masked language model (MLM) strategy to select language-specific words and apply adversarial training to perturb them. Second, because the model cannot adequately recognize and exploit the correlations among languages, we further propose a language-specific representation module that enriches the semantic information available to the model. The experimental results demonstrate that the proposed framework performs strongly on multilingual text classification tasks, with each strategy contributing an improvement.
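The abstract does not state how the selected words are perturbed, so the following is only a minimal sketch under stated assumptions: it assumes an FGM-style perturbation (a step of size `epsilon` along the normalized loss gradient) applied to token embeddings, and it assumes the MLM-based selection step has already produced a boolean mask marking the language-specific tokens. The function name `perturb_selected_tokens` and all of its parameters are hypothetical and are not taken from the paper.

```python
import math
from typing import List


def perturb_selected_tokens(
    embeddings: List[List[float]],
    gradients: List[List[float]],
    selected: List[bool],
    epsilon: float = 1.0,
) -> List[List[float]]:
    """Return a copy of `embeddings` with an FGM-style perturbation
    added only at positions where `selected` is True.

    Assumption: `gradients` holds the loss gradient with respect to each
    token embedding, and `selected` marks the language-specific tokens
    (for example, as chosen by an MLM-based selection step).
    """
    # L2 norm of the gradient, computed over the selected positions only.
    squared_sum = sum(
        g * g
        for grad, keep in zip(gradients, selected)
        if keep
        for g in grad
    )
    norm = math.sqrt(squared_sum)

    # With no selected tokens or a zero gradient there is nothing to perturb.
    if norm == 0.0:
        return [row[:] for row in embeddings]

    perturbed = []
    for emb, grad, keep in zip(embeddings, gradients, selected):
        if keep:
            # Move the embedding by epsilon along the normalized gradient.
            perturbed.append([e + epsilon * g / norm for e, g in zip(emb, grad)])
        else:
            # Unselected tokens are left unchanged.
            perturbed.append(emb[:])
    return perturbed
```

In an adversarial training loop of this kind, the perturbed embeddings would typically be fed through the classifier a second time and the resulting loss added to the clean loss; whether the paper follows that exact recipe is not stated in the abstract.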


research
12/04/2022

MiLMo:Minority Multilingual Pre-trained Language Model

Pre-trained language models are trained on large-scale unsupervised data...
research
03/28/2023

Model and Evaluation: Towards Fairness in Multilingual Text Classification

Recently, more and more research has focused on addressing bias in text ...
research
11/29/2019

A Multi-cascaded Deep Model for Bilingual SMS Classification

Most studies on text classification are focused on the English language....
research
07/26/2019

Weakly Supervised Domain Detection

In this paper we introduce domain detection as a new natural language pr...
research
08/03/2023

Tag Prediction of Competitive Programming Problems using Deep Learning Techniques

In the past decade, the amount of research being done in the fields of m...
research
07/18/2022

Deep Sequence Models for Text Classification Tasks

The exponential growth of data generated on the Internet in the current ...
research
05/15/2023

Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages

While natural language processing tools have been developed extensively ...
