Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

08/31/2021
by Jitin Krishnan et al.

Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks. In this work, we combine data augmentation approaches with a Teacher-Student training scheme to address this issue in a cross-lingual transfer setting for fine-tuning state-of-the-art pre-trained multilingual language models such as mBERT and XLM-R. We evaluate our method on transliterated Hindi and Malayalam, also introducing new datasets for benchmarking on real-world scenarios: one on sentiment classification in transliterated Malayalam, and another on crisis tweet classification in transliterated Hindi and Malayalam (related to the 2013 North India and 2018 Kerala floods). Our method yielded an average improvement of +5.6
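The abstract describes combining data augmentation with a Teacher-Student scheme to fine-tune multilingual encoders such as mBERT and XLM-R on transliterated text. Below is a minimal, hypothetical sketch of what such a setup could look like in PyTorch with Hugging Face Transformers; the model name, label set, toy Hindi sentences and their romanizations, learning rates, and the particular distillation loss (KL divergence against the teacher's soft labels plus the hard-label loss) are illustrative assumptions, not the authors' implementation.

# Hedged sketch (not the paper's code): teacher fine-tuned on native-script
# labeled text, student trained on transliterated copies using the teacher's
# soft predictions. All data and hyperparameters below are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"   # assumed checkpoint; mBERT would work similarly
NUM_LABELS = 2                    # e.g. positive / negative sentiment

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
teacher = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
student = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Toy parallel data: native-script sentences and romanized (transliterated)
# counterparts. In practice the romanizations would come from an augmentation
# tool or from real social-media text.
native_texts = ["यह फिल्म बहुत अच्छी थी", "मुझे यह बिल्कुल पसंद नहीं आया"]
translit_texts = ["yeh film bahut acchi thi", "mujhe yeh bilkul pasand nahi aaya"]
labels = torch.tensor([1, 0])

def encode(texts):
    return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Step 1: fine-tune the teacher on labeled native-script data (one step shown).
opt_t = torch.optim.AdamW(teacher.parameters(), lr=2e-5)
out_t = teacher(**encode(native_texts), labels=labels)
out_t.loss.backward()
opt_t.step()

# Step 2: train the student on transliterated text, matching the teacher's
# soft label distribution (KL divergence) in addition to the hard labels.
teacher.eval()
opt_s = torch.optim.AdamW(student.parameters(), lr=2e-5)
with torch.no_grad():
    soft_targets = F.softmax(teacher(**encode(native_texts)).logits, dim=-1)

out_s = student(**encode(translit_texts), labels=labels)
kl = F.kl_div(F.log_softmax(out_s.logits, dim=-1), soft_targets, reduction="batchmean")
loss = out_s.loss + kl
loss.backward()
opt_s.step()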

