T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification

06/08/2023
by   Inigo Jauregi Unanue, et al.

Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero-/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language. The neural machine translator generates "soft" translations, which permit end-to-end backpropagation during fine-tuning of the pipeline. Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc and MultiEURLEX), with the results showing that the proposed approach significantly improves performance over a competitive baseline.
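The abstract does not spell out how the "soft" translations are computed. One common way to keep a translate-then-classify pipeline differentiable, assumed here purely for illustration, is to pass the classifier the expected token embedding under the translator's output distribution instead of a discrete argmax token. The sketch below shows that idea on a toy vocabulary; the names `soft_embed`, `EMBEDDINGS`, and `translator_logits` are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embed(logits, embedding_table):
    """Expected embedding under the token distribution given by `logits`.

    Assumption: a "soft" token is the probability-weighted average of the
    classifier's embedding rows, so gradients can reach the translator.
    """
    probs = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


# Hypothetical toy setup: 3-token vocabulary, 2-dimensional embeddings.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# One row of translator logits per output position (made-up values).
translator_logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]

# The "soft translation": one embedding vector per output position.
soft_translation = [soft_embed(row, EMBEDDINGS) for row in translator_logits]
```

When the translator is confident about a token, the soft embedding approaches the ordinary embedding of that token. When it is uncertain, the classifier receives a blend, and because every step is a smooth function of the logits, a classification loss could in principle be backpropagated into the translator.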


research
05/24/2021

Cross-lingual Text Classification with Heterogeneous Graph Neural Network

Cross-lingual text classification aims at training a classifier on the s...
research
10/19/2018

Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments

This paper introduces PyDCI, a new implementation of Distributional Corr...
research
05/05/2017

Cross-lingual Distillation for Text Classification

Cross-lingual text classification(CLTC) is the task of classifying docum...
research
11/29/2022

Compressing Cross-Lingual Multi-Task Models at Qualtrics

Experience management is an emerging business area where organizations f...
research
06/15/2021

Consistency Regularization for Cross-Lingual Fine-Tuning

Fine-tuning pre-trained cross-lingual language models can transfer task-...
research
08/31/2021

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

Transliteration is very common on social media, but transliterated text ...
research
04/19/2022

Detecting Text Formality: A Study of Text Classification Approaches

Formality is an important characteristic of text documents. The automati...
