Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training

02/28/2022
by Ziqing Yang, et al.

Multilingual pre-trained language models (MPLMs) can not only handle tasks in different languages but also exhibit surprising zero-shot cross-lingual transferability. However, MPLMs usually cannot match the supervised performance of state-of-the-art monolingual pre-trained models on rich-resource languages. In this paper, we aim to improve the multilingual model's supervised and zero-shot performance simultaneously, using only resources from the supervised languages. Our approach transfers knowledge from high-performance monolingual models through a teacher-student framework, letting the multilingual model learn from multiple monolingual models at once. To exploit the model's cross-lingual transferability, we propose MBLM (multi-branch multilingual language model), a model built on an MPLM with multiple language branches, each a stack of transformer layers. MBLM is trained with a zero-shot-aware training strategy that encourages the model to learn from the mixture of zero-shot representations produced by all the branches. Results on two cross-lingual classification tasks show that, using only each task's supervised data, our method improves both the supervised and zero-shot performance of MPLMs.
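The abstract only outlines the architecture and training strategy. Below is a minimal PyTorch sketch of how the described pieces could fit together: a shared multilingual encoder standing in for the MPLM, one small transformer branch per supervised language, a mixture over branch outputs for inputs treated as zero-shot, and soft-label distillation from a monolingual teacher. All module names, hyperparameters, and the toy training step are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchClassifier(nn.Module):
    """Hypothetical MBLM-style model: shared encoder (stand-in for an MPLM),
    one transformer branch per supervised language, and a classifier head."""

    def __init__(self, vocab_size, hidden=256, num_languages=2,
                 branch_layers=2, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.shared = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2)
        # one small stack of transformer layers per language branch
        self.branches = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
                num_layers=branch_layers)
            for _ in range(num_languages)])
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, lang_id=None):
        h = self.shared(self.embed(input_ids))
        if lang_id is not None:
            # supervised path: route through the branch of the input's language
            h = self.branches[lang_id](h)
        else:
            # zero-shot-aware path: mix the representations produced by all
            # branches, so every branch also sees "foreign" inputs in training
            h = torch.stack([b(h) for b in self.branches], dim=0).mean(dim=0)
        return self.classifier(h[:, 0])  # pool the first token, [CLS]-style


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label KD from a monolingual teacher plus hard-label cross-entropy."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce


# Toy training step (illustrative): each batch comes from one supervised
# language and its monolingual teacher; both the language-specific branch and
# the branch mixture are optimized.
model = MultiBranchClassifier(vocab_size=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

input_ids = torch.randint(0, 1000, (8, 32))   # fake tokenized batch
labels = torch.randint(0, 3, (8,))
teacher_logits = torch.randn(8, 3)            # stand-in for a monolingual teacher

optimizer.zero_grad()
loss = (distillation_loss(model(input_ids, lang_id=0), teacher_logits, labels)
        + distillation_loss(model(input_ids), teacher_logits, labels))
loss.backward()
optimizer.step()

The design point reflected in this sketch is that each batch is passed both through its own language branch and through the mixture of all branches, so branches are also trained on representations they would otherwise only encounter at zero-shot inference time.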


