You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models

10/13/2022
by   Tomasz Limisiewicz, et al.

Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their respective languages. We use those teachers along with balanced (sub-sampled) data to distill the teachers' knowledge into a single multilingual student. Our method outperforms standard training methods in low-resource languages and retains performance on high-resource languages while using the same amount of data. If applied widely, our approach can increase the representation of low-resource languages in NLP systems.
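
The abstract above only sketches the training procedure, so here is a minimal illustrative sketch (not the authors' released code) of how language-balanced teacher-student distillation could look in PyTorch. The toy linear models, language identifiers, batch size, and temperature are assumptions made purely for illustration; a real setup would use pretrained monolingual language models as teachers and a full multilingual corpus.

```python
# Minimal sketch: per-language monolingual teachers distill into one
# multilingual student over language-balanced (sub-sampled) mini-batches.
import torch
import torch.nn.functional as F
from torch import nn

VOCAB, DIM, TEMP = 100, 32, 2.0   # toy sizes; temperature is an assumed hyperparameter
LANGS = ["high_res", "low_res"]   # hypothetical language identifiers

# Toy stand-ins for pretrained models; in practice these would be language models.
teachers = {lang: nn.Linear(DIM, VOCAB) for lang in LANGS}
student = nn.Linear(DIM, VOCAB)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def balanced_batch(batch_size=8):
    """Sub-sample so every language contributes equally, regardless of corpus size."""
    per_lang = batch_size // len(LANGS)
    return {lang: torch.randn(per_lang, DIM) for lang in LANGS}  # dummy features

for step in range(100):
    optimizer.zero_grad()
    loss = 0.0
    for lang, x in balanced_batch().items():
        with torch.no_grad():
            teacher_logits = teachers[lang](x)   # language-specific teacher
        student_logits = student(x)              # single multilingual student
        # Soft-label distillation: KL divergence on temperature-scaled distributions.
        loss = loss + F.kl_div(
            F.log_softmax(student_logits / TEMP, dim=-1),
            F.softmax(teacher_logits / TEMP, dim=-1),
            reduction="batchmean",
        ) * TEMP ** 2
    loss.backward()
    optimizer.step()
```

The key design choice this sketch tries to convey is that balancing happens at the data level (equal sub-samples per language) while the distillation loss lets each teacher supervise the student only on its own language.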

Related research

05/25/2023
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
While impressive performance has been achieved on the task of Answer Sen...

05/25/2022
Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
Scaling multilingual representation learning beyond the hundred most fre...

04/08/2020
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Multilingual sequence labeling is a task of predicting label sequences u...

08/12/2018
Addressee and Response Selection for Multilingual Conversation
Developing conversational systems that can converse in many languages is...

11/02/2022
Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model
Pre-trained multilingual language models play an important role in cross...

10/27/2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Leveraging shared learning through Massively Multilingual Models, state-...

06/20/2020
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
A broad goal in natural language processing (NLP) is to develop a system...
