Distilling Efficient Language-Specific Models for Cross-Lingual Transfer

06/02/2023
by Alan Ansell, et al.

Massively multilingual Transformers (MMTs), such as mBERT and XLM-R, are widely used for cross-lingual transfer learning. While these are pretrained to represent hundreds of languages, end users of NLP systems are often interested only in individual languages. For such purposes, the MMTs' language coverage makes them unnecessarily expensive to deploy in terms of model size, inference time, energy, and hardware cost. We thus propose to extract compressed, language-specific models from MMTs which retain the capacity of the original MMTs for cross-lingual transfer. This is achieved by distilling the MMT bilingually, i.e., using data from only the source and target language of interest. Specifically, we use a two-phase distillation approach, termed BiStil: (i) the first phase distils a general bilingual model from the MMT, while (ii) the second, task-specific phase sparsely fine-tunes the bilingual "student" model using a task-tuned variant of the original MMT as its "teacher". We evaluate this distillation technique in zero-shot cross-lingual transfer across a number of standard cross-lingual benchmarks. The key results indicate that the distilled models exhibit minimal degradation in target language performance relative to the base MMT despite being significantly smaller and faster. Furthermore, we find that they outperform multilingually distilled models such as DistilmBERT and MiniLMv2 while having a very modest training budget in comparison, even on a per-language basis. We also show that bilingual models distilled from MMTs greatly outperform bilingual models trained from scratch. Our code and models are available at https://github.com/AlanAnsell/bistil.
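To make the two-phase recipe concrete, below is a minimal sketch of phase (i), general bilingual distillation, written against PyTorch and Hugging Face Transformers. The teacher checkpoint, student depth, temperature, masking of padding tokens, and loss form are illustrative assumptions for a generic soft-label distillation step on mixed source/target-language text, not the exact BiStil configuration described in the paper.

```python
# Sketch of phase (i): distilling a smaller bilingual "student" from an MMT teacher.
# All hyperparameters and model choices below are assumptions for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

teacher_name = "xlm-roberta-base"  # assumed MMT teacher
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForMaskedLM.from_pretrained(teacher_name).eval()

# Hypothetical student: same architecture family, fewer layers.
student_cfg = AutoConfig.from_pretrained(teacher_name, num_hidden_layers=6)
student = AutoModelForMaskedLM.from_config(student_cfg)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0


def distillation_step(batch_texts):
    """One soft-label KD step on a batch of source- and target-language sentences."""
    enc = tokenizer(batch_texts, return_tensors="pt",
                    padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        t_logits = teacher(**enc).logits
    s_logits = student(**enc).logits

    # Per-token KL divergence between teacher and student output distributions,
    # averaged over non-padding positions only.
    kl = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="none",
    ).sum(-1)
    mask = enc["attention_mask"].bool()
    loss = kl[mask].mean() * temperature ** 2

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()


# Example usage on a tiny bilingual (English + Swahili) batch.
print(distillation_step(["The cat sat on the mat.",
                         "Paka ameketi kwenye mkeka."]))
```

Phase (ii) would then sparsely fine-tune this bilingual student on the downstream task, using a task-tuned variant of the original MMT as the teacher; the paper's released code at the repository above is the authoritative reference for that step.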


