Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

12/20/2022
by Kelly Marchisio, et al.

Prior work has shown that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. In this work, we propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model, and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model and build a mini-model by extracting and freezing a few layers and learning a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.4x less compute.
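The abstract describes the two architectures only at a high level. The following PyTorch sketch illustrates the MiniJoint variant: a single transformer body with a secondary MLM head attached at a middle layer, trained jointly so the bottom stack can later serve as a standalone mini-model. All names and hyperparameters here (MiniJointMLM, mini_layers=4, d_model=768, the summed loss, and so on) are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Minimal sketch of a MiniJoint-style MLM: one transformer body, a secondary
# MLM head at a middle layer, and a joint loss over both heads.
import torch
import torch.nn as nn

class MLMHead(nn.Module):
    """Projects hidden states to vocabulary logits for masked-token prediction."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, hidden):
        return self.proj(hidden)

class MiniJointMLM(nn.Module):
    def __init__(self, vocab_size=50000, d_model=768, n_heads=12,
                 n_layers=12, mini_layers=4):  # assumed sizes, not the paper's
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.lower = nn.ModuleList(make_layer() for _ in range(mini_layers))
        self.upper = nn.ModuleList(make_layer() for _ in range(n_layers - mini_layers))
        self.mini_head = MLMHead(d_model, vocab_size)  # secondary head (mini-model)
        self.full_head = MLMHead(d_model, vocab_size)  # primary head (full model)

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for blk in self.lower:          # shared bottom stack = the mini-model body
            h = blk(h)
        mini_logits = self.mini_head(h)  # prediction from the middle layer
        for blk in self.upper:           # remaining layers of the full model
            h = blk(h)
        full_logits = self.full_head(h)
        return mini_logits, full_logits

# Joint pretraining step: summing both MLM losses keeps the middle layer
# aligned with the full model's representation space.
model = MiniJointMLM()
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
tokens = torch.randint(0, 50000, (2, 16))
labels = tokens.clone()  # in practice: labels only at masked positions, -100 elsewhere
mini_logits, full_logits = model(tokens)
loss = (loss_fn(mini_logits.flatten(0, 1), labels.flatten())
        + loss_fn(full_logits.flatten(0, 1), labels.flatten()))
loss.backward()
```

Under this reading of the abstract, the frozen lower stack plus mini_head is the shallow mini-model over which new language-specific embeddings can be trained at a fraction of the full forward/backward cost, and, because the two are aligned, the resulting embeddings can then be plugged directly into the full model for cross-lingual transfer.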


Related research

Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning (01/23/2023)
Most Transformer language models are primarily pretrained on English tex...

MultiFiT: Efficient Multi-lingual Language Model Fine-tuning (09/10/2019)
Pretrained language models are promising particularly for low-resource l...

Emerging Cross-lingual Structure in Pretrained Language Models (11/04/2019)
We study the problem of multilingual masked language modeling, i.e. the ...

Improving Language Plasticity via Pretraining with Active Forgetting (07/03/2023)
Pretrained language models (PLMs) are today the primary model for natura...

Cross-Lingual Training for Automatic Question Generation (06/06/2019)
Automatic question generation (QG) is a challenging problem in natural l...

An Investigation into Mini-Batch Rule Learning (06/18/2021)
We investigate whether it is possible to learn rule sets efficiently in ...

Oolong: Investigating What Makes Crosslingual Transfer Hard with Controlled Studies (02/24/2022)
Little is known about what makes cross-lingual transfer hard, since fact...
