Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning

01/23/2023
by Malte Ostendorff, et al.

Most Transformer language models are primarily pretrained on English text, limiting their use for other languages. As model sizes grow, the performance gap between English and languages with fewer compute and data resources widens even further. Consequently, more resource-efficient training methods are needed to bridge this gap for languages with fewer resources available. To address this problem, we introduce a cross-lingual and progressive transfer learning approach, called CLP-Transfer, that transfers models from a source language for which pretrained models are publicly available, such as English, to a new target language. As opposed to prior work, which focused on cross-lingual transfer between two languages, we extend the transfer to the model size. Given a pretrained model in a source language, we aim for a same-sized model in a target language. Instead of training the target-language model from scratch, we exploit a smaller model in the target language that requires far fewer resources. Both the small target-language model and the source model are then used to initialize the token embeddings of the larger model, based on the overlapping vocabulary of the source and target language. All remaining weights are reused from the model in the source language. This approach outperforms sole cross-lingual transfer and can save up to 80% of the training steps compared to random initialization.
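The abstract states that the token embeddings of the larger target-language model are initialized from both the small target-language model and the source-language model via their overlapping vocabulary, while all remaining weights are copied from the source model. The sketch below illustrates one way such an embedding initialization could look; the function name clp_init_embeddings and the similarity-weighted combination over overlapping tokens are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def clp_init_embeddings(src_emb, src_vocab, small_tgt_emb, tgt_vocab):
    """Sketch of a CLP-style token-embedding initialization (hypothetical helper).

    src_emb:        (|V_src|, d) embedding matrix of the large source-language model
    src_vocab:      dict mapping token -> row index in src_emb
    small_tgt_emb:  (|V_tgt|, d_small) embedding matrix of the small target-language model
    tgt_vocab:      dict mapping token -> row index in small_tgt_emb and in the output
    Returns a (|V_tgt|, d) embedding matrix for the large target-language model.
    """
    d = src_emb.shape[1]
    new_emb = np.zeros((len(tgt_vocab), d), dtype=src_emb.dtype)

    # Tokens shared by both vocabularies keep their source-model embedding.
    overlap = [tok for tok in tgt_vocab if tok in src_vocab]
    overlap_tgt_idx = np.array([tgt_vocab[t] for t in overlap])
    overlap_src_idx = np.array([src_vocab[t] for t in overlap])
    new_emb[overlap_tgt_idx] = src_emb[overlap_src_idx]

    # Non-overlapping tokens: combine the source embeddings of the overlapping
    # tokens, weighted by similarity in the *small* target model. The softmax
    # weighting is one plausible choice and an assumption of this sketch.
    overlap_small = small_tgt_emb[overlap_tgt_idx]           # (n_overlap, d_small)
    for tok, i in tgt_vocab.items():
        if tok in src_vocab:
            continue
        sims = overlap_small @ small_tgt_emb[i]              # similarity scores
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()
        new_emb[i] = weights @ src_emb[overlap_src_idx]      # weighted average
    return new_emb
```

All non-embedding weights (attention, feed-forward, layer norms) would then simply be copied from the source-language checkpoint of the same size.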

Related research

12/13/2021 · WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
Recently, large pretrained language models (LMs) have gained popularity...

12/20/2022 · Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Prior work has shown that it is possible to expand pretrained Masked Lan...

04/23/2021 · Transfer training from smaller language model
Large language models have led to state-of-the-art accuracies across a r...

02/24/2022 · Oolong: Investigating What Makes Crosslingual Transfer Hard with Controlled Studies
Little is known about what makes cross-lingual transfer hard, since fact...

06/02/2021 · Lower Perplexity is Not Always Human-Like
In computational psycholinguistics, various language models have been ev...

07/22/2020 · Effects of Language Relatedness for Cross-lingual Transfer Learning in Character-Based Language Models
Character-based Neural Network Language Models (NNLM) have the advantage...

01/09/2016 · Empirical Gaussian priors for cross-lingual transfer learning
Sequence model learning algorithms typically maximize log-likelihood min...
