
When does Parameter-Efficient Transfer Learning Work for Machine Translation?

by Ahmet Üstün, et al.

Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained models while tuning only a small number of parameters. They have been shown to be competitive with full model fine-tuning for many downstream tasks. However, prior work indicates that PEFTs may not work as well for machine translation (MT), and there is no comprehensive study showing when PEFTs work for MT. We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language pairs, and (3) different pre-trained models. We find that 'adapters', in which small feed-forward networks are added after every layer, are indeed on par with full model fine-tuning when the parameter budget corresponds to 10% of the total model parameters. Nevertheless, as the number of tuned parameters decreases, the performance of PEFTs decreases. The magnitude of this decrease depends on the language pair, with PEFTs particularly struggling for distantly related language pairs. We find that using PEFTs with a larger pre-trained model outperforms full fine-tuning with a smaller model, and that for smaller training data sizes, PEFTs outperform full fine-tuning of the same pre-trained model.
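The 'adapter' idea the abstract describes can be sketched in a few lines: a small bottleneck feed-forward network inserted after a layer, with a residual connection so the frozen model's output passes through unchanged at initialization. The sketch below uses numpy and illustrative dimensions (`d_model=512`, `bottleneck=64` are assumptions, not values from the paper); only the adapter weights would be tuned.

```python
import numpy as np

def adapter(x, w_down, w_up):
    # Bottleneck adapter: down-project, ReLU, up-project, add residual.
    # x: (batch, d_model); w_down: (d_model, bottleneck); w_up: (bottleneck, d_model)
    h = np.maximum(x @ w_down, 0.0)
    return x + h @ w_up

d_model, bottleneck = 512, 64  # illustrative sizes, not from the paper
rng = np.random.default_rng(0)
x = rng.standard_normal((1, d_model))
w_down = rng.standard_normal((d_model, bottleneck)) * 0.01
w_up = np.zeros((bottleneck, d_model))  # zero init: adapter starts as identity

y = adapter(x, w_down, w_up)
adapter_params = w_down.size + w_up.size  # 2 * 512 * 64 = 65,536 tuned parameters
```

Because `w_up` starts at zero, the adapter is an identity map before training, so inserting it does not perturb the pre-trained model; the tuned parameter count per layer (two small projection matrices) is what makes the parameter budget so much smaller than full fine-tuning.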
