LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

by   Yi-Lin Sung, et al.

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. only using 2 backbone network for a new task, they only reduce the training memory requirement by up to 30 trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side network, a small and separate network that takes intermediate activations as input via shortcut connections (ladders) from backbone networks and makes predictions. LST has significantly lower memory requirements than previous methods, because it does not require backpropagation through the backbone network, but instead only through the side network and ladder connections. We evaluate our method with various models (T5, CLIP-T5) on both NLP (GLUE) and vision-language (VQA, GQA, NLVR2, MSCOCO) tasks. LST saves 69 whole network, while other methods only save 26 usages (hence, 2.7x more memory savings). Moreover, LST achieves higher accuracy than Adapter and LoRA in a low-memory regime. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods. The exact same trend also holds in our experiments on VL tasks.


Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

Recently, the pre-trained Transformer models have received a rising inte...

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

Pre-training fine-tuning is a prevalent paradigm in computer vision ...

Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning

Intermediate features of a pre-trained model have been shown informative...

Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (P...

BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning

With the surge of large-scale pre-trained models (PTMs), fine-tuning the...

Unidirectional Thin Adapter for Efficient Adaptation of Deep Neural Networks

In this paper, we propose a new adapter network for adapting a pre-train...

Introspective Cross-Attention Probing for Lightweight Transfer of Pre-trained Models

We propose InCA, a lightweight method for transfer learning that cross-a...

Please sign up or login with your details

Forgot password? Click here to reset