READ: Recurrent Adaptation of Large Transformers

05/24/2023
by Sid Wang, et al.

Fine-tuning large-scale Transformers has fueled an explosion of AI applications across Natural Language Processing and Computer Vision tasks. However, fine-tuning all pre-trained model parameters becomes impractical as the model size and the number of tasks grow. Parameter-efficient transfer learning (PETL) methods aim to address these challenges. While effective in reducing the number of trainable parameters, PETL methods still require significant energy and computational resources to fine-tune. In this paper, we introduce REcurrent ADaptation (READ), a lightweight and memory-efficient fine-tuning method, to overcome the limitations of current PETL approaches. Specifically, READ inserts a small RNN alongside the backbone model so that training does not have to back-propagate through the large backbone network. Through a comprehensive empirical evaluation on the GLUE benchmark, we demonstrate that READ achieves a 56% reduction in training memory consumption and an 84% reduction in GPU energy usage while retaining high model quality compared to full-tuning. Additionally, the size of READ does not grow with the backbone model size, making it a highly scalable solution for fine-tuning large Transformers.
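The mechanism described above (a small recurrent side network fed by the frozen backbone's intermediate hidden states, trained without back-propagating through the backbone) can be sketched roughly as follows. This is only a loose illustration of the idea as stated in the abstract, not the authors' implementation: the HuggingFace-style backbone call with output_hidden_states, the READSideNetwork and train_step names, the GRU recurrence, the rnn_dim size, and the first-token classification head are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class READSideNetwork(nn.Module):
    """Hypothetical recurrent side network; names and sizes are illustrative."""

    def __init__(self, backbone_dim: int, rnn_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(backbone_dim, rnn_dim)  # project hidden states down
        self.rnn = nn.GRU(rnn_dim, rnn_dim, batch_first=True)
        self.up = nn.Linear(rnn_dim, backbone_dim)    # project the correction back up

    def forward(self, layer_states):
        # layer_states: tuple of [batch, seq, dim] tensors, one per backbone layer.
        # Detach them so no gradient ever reaches the backbone parameters.
        stacked = torch.stack([s.detach() for s in layer_states], dim=2)  # [B, S, L, D]
        B, S, L, D = stacked.shape
        x = self.down(stacked).reshape(B * S, L, -1)  # treat each token independently
        out, _ = self.rnn(x)                          # recurrence runs over the layer axis
        return self.up(out[:, -1]).reshape(B, S, D)   # correction to the final-layer states


def train_step(backbone, side_net, head, batch, optimizer):
    """One fine-tuning step: gradients reach only side_net and head, never the backbone."""
    with torch.no_grad():  # the large backbone stays frozen; no back-propagation through it
        out = backbone(input_ids=batch["input_ids"],
                       attention_mask=batch["attention_mask"],
                       output_hidden_states=True)
    corrected = out.hidden_states[-1] + side_net(out.hidden_states)
    logits = head(corrected[:, 0])                    # simple first-token classifier head
    loss = nn.functional.cross_entropy(logits, batch["labels"])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a setup like this the optimizer would be built only over the side_net and head parameters, which is consistent with the training-memory and energy savings the abstract reports.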

research · 03/29/2022
Parameter-efficient Fine-tuning for Vision Transformers
In computer vision, it has achieved great success in adapting large-scal...

research · 03/17/2023
LION: Implicit Vision Prompt Tuning
Despite recent competitive performance across a range of vision tasks, v...

research · 05/09/2021
Improving Patent Mining and Relevance Classification using Transformers
Patent analysis and mining are time-consuming and costly processes for c...

research · 10/07/2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Adapting large-scale pretrained models to various downstream tasks via f...

research · 12/06/2022
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Intermediate features of a pre-trained model have been shown informative...

research · 09/11/2023
DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Prompt tuning (PT), where a small amount of trainable soft (continuous) ...

research · 01/04/2023
Parameter-Efficient Fine-Tuning Design Spaces
Parameter-efficient fine-tuning aims to achieve performance comparable t...
