AdapterDrop: On the Efficiency of Adapters in Transformers

10/22/2020
by Andreas Rücklé, et al.

Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, which removes adapters from lower transformer layers during training and inference and thereby incorporates concepts from all three directions. We show that AdapterDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performance. We further prune adapters from AdapterFusion, which improves inference efficiency while fully maintaining task performance.
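To make the core idea concrete, below is a minimal PyTorch-style sketch of what dropping adapters from the lower layers can look like. It is not the authors' implementation: the class and argument names (BottleneckAdapter, AdapterDropEncoder, n_drop) are illustrative assumptions, and for simplicity the adapter is applied after each full encoder layer, whereas adapters are typically inserted inside a layer's sub-blocks.

```python
# Sketch of the AdapterDrop idea (assumed structure, not the paper's code):
# bottleneck adapters are attached to every transformer layer, but the
# adapters in the lowest `n_drop` layers are skipped, so those layers run
# without per-task adapter overhead and can be shared across tasks.

import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-projection, nonlinearity, up-projection, plus a residual connection."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdapterDropEncoder(nn.Module):
    """Transformer encoder whose lowest `n_drop` layers skip their adapters."""

    def __init__(self, hidden_size: int = 256, num_layers: int = 12, n_drop: int = 5):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.adapters = nn.ModuleList(
            BottleneckAdapter(hidden_size) for _ in range(num_layers)
        )
        self.n_drop = n_drop  # number of lower layers whose adapters are dropped

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, (layer, adapter) in enumerate(zip(self.layers, self.adapters)):
            x = layer(x)
            if i >= self.n_drop:  # adapters are only applied in the upper layers
                x = adapter(x)
        return x


if __name__ == "__main__":
    model = AdapterDropEncoder()
    hidden = model(torch.randn(2, 16, 256))  # (batch, seq_len, hidden)
    print(hidden.shape)
```

In a multi-task inference setting, the shared lower layers need to be computed only once per input, and only the task-specific upper layers (with their adapters) are run per task, which is where the computational savings reported in the paper come from.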


Related research

07/03/2023 · Trainable Transformer in Transformer
Recent works attribute the capability of in-context learning (ICL) in la...

11/24/2021 · Sparse is Enough in Scaling Transformers
Large Transformer models yield impressive results on many tasks, but are...

04/30/2022 · AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
Transformer-based pre-trained models with millions of parameters require...

05/24/2023 · Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Transformers have achieved great success in machine learning application...

08/16/2020 · Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Fine-tuning a pretrained transformer for a downstream task has become a ...

11/18/2021 · Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length
Limited computational budgets often prevent transformers from being used...

07/07/2022 · Training Transformers Together
The infrastructure necessary for training state-of-the-art models is bec...
