An Empirical Study on the Transferability of Transformer Modules in Parameter-Efficient Fine-Tuning

02/01/2023
by Mohammad AkbarTajari, et al.

Parameter-efficient fine-tuning approaches have recently garnered a lot of attention. With a considerably lower number of trainable weights, these methods offer scalability and computational efficiency. In this paper, we look for optimal sub-networks and investigate the capability of different transformer modules in transferring knowledge from a pre-trained model to a downstream task. Our empirical results suggest that every transformer module in BERT can act as a winning ticket: fine-tuning each specific module while keeping the rest of the network frozen can lead to performance comparable to full fine-tuning. Among the different modules, LayerNorms exhibit the best capacity for knowledge transfer with limited trainable weights, to the extent that, with only about 0.003% of the model's parameters being updated, they achieve acceptable performance on various target tasks. As for the reasons behind their effectiveness, we argue that their notable performance could be attributed to their high-magnitude weights compared to those of the other modules in pre-trained BERT.
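The module-wise setup described above can be reproduced with standard tooling: freeze every weight of a pre-trained BERT and leave only the parameters of the module under study (here, the LayerNorms) trainable. The sketch below is illustrative rather than the authors' released code; the checkpoint name, learning rate, and the choice to also train the task-specific classification head are assumptions.

import torch
from transformers import BertForSequenceClassification

# Hypothetical choices: checkpoint, number of labels, and learning rate
# are placeholders, not values taken from the paper.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze everything, then unfreeze only the LayerNorm parameters
# (and, as is common for downstream tasks, the classification head).
for name, param in model.named_parameters():
    param.requires_grad = "LayerNorm" in name or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.4f}%)")

# Only the unfrozen parameters are handed to the optimizer; the rest of
# BERT keeps its pre-trained weights throughout fine-tuning.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)

Replacing "LayerNorm" with a substring matching another module's parameter names (for example, the attention projections or the feed-forward layers) yields the corresponding module-wise experiment in the same way.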


