Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

10/18/2022
by   Shuo Xie, et al.

When transferring a pretrained language model, common approaches attach a task-specific classifier to the top layer and adapt all of the pretrained layers. We investigate whether one can instead make a task-specific selection of which subset of layers to adapt and where to place the classifier. The goal is to reduce the computation cost of transfer-learning methods (e.g., fine-tuning or adapter-tuning) without sacrificing performance. We propose to select layers based on the variability of their hidden states given a task-specific corpus. We say a layer is already "well-specialized" for a task if the within-class variability of its hidden states is low relative to the between-class variability. Our variability metric is cheap to compute, requires no training or hyperparameter tuning, and is robust to data imbalance and data scarcity. Extensive experiments on the GLUE benchmark demonstrate that selecting layers based on our metric yields significantly stronger performance than using the same number of top layers, and often matches the performance of fine-tuning or adapter-tuning the entire language model.
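The abstract does not spell out the exact variability metric, but the idea it describes (a layer is well-specialized when the within-class variability of its hidden states is low relative to the between-class variability) can be illustrated with a simple class-scatter ratio computed per layer. The sketch below is an illustrative assumption, not the authors' released implementation: it uses the HuggingFace Transformers API with a generic BERT checkpoint and mean-pooled token states, and the paper's actual measure may differ in its details.

```python
# Illustrative sketch: score each layer of a pretrained Transformer by the
# ratio of between-class to within-class variability of its hidden states
# on a small labeled, task-specific corpus. No training is required.

import torch
from transformers import AutoModel, AutoTokenizer

def layer_variability_scores(texts, labels, model_name="bert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    with torch.no_grad():
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        # hidden_states: tuple of (num_layers + 1) tensors of shape [batch, seq, dim],
        # where the first entry is the embedding output.
        hidden_states = model(**enc).hidden_states

    labels = torch.tensor(labels)
    mask = enc["attention_mask"].unsqueeze(-1)
    scores = []
    for layer_states in hidden_states:
        # Mean-pool over non-padding tokens to get one vector per example.
        pooled = (layer_states * mask).sum(1) / mask.sum(1)

        overall_mean = pooled.mean(0)
        within, between = 0.0, 0.0
        for c in labels.unique():
            class_states = pooled[labels == c]
            class_mean = class_states.mean(0)
            within += ((class_states - class_mean) ** 2).sum().item()
            between += len(class_states) * ((class_mean - overall_mean) ** 2).sum().item()

        # Higher score = hidden states already separate the classes well,
        # i.e. the layer is already "well-specialized" for the task.
        scores.append(between / max(within, 1e-12))
    return scores
```

Under this reading, layers whose hidden states already separate the classes well need little further adaptation, so the classifier could be placed at such a layer and only the remaining layers fine-tuned or adapter-tuned; this is an interpretation of the abstract for illustration, not the paper's exact procedure.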


Related research

Robust Transfer Learning with Pretrained Language Models through Adapters (08/05/2021)
Transfer learning with large pretrained transformer-based language model...

WARP: Word-level Adversarial ReProgramming (01/01/2021)
Transfer learning from pretrained language models recently became the do...

Gradient Sparsification For Masked Fine-Tuning of Transformers (07/19/2023)
Fine-tuning pretrained self-supervised language models is widely adopted...

Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks (04/10/2022)
Parameter-efficient tuning aims to distill knowledge for downstream task...

Extracting Latent Steering Vectors from Pretrained Language Models (05/10/2022)
Prior work on controllable text generation has focused on learning how t...

How Many Data Points is a Prompt Worth? (03/15/2021)
When fine-tuning pretrained models for classification, researchers eithe...

Factorizing Knowledge in Neural Networks (07/04/2022)
In this paper, we explore a novel and ambitious knowledge-transfer task,...
