HiFi: High-Information Attention Heads Hold for Parameter-Efficient Model Adaptation

05/08/2023
by Anchun Gui, et al.

To fully leverage the advantages of large-scale pre-trained language models (PLMs) on downstream tasks, fine-tuning all of a PLM's parameters has become a ubiquitous adaptation paradigm. However, because of the sheer number of parameters in PLMs, this paradigm leads to inefficient updating and excessive resource consumption in data-scarce and resource-limited scenarios. To alleviate these concerns, in this paper we propose HiFi, a parameter-efficient fine-tuning method that fine-tunes only the attention heads that are highly informative and strongly correlated for the specific task. To identify these significant attention heads, we develop a novel framework for analyzing head effectiveness. Specifically, we first model the relationships between heads as a graph from the two perspectives of information richness and correlation, and then apply the PageRank algorithm to determine the relative importance of each head. Extensive experiments on the GLUE benchmark demonstrate the effectiveness of our method and show that HiFi achieves state-of-the-art performance over prior baselines.
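The abstract only outlines the head-selection idea, so the following is a minimal, hypothetical sketch of how heads might be ranked with PageRank over a head-relationship graph. The concrete definitions of "information richness" and "correlation" are not given above; attention-entropy and cosine-similarity proxies, the edge-weighting scheme, and the function names below are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: rank attention heads via PageRank on a head graph,
# then unfreeze only the top-ranked heads for fine-tuning.
import numpy as np

def head_scores_via_pagerank(richness, correlation, alpha=0.85, iters=100, tol=1e-8):
    """Return per-head importance scores from PageRank power iteration.

    richness    : (H,) per-head information-richness proxy (assumption).
    correlation : (H, H) pairwise head-correlation proxy (assumption).
    """
    H = len(richness)
    # Edge weight = |correlation between heads| * richness of the target head.
    # This particular combination is an illustrative assumption.
    W = np.abs(correlation) * richness[None, :]
    np.fill_diagonal(W, 0.0)
    # Row-normalise into a transition matrix; isolated heads get a uniform row.
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.where(row_sums > 0, W / np.maximum(row_sums, 1e-12), 1.0 / H)
    # Standard PageRank power iteration with damping factor alpha.
    scores = np.full(H, 1.0 / H)
    for _ in range(iters):
        new_scores = (1 - alpha) / H + alpha * (scores @ P)
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores

# Toy usage with 12 heads and random proxies.
rng = np.random.default_rng(0)
richness = rng.random(12)                    # e.g. attention-entropy proxy
outputs = rng.standard_normal((12, 64))      # e.g. averaged head outputs
normed = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
correlation = normed @ normed.T              # cosine similarity between heads
scores = head_scores_via_pagerank(richness, correlation)
top_k_heads = np.argsort(scores)[::-1][:4]   # fine-tune only these heads
print(top_k_heads)
```

In this sketch the selected head indices would then be used to keep only those heads' projection weights trainable while freezing the rest of the model; how HiFi defines the graph and performs the selective update is detailed in the full paper.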
