BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

06/18/2021
by   Elad Ben Zaken, et al.

We show that with small-to-medium training data, fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, bias-only fine-tuning is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of fine-tuning: they support the hypothesis that fine-tuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
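
Since the method amounts to freezing all pre-trained weights and updating only the bias vectors (plus the randomly initialized task head), a minimal sketch of how this could be set up with PyTorch and the Hugging Face transformers library is given below. The checkpoint name, the rule for selecting bias parameters, and the learning rate are illustrative assumptions, not details taken from the paper.

```python
# Sketch of bias-only (BitFit-style) fine-tuning with Hugging Face transformers.
# Checkpoint, parameter-selection rule, and learning rate are assumptions.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# Freeze everything except bias terms and the task-specific classifier head
# (the head has no pre-trained weights, so it must be trained from scratch).
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")

# Only the selected parameters are handed to the optimizer; lr is illustrative.
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

The key point the sketch illustrates is that the trainable set shrinks to a fraction of a percent of the full model, while the rest of the network is reused unchanged from pre-training.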

Related research

04/23/2020 · UHH-LT LT2 at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection
Fine-tuning of pre-trained transformer networks such as BERT yield state...

06/06/2023 · An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
The increasingly large size of modern pretrained language models not onl...

10/11/2022 · A Kernel-Based View of Language Model Fine-Tuning
It has become standard to solve NLP tasks by fine-tuning pre-trained lan...

05/18/2023 · Ahead-of-Time P-Tuning
In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel paramete...

06/16/2023 · Data Selection for Fine-tuning Large Language Models Using Transferred Shapley Values
Although Shapley values have been shown to be highly effective for ident...

06/10/2020 · Revisiting Few-sample BERT Fine-tuning
We study the problem of few-sample fine-tuning of BERT contextual repres...

04/17/2022 · Pathologies of Pre-trained Language Models in Few-shot Fine-tuning
Although adapting pre-trained language models with few examples has show...
