How fine can fine-tuning be? Learning efficient language models

04/24/2020
by Evani Radiya-Dixit, et al.

State-of-the-art performance on language understanding tasks is now achieved with increasingly large networks; the current record holder has billions of parameters. Given a language model pre-trained on massive unlabeled text corpora, only very light supervised fine-tuning is needed to learn a task: the number of fine-tuning steps is typically five orders of magnitude lower than the total parameter count. Does this mean that fine-tuning only introduces small differences from the pre-trained model in the parameter space? If so, can one avoid storing and computing an entire model for each task? In this work, we address these questions by using Bidirectional Encoder Representations from Transformers (BERT) as an example. As expected, we find that the fine-tuned models are close in parameter space to the pre-trained one, with the closeness varying from layer to layer. We show that it suffices to fine-tune only the most critical layers. Further, we find that there are surprisingly many good solutions in the set of sparsified versions of the pre-trained model. As a result, fine-tuning of huge language models can be achieved by simply setting a certain number of entries in certain layers of the pre-trained parameters to zero, saving both task-specific parameter storage and computational cost.
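The two compression strategies sketched in the abstract, tuning only a subset of layers and zeroing out entries of the pre-trained weights, are straightforward to prototype. Below is a minimal sketch assuming PyTorch and the Hugging Face transformers library with the bert-base-uncased checkpoint; the layer indices and sparsity level are illustrative placeholders, not the values or selection criteria reported in the paper.

```python
# Sketch: (1) fine-tune only selected BERT layers, (2) sparsify pre-trained weights.
# Assumes torch + transformers are installed; layer choice and sparsity are illustrative.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# (1) Freeze everything, then unfreeze a few layers (placeholders standing in for
#     the "most critical" layers, which the paper identifies empirically).
critical_layers = {10, 11}  # hypothetical choice
for name, param in model.named_parameters():
    param.requires_grad = any(
        name.startswith(f"bert.encoder.layer.{i}.") for i in critical_layers
    ) or name.startswith("classifier.")

# (2) Sparsify: set the smallest-magnitude entries of each encoder weight matrix
#     to zero, so the task-specific model is just a mask over pre-trained weights.
sparsity = 0.3  # fraction of entries zeroed; illustrative value
with torch.no_grad():
    for name, param in model.bert.encoder.named_parameters():
        if name.endswith(".weight") and param.dim() == 2:
            flat = param.abs().flatten()
            k = int(sparsity * flat.numel())
            if k > 0:
                threshold = flat.kthvalue(k).values
                param.masked_fill_(param.abs() <= threshold, 0.0)

# Only the unfrozen parameters (plus the zero mask) need task-specific storage.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```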

