Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

09/13/2021
by   Runxin Xu, et al.

Recent pretrained language models have grown from millions to billions of parameters, so the need to fine-tune an extremely large pretrained model with a limited training corpus arises in various downstream tasks. In this paper, we propose a straightforward yet effective fine-tuning technique, Child-Tuning, which updates a subset of parameters (called the child network) of large pretrained models by strategically masking out the gradients of the non-child network during the backward pass. Experiments on various downstream tasks in the GLUE benchmark show that Child-Tuning consistently outperforms vanilla fine-tuning by 1.5 to 8.6 average score points across four different pretrained models, and surpasses prior fine-tuning techniques by 0.6 to 1.3 points. Furthermore, empirical results on domain transfer and task transfer show that Child-Tuning obtains better generalization performance by large margins.
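The gradient-masking idea described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' released implementation: `child_tuning_step` and the `masks` dictionary are hypothetical names, the fixed binary masks loosely correspond to a task-free style of child-network selection, and a plain SGD update stands in for whatever optimizer is actually used.

```python
import torch

def child_tuning_step(model, loss, masks, lr=2e-5):
    """One training step that updates only the child network.

    `masks` maps each parameter name to a binary tensor of the same
    shape: 1 marks child-network entries, 0 marks non-child entries
    whose gradients are zeroed before the parameter update.
    (Hypothetical helper; the paper's own code may differ.)
    """
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            p.grad *= masks[name]   # mask out non-child gradients
            p -= lr * p.grad        # plain SGD update for clarity
    model.zero_grad()
```

For example, a mask of all ones on a layer's weight and all zeros on its bias would fine-tune the weight while freezing the bias, since the bias gradient is zeroed before the update.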

Related research

- 05/24/2023: Bi-Drop: Generalizable Fine-tuning for Pre-trained Language Models via Adaptive Subnetwork Optimization ("Pretrained language models have achieved remarkable success in a variety...")
- 07/19/2023: Gradient Sparsification For Masked Fine-Tuning of Transformers ("Fine-tuning pretrained self-supervised language models is widely adopted...")
- 01/01/2021: Prefix-Tuning: Optimizing Continuous Prompts for Generation ("Fine-tuning is the de facto way to leverage large pretrained language mo...")
- 10/22/2022: PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models ("A wide range of NLP tasks benefit from the fine-tuning of pretrained lan...")
- 05/02/2022: Robust Fine-tuning via Perturbation and Interpolation from In-batch Instances ("Fine-tuning pretrained language models (PLMs) on downstream tasks has be...")
- 01/29/2023: Debiased Fine-Tuning for Vision-language Models by Prompt Regularization ("We present a new paradigm for fine-tuning large-scale visionlanguage pre...")
- 09/15/2023: Fine-tune the pretrained ATST model for sound event detection ("Sound event detection (SED) often suffers from the data deficiency probl...")
