DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

10/30/2021
by Xuxi Chen, et al.

Gigantic pre-trained models have become central to natural language processing (NLP), serving as the starting point for fine-tuning towards a range of downstream tasks. However, two pain points persist for this paradigm: (a) as pre-trained models grow bigger (e.g., 175B parameters for GPT-3), even the fine-tuning process can be time-consuming and computationally expensive; (b) the fine-tuned model has the same size as its starting point by default, which is neither sensible given its more specialized functionality nor practical, since many fine-tuned models will be deployed in resource-constrained environments. To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning that leverages a sparsity prior in both the weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter-efficient fine-tuning, by enforcing sparsity-aware weight updates on top of the pre-trained weights; and (ii) resource-efficient inference, by encouraging a sparse weight structure in the final fine-tuned model. We exploit sparsity in these two directions through both unstructured and structured sparse patterns in pre-trained language models, using magnitude-based pruning and ℓ_1 sparse regularization. Extensive experiments and in-depth investigations, with diverse network backbones (i.e., BERT, GPT-2, and DeBERTa) on dozens of datasets, consistently demonstrate highly impressive parameter/training/inference efficiency while maintaining competitive downstream transfer performance. For instance, our DSEE-BERT obtains about 35% inference FLOPs savings with <1% trainable parameters and performance comparable to conventional fine-tuning. Code is available at https://github.com/VITA-Group/DSEE.
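To make the "sparsity-embedded weight update" idea concrete, the following minimal PyTorch sketch applies magnitude-based pruning to a fine-tuning delta so that only a small fraction of the update entries need to be stored on top of the frozen pre-trained weights. This is an illustrative toy example, not the authors' implementation; the function name magnitude_prune and the keep_ratio parameter are assumptions for the sketch.

# Minimal sketch (not the authors' code) of a sparsity-embedded weight update:
# keep only the largest-magnitude entries of the fine-tuning delta, so the
# tuned layer equals the pre-trained weight plus a sparse update.
import torch


def magnitude_prune(delta: torch.Tensor, keep_ratio: float = 0.05) -> torch.Tensor:
    """Zero out all but the top `keep_ratio` fraction of entries by magnitude."""
    k = max(1, int(keep_ratio * delta.numel()))
    # The k-th largest magnitude is the (numel - k + 1)-th smallest magnitude.
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    return delta * (delta.abs() >= threshold)


# Toy stand-ins for one layer's pre-trained and densely fine-tuned weights.
torch.manual_seed(0)
w_pre = torch.randn(768, 768)
w_ft = w_pre + 0.01 * torch.randn(768, 768)

sparse_update = magnitude_prune(w_ft - w_pre, keep_ratio=0.05)
w_tuned = w_pre + sparse_update  # only ~5% of the update entries are non-zero

density = (sparse_update != 0).float().mean().item()
print(f"update density: {density:.3f}")

Storing only the sparse delta (rather than a full copy of the fine-tuned weights) is what enables the parameter-efficient side of the approach; the resource-efficient inference side additionally pushes sparsity into the final weights themselves, e.g., via ℓ_1 regularization as described in the abstract.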


