SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

05/29/2023
by   Arash Ardakani, et al.

Transformer-based models, such as BERT and ViT, have achieved state-of-the-art results across different natural language processing (NLP) and computer vision (CV) tasks. However, these models are extremely memory intensive during their fine-tuning process, making them difficult to deploy on GPUs with limited memory resources. To address this issue, we introduce a new tool called SlimFit that reduces the memory requirements of these models by dynamically analyzing their training dynamics and freezing less-contributory layers during fine-tuning. The layers to freeze are chosen using a runtime inter-layer scheduling algorithm. SlimFit adopts quantization and pruning for particular layers to balance the load of dynamic activations and to minimize the memory footprint of static activations, where static activations refer to those that cannot be discarded regardless of freezing. This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100 and ImageNet, with an average accuracy degradation of 0.2%. For such NLP and CV tasks, SlimFit can reduce the total on-device memory usage by up to 3.1x with an accuracy degradation of only up to 0.4%. While fine-tuning ViT on ImageNet and BERT on SQuAD 2.0 with a batch size of 128 requires 3 and 2 32GB GPUs respectively, SlimFit enables their fine-tuning on a single 32GB GPU without any significant accuracy degradation.
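For intuition, here is a minimal PyTorch sketch of the general idea of freezing less-contributory layers based on training dynamics. It is not SlimFit's actual inter-layer scheduling algorithm: the scoring metric (parameter change since the last snapshot), the freeze ratio, and the helper names are illustrative assumptions.

```python
# Illustrative sketch only: freeze "less-contributory" layers during
# fine-tuning, scored by how much their parameters changed recently.
# This is NOT SlimFit's inter-layer scheduling algorithm; the metric,
# freeze ratio, and helper names are assumptions for illustration.
import torch
import torch.nn as nn


def layer_update_scores(layers, prev_states):
    """Score each layer by the L2 norm of its parameter change since the
    last snapshot (a simple proxy for its contribution to training)."""
    scores = []
    for layer, prev in zip(layers, prev_states):
        delta = 0.0
        for name, p in layer.named_parameters():
            delta += (p.detach() - prev[name]).norm().item()
        scores.append(delta)
    return scores


def freeze_least_active(layers, scores, freeze_ratio=0.5):
    """Freeze the lowest-scoring fraction of layers; frozen layers skip
    gradient computation, so their activations need not be stored."""
    k = int(len(layers) * freeze_ratio)
    frozen_idx = set(sorted(range(len(layers)), key=lambda i: scores[i])[:k])
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad_(i not in frozen_idx)
    return frozen_idx


# Toy stack of transformer encoder layers.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(6)
)
prev_states = [
    {n: p.detach().clone() for n, p in layer.named_parameters()}
    for layer in layers
]

# ... run some fine-tuning steps that update `layers` here ...

scores = layer_update_scores(layers, prev_states)
frozen = freeze_least_active(layers, scores, freeze_ratio=0.5)
print(f"Frozen layer indices for this interval: {sorted(frozen)}")
```

In practice such a scheduling decision would be re-evaluated periodically during fine-tuning, so different layers can be frozen or unfrozen as their training dynamics change.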

