Dynamic Stashing Quantization for Efficient Transformer Training

03/09/2023
by   Guo Yang, et al.

Large Language Models (LLMs) have demonstrated impressive performance on a range of Natural Language Processing (NLP) tasks. Unfortunately, the immense amount of computation and memory access required for LLM training makes it prohibitively expensive in terms of hardware cost, and thus challenging to deploy in use cases such as on-device learning. In this paper, motivated by the observation that LLM training is memory-bound, we propose a novel dynamic quantization strategy, termed Dynamic Stashing Quantization (DSQ), which places a special focus on reducing memory operations while also enjoying the other benefits of low-precision training, such as reduced arithmetic cost. We conduct a thorough study on two translation tasks (trained from scratch) and three classification tasks (fine-tuning). DSQ reduces the number of arithmetic operations by 20.95× and the number of DRAM operations by 2.55× on IWSLT17 compared to standard 16-bit fixed-point, which is widely used in on-device learning.
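
To make the general idea concrete, the sketch below shows one way that activations stashed for the backward pass can be kept in low precision, so that the dominant DRAM traffic of training shrinks while gradients are still computed from a dequantized copy. This is a minimal PyTorch sketch under stated assumptions: the class name StashQuantLinear, the fixed 8-bit width, and the per-tensor max-scaling rule are illustrative choices and are not taken from the paper, whose scheme adjusts the stashing precision dynamically.

# Minimal, hypothetical sketch of stashing quantization: activations kept for
# the backward pass are stored in int8 and dequantized only when gradients are
# computed. Names, the 8-bit width, and the scaling rule are illustrative
# assumptions, not the authors' code.
import torch

class StashQuantLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        # Forward matmul runs at full precision here for simplicity.
        out = x @ weight.t()
        # Quantize the activation to signed 8-bit before stashing it, so the
        # saved tensor costs roughly 4x less memory traffic than fp32.
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, weight, scale)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight, scale = ctx.saved_tensors
        # Dequantize the stashed activation only when it is actually needed.
        x_hat = x_q.float() * scale
        grad_x = grad_out @ weight          # gradient w.r.t. the input
        grad_w = grad_out.t() @ x_hat       # gradient w.r.t. the weight
        return grad_x, grad_w

# Usage: gradients flow through the int8-stashed activation.
x = torch.randn(4, 16, requires_grad=True)
w = torch.randn(8, 16, requires_grad=True)
y = StashQuantLinear.apply(x, w)
y.sum().backward()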
