Adaptive Loss Scaling for Mixed Precision Training

10/28/2019
by Ruizhe Zhao, et al.

Mixed precision training (MPT) is becoming a practical technique for improving the speed and energy efficiency of training deep neural networks by leveraging the fast hardware support for IEEE half-precision floating point available in existing GPUs. MPT is typically used in combination with a technique called loss scaling, which scales the loss value up before the start of backpropagation in order to minimize the impact of numerical underflow on training. Unfortunately, existing methods make this loss scale value a hyperparameter that needs to be tuned per model, and a single scale cannot be adapted to different layers at different training stages. We introduce a loss-scaling-based training method called adaptive loss scaling that makes MPT easier and more practical to use by removing the need to tune a model-specific loss scale hyperparameter. We achieve this by introducing layer-wise loss scale values which are automatically computed during training to deal with underflow more effectively than existing methods. We present experimental results on a variety of networks and tasks showing that our approach can shorten the time to convergence and improve accuracy compared to the existing state-of-the-art MPT and single-precision floating point baselines.
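
As background, the short NumPy sketch below illustrates the basic loss-scaling mechanism the abstract refers to: a small gradient that underflows to zero in fp16 survives when the loss, and hence every gradient in the backward pass, is first multiplied by a large scale and the result is divided back out in fp32. The gradient value 1e-8 and the scale 2**16 are arbitrary illustrative numbers, and the sketch shows only the standard single-scale scheme, not the paper's layer-wise adaptive computation of the scale.

    import numpy as np

    # A small fp32 gradient value of the kind produced during backpropagation.
    grad = np.float32(1e-8)

    # Cast directly to IEEE half precision: the value underflows to zero,
    # so the corresponding weight update would be silently lost.
    print(np.float16(grad))                       # 0.0

    # Loss scaling: multiply the loss (and therefore every gradient in the
    # backward pass) by a large constant so small gradients stay representable.
    scale = np.float32(2 ** 16)                   # illustrative value only
    scaled_grad_fp16 = np.float16(grad * scale)
    print(scaled_grad_fp16)                       # ~6.55e-04, nonzero in fp16

    # Unscale in fp32 before applying the optimizer update.
    recovered = np.float32(scaled_grad_fp16) / scale
    print(recovered)                              # ~1e-08, the gradient survives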

Related research

10/10/2017  Mixed Precision Training
Deep neural networks have enabled progress in a wide variety of applicat...

04/04/2019  Regularizing Activation Distribution for Training Binarized Deep Networks
Binarized Neural Networks (BNNs) can significantly reduce the inference ...

06/01/2016  Profile-Driven Automated Mixed Precision
We present a scheme to automatically set the precision of floating point...

01/16/2020  Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks
Training with larger number of parameters while keeping fast iterations ...

08/18/2020  Compute, Time and Energy Characterization of Encoder-Decoder Networks with Automatic Mixed Precision Training
Deep neural networks have shown great success in many diverse fields. Th...

11/30/2019  Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters
In this paper, we evaluate training of deep recurrent neural networks wi...

11/20/2019  Auto-Precision Scaling for Distributed Deep Learning
In recent years, large-batch optimization is becoming the key of distrib...
