Training Integer-Only Deep Recurrent Neural Networks

12/22/2022
by Vahid Partovi Nia, et al.

Recurrent neural networks (RNNs) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as non-linear activation functions, normalization, bi-directional dependence, and attention. To maintain good accuracy, these components are frequently run in full-precision floating-point, making them slow, inefficient, and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize with standard quantization methods without a significant drop in performance. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, so it serves a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a 2× improvement in runtime and a 4× reduction in model size while maintaining accuracy similar to that of their full-precision counterparts.
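The core idea behind a PWL approximation of an activation function can be illustrated with a small sketch. This is not the paper's implementation: the breakpoints here are placed uniformly and evaluated in floating point for clarity, whereas the paper learns an adaptive breakpoint placement during quantization-aware training and evaluates the segments with integer-only (fixed-point) slopes and intercepts. All function and variable names below are illustrative.

import numpy as np

def build_pwl(fn, x_min=-4.0, x_max=4.0, n_pieces=8):
    """Precompute slopes and intercepts of a PWL approximation of `fn`.

    Uniform breakpoints are a simplification; an adaptive scheme would
    place them where the function curves most.
    """
    knots = np.linspace(x_min, x_max, n_pieces + 1)
    y = fn(knots)
    slopes = (y[1:] - y[:-1]) / (knots[1:] - knots[:-1])
    intercepts = y[:-1] - slopes * knots[:-1]
    return knots, slopes, intercepts

def pwl_eval(x, knots, slopes, intercepts):
    """Evaluate the PWL approximation: pick the segment containing x,
    then apply y = a*x + b. In an integer-only deployment, x, a, and b
    would be quantized to fixed-point so this reduces to integer
    multiply-add plus a table lookup."""
    idx = np.clip(np.searchsorted(knots, x) - 1, 0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]

if __name__ == "__main__":
    knots, slopes, intercepts = build_pwl(np.tanh)
    x = np.linspace(-4.0, 4.0, 1001)
    err = np.max(np.abs(pwl_eval(x, knots, slopes, intercepts) - np.tanh(x)))
    print(f"max |PWL - tanh| with 8 uniform pieces: {err:.4f}")

Increasing the number of pieces, or placing breakpoints adaptively, reduces the approximation error while keeping the per-element cost at one multiply-add, which is what makes this approach attractive for integer-only edge inference.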


Related research

09/20/2021  iRNN: Integer-only Recurrent Neural Network
Recurrent neural networks (RNN) are used in many real-world text and spe...

06/14/2019  On the Computational Power of RNNs
Recent neural network architectures such as the basic recurrent neural n...

10/20/2017  Low Precision RNNs: Quantizing RNNs Without Losing Accuracy
Similar to convolution neural networks, recurrent neural networks (RNNs)...

04/20/2020  Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
Quantization techniques can reduce the size of Deep Neural Networks and ...

09/17/2019  K-TanH: Hardware Efficient Activations For Deep Learning
We propose K-TanH, a novel, highly accurate, hardware efficient approxim...

07/20/2020  DiffRNN: Differential Verification of Recurrent Neural Networks
Recurrent neural networks (RNNs) such as Long Short Term Memory (LSTM) n...

05/08/2023  Flex-SFU: Accelerating DNN Activation Functions by Non-Uniform Piecewise Approximation
Modern DNN workloads increasingly rely on activation functions consistin...
