K-TanH: Hardware Efficient Activations For Deep Learning

09/17/2019
by Abhisek Kundu, et al.

We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular Tanh activation function for deep learning. K-TanH consists of a sequence of parameterized bit/integer operations, such as masking, shift, and add/subtract (no floating-point operations are needed), where the parameters are stored in a very small look-up table. The design of K-TanH is flexible enough to handle multiple numerical formats, such as FP32 and BFloat16. High-quality approximations to other activation functions, e.g., Swish and GELU, can be derived from K-TanH. We provide an RTL design for K-TanH to demonstrate its area/power/performance efficacy. It is more accurate than existing piecewise approximations of Tanh; for example, K-TanH achieves ∼5× speedup and >6× reduction in maximum approximation error over a software implementation of Hard TanH. Experimental results for low-precision BFloat16 training of the GNMT language translation model on the WMT16 data sets, with approximate Tanh and Sigmoid obtained via K-TanH, show accuracy and convergence similar to training with exact Tanh and Sigmoid.
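The abstract describes K-TanH only at a high level: a small look-up table, indexed by bits of the input's BFloat16 representation, selects parameters that are then applied with integer operations. The sketch below illustrates that general table-indexed, piecewise structure in Python; it is not the paper's algorithm. The bucket key (top bits of the BFloat16 pattern), the table size, the least-squares segment fit, and the floating-point multiply-add used at evaluation time are all assumptions made for readability, whereas K-TanH itself operates on the bit pattern with shift/add only.

```python
import numpy as np

def bf16_bits(x: float) -> int:
    """Top 16 bits of the FP32 pattern, i.e. the BFloat16 bit pattern (truncation)."""
    return int(np.float32(x).view(np.uint32)) >> 16

def build_table(index_bits: int = 11):
    """Hypothetical table build: one linear segment per bucket, where a bucket is
    the set of inputs sharing the top `index_bits` of their BFloat16 pattern
    (sign + exponent + a couple of mantissa bits)."""
    xs = np.linspace(-8.0, 8.0, 1 << 16, dtype=np.float32)
    keys = (xs.view(np.uint32) >> np.uint32(16)) >> np.uint32(16 - index_bits)
    table = {}
    for k in np.unique(keys):
        seg = xs[keys == k].astype(np.float64)
        ys = np.tanh(seg)
        if seg.size < 2:
            table[int(k)] = (0.0, float(ys[0]))   # degenerate bucket: constant segment
        else:
            a, b = np.polyfit(seg, ys, 1)         # least-squares linear fit per bucket
            table[int(k)] = (float(a), float(b))
    return table

TABLE = build_table()

def ktanh_like(x: float, index_bits: int = 11) -> float:
    """One table lookup keyed on the input's BFloat16 bits, then one multiply-add."""
    key = bf16_bits(x) >> (16 - index_bits)
    if key not in TABLE:
        return float(np.tanh(x))                  # fallback outside the fitted range
    a, b = TABLE[key]
    return a * x + b
```

In the hardware design the abstract refers to, the stored parameters are applied to the BFloat16 bit pattern with shift and add/subtract only, so no multiplier or floating-point unit is required, and the table is far smaller than the toy one built above.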

