Efficient, arbitrarily high precision hardware logarithmic arithmetic for linear algebra

04/17/2020
by   Jeff Johnson, et al.
0

The logarithmic number system (LNS) is arguably not broadly used due to exponential circuit overheads for summation tables relative to arithmetic precision. Methods to reduce this overhead have been proposed, yet still yield designs with high chip area and power requirements. Use remains limited to lower precision or high multiply/add ratio cases, while much of linear algebra (near 1:1 multiply/add ratio) does not qualify. We present a dual-base approximate logarithmic arithmetic comparable to floating point in use, yet unlike LNS it is easily fully pipelined, extendable to arbitrary precision with O(n^2) overhead, and energy efficient at a 1:1 multiply/add ratio. Compared to float32 or float64 vector inner product with FMA, our design is respectively 2.3x and 4.6x more energy efficient in 7 nm CMOS. It depends on exp and log evaluation 5.4x and 3.2x more energy efficient, at 0.23x and 0.37x the chip area for equivalent accuracy versus standard hyperbolic CORDIC using shift-and-add and approximated ODE integration in the style of Revol and Yakoubsohn. This technique is a useful design alternative for low power, high precision hardened linear algebra in computer vision, graphics, computational photography and machine learning applications.

READ FULL TEXT

page 1

page 8

research
11/01/2018

Rethinking floating point for deep learning

Reducing hardware overhead of neural networks for faster or lower power ...
research
02/12/2021

Low precision logarithmic number systems: Beyond base-2

Logarithmic number systems (LNS) are used to represent real numbers in m...
research
06/26/2021

Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

Representing deep neural networks (DNNs) in low-precision is a promising...
research
09/01/2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

TensorDash is a hardware level technique for enabling data-parallel MAC ...
research
04/11/2022

Multiplier with Reduced Activities and Minimized Interconnect for Inner Product Arrays

We present a pipelined multiplier with reduced activities and minimized ...
research
04/03/2023

Monotonicity of Multi-Term Floating-Point Adders

In the literature on algorithms for performing the multi-term addition s...

Please sign up or login with your details

Forgot password? Click here to reset