Design of Reconfigurable Multi-Operand Adder for Massively Parallel Processing

08/06/2020
by   Shilpa Mayannavar, et al.

The paper presents a systematic study and implementation of a reconfigurable combinatorial multi-operand adder for use in deep learning systems. The number of carry bits changes with the number of operands, so a reliable algorithm for determining the exact number of carry bits is needed for an optimal implementation of a reconfigurable multi-operand adder. A combinatorial multi-operand adder can be faster than a sequential implementation built from a two-operand adder. Use cases for such adders occur in modern processors for deep neural networks, which require massively parallel computing resources on chip. This paper presents a method to estimate the upper bound on the size of the carry and a method to compute the exact number of carry bits required for a multi-operand addition. A fast combinatorial parallel 4-operand adder module is presented, and an algorithm to reconfigure these modules to implement larger adders is also described. Further, the paper presents two compact but slower iterative structures that implement multi-operand addition by processing one column at a time until the entire word is covered. Such serial/iterative operations are slow but occupy little area, while parallel operations are fast but use a large silicon area on chip. Interestingly, the area-to-throughput ratio of the two architectures can tilt in favor of a larger number of slower, smaller units instead of fewer fast, large compute units. A lemma presented in the paper may be used to identify the condition under which this tilt occurs. Potentially, this can save silicon area and increase the throughput of chips for high-performance computing. Simulation results for a 16-operand adder and for the use of a set of 4-operand adders in neural networks have been presented. The results show that the performance gain improves as the number of operations or operands increases.
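
The relationship between operand count and carry width can be illustrated with a short sketch. It is not the paper's algorithm: it assumes unsigned operands of equal width and relies only on the standard fact that N operands of M bits sum to at most N(2^M - 1). The function names (`carry_bits_upper_bound`, `exact_carry_bits`, `column_serial_sum`) are hypothetical, and the column-at-a-time loop is a software analogue of an iterative structure, not a model of the hardware described in the paper.

```python
def carry_bits_upper_bound(num_operands: int) -> int:
    """Upper bound on carry bits: ceil(log2(N)), independent of operand width."""
    if num_operands < 1:
        raise ValueError("need at least one operand")
    return (num_operands - 1).bit_length()


def exact_carry_bits(num_operands: int, width: int) -> int:
    """Carry bits beyond `width` needed for the worst-case sum of N unsigned operands."""
    if num_operands < 1 or width < 1:
        raise ValueError("need at least one operand and one bit")
    max_sum = num_operands * ((1 << width) - 1)  # all operands at their maximum
    return max(max_sum.bit_length() - width, 0)


def column_serial_sum(operands: list[int], width: int) -> int:
    """Add unsigned operands one bit column at a time, carrying into the next column."""
    carry = 0
    result = 0
    for col in range(width):
        # Count the ones in this column and add the carry from lower columns.
        column_total = carry + sum((x >> col) & 1 for x in operands)
        result |= (column_total & 1) << col  # low bit stays in this column
        carry = column_total >> 1            # the rest moves to the next column
    # Whatever carry remains occupies the bits above the operand width.
    return result | (carry << width)


if __name__ == "__main__":
    for n, m in [(4, 8), (16, 8), (3, 1)]:
        print(n, m, carry_bits_upper_bound(n), exact_carry_bits(n, m))
    print(column_serial_sum([255, 255, 255, 255], 8))
```

For narrow operands the exact count can be smaller than the bound: three 1-bit operands sum to at most 3, which needs one carry bit, while ceil(log2(3)) is 2.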


research
08/03/2020

Bit Parallel 6T SRAM In-memory Computing with Reconfigurable Bit-Precision

This paper presents 6T SRAM cell-based bit-parallel in-memory computing ...
research
07/06/2017

Pipelined Parallel FFT Architecture

In this paper, an optimized efficient VLSI architecture of a pipeline Fa...
research
12/20/2016

NOP - A Simple Experimental Processor for Parallel Deployment

The design of a parallel computing system using several thousands or eve...
research
12/11/2018

A Non-iterative Parallelizable Eigenbasis Algorithm for Johnson Graphs

We present a new O(k^2 nk^2) method for generating an orthogonal basis o...
research
08/31/2023

Efficient Additions and Montgomery Reductions of Large Integers for SIMD

This paper presents efficient algorithms, designed to leverage SIMD for ...
research
08/26/2020

An Approximate Carry Estimating Simultaneous Adder with Rectification

Approximate computing has in recent times found significant applications...
research
03/14/2018

Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization

We present efficient realization of Generalized Givens Rotation (GGR) ba...
