CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs

07/20/2021
by Marc Riera, et al.

Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition that arises due to quantization to avoid redundant computations in Convolutional Neural Networks (CNNs). However, in this paper we show that their effectiveness is severely limited when applied to Fully-Connected (FC) layers, which are commonly used in state-of-the-art DNNs, as is the case in modern Recurrent Neural Networks (RNNs) and Transformer models. To improve the energy efficiency of FC computation, we present CREW, a hardware accelerator that implements Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW first performs the multiplications of the unique weights by their respective inputs and stores the results in an on-chip buffer. The storage requirements are modest due to the small number of unique weights and the relatively small size of the input compared to convolutional layers. Next, CREW computes each output by fetching and adding its required products. To this end, each weight is replaced offline by an index into the buffer of unique products. Indices are typically smaller than the quantized weights, since the number of unique weights for each input tends to be much lower than the range of quantized weights, which reduces storage and memory bandwidth requirements. Overall, CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. Compared to UCNN, a state-of-the-art computation reuse technique, CREW achieves 2.10x speedup and 2.08x energy savings on average.
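The reuse scheme described in the abstract can be summarized in a short software sketch. The Python snippet below is a minimal, hypothetical illustration of the idea, not the hardware design or the authors' code: factorize_weights stands in for the offline step that replaces each weight with an index into its input's table of unique weights, and crew_fc_layer multiplies each input by its unique weights once and then reuses those products across all outputs that share a weight. All function and variable names are illustrative assumptions.

```python
import numpy as np

def factorize_weights(W):
    """Offline step (sketch): per input neuron, keep the unique weight
    values and replace every weight by an index into that table.
    W has shape (n_in, n_out); rows correspond to input neurons."""
    unique_w = []
    idx = np.empty(W.shape, dtype=np.int64)
    for i, row in enumerate(W):
        u, inv = np.unique(row, return_inverse=True)  # u[inv] == row
        unique_w.append(u)
        idx[i] = inv
    return unique_w, idx

def crew_fc_layer(x, unique_w, idx):
    """Inference sketch of CREW-style computation reuse for one FC layer.
    x: (n_in,) quantized inputs; unique_w[i]: unique weights of input i;
    idx[i, j]: index of the product needed by output j from input i."""
    n_in, n_out = idx.shape
    y = np.zeros(n_out)
    for i in range(n_in):
        # Step 1: one multiplication per unique weight of input i
        # (the "unique products" buffer kept on-chip in hardware).
        products = x[i] * unique_w[i]
        # Step 2: each output fetches its product by index and accumulates,
        # so identical weights never trigger another multiplication.
        y += products[idx[i]]
    return y

# Sanity check against a plain dot product (toy, quantized-like weights).
rng = np.random.default_rng(0)
W = rng.integers(-8, 8, size=(64, 256)).astype(np.float64)
x = rng.integers(0, 16, size=64).astype(np.float64)
unique_w, idx = factorize_weights(W)
assert np.allclose(crew_fc_layer(x, unique_w, idx), W.T @ x)
```

Under these assumptions, the number of multiplications drops from n_in * n_out to the total count of unique weights per input, while the additions (and the index-driven gathers) remain; the abstract's bandwidth savings come from storing the small indices instead of the full quantized weights.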

Related research

- 05/07/2020  SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation
- 04/18/2018  UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition
- 04/19/2022  Seculator: A Fast and Secure Neural Processing Unit
- 01/04/2021  SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training
- 12/01/2022  Exploiting Kernel Compression on BNNs
- 05/20/2020  BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
- 07/10/2018  Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks
