ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

06/28/2023
by Mohammad Sabri, et al.

The primary operation in DNNs is the dot product of quantized input activations and weights. Prior work has proposed memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density for storing weights, low leakage energy, low read latency, and its ability to perform DNN dot products massively in parallel within the ReRAM crossbars. However, the main bottleneck of these architectures is the energy-hungry analog-to-digital (A/D) conversion required for in-ReRAM analog computation, which erodes the efficiency and performance benefits of PIM. To improve the energy efficiency of in-ReRAM analog dot-product computation, we present ReDy, a hardware accelerator that implements a ReRAM-centric Dynamic quantization scheme exploiting the bit-serial streaming and processing of activations. The energy consumption of ReRAM-based DNN accelerators is directly proportional to the numerical precision of the input activations of each DNN layer. In particular, ReDy exploits the fact that the activations of convolutional (CONV) layers in Convolutional Neural Networks (CNNs), a subset of DNNs, are commonly grouped according to the size of their filters and the size of the ReRAM crossbars. ReDy then quantizes each group of activations on the fly with a different numerical precision, chosen by a novel heuristic that takes into account the statistical distribution of each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars and the number of A/D conversions compared to static 8-bit uniform quantization. We evaluate ReDy on a popular set of modern CNNs. On average, ReDy provides 13% energy savings over an ISAAC-like accelerator with negligible accuracy loss and area overhead.
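To make the idea concrete, the sketch below shows one way a per-group dynamic quantization of this kind could look. This is a minimal Python/NumPy sketch, not the paper's implementation: it assumes activations have already been statically quantized to 8-bit unsigned integers, and the percentile-based rule in group_bitwidth is a hypothetical stand-in for ReDy's distribution-aware bit-width heuristic.

```python
import numpy as np

def group_bitwidth(group_q, max_bits=8, coverage=0.99):
    """Pick how many bits to stream for one group of 8-bit activations.
    A high percentile is used so that a few outliers do not force full
    precision; outliers are clipped to the reduced range."""
    level = int(np.quantile(group_q, coverage))
    bits = max(1, level.bit_length())
    return min(bits, max_bits)

def requantize_group(group_q, bits):
    """Clip the group to the chosen bit-width; only `bits` bit-serial
    cycles (and A/D conversions) are then needed for this group."""
    return np.clip(group_q, 0, 2 ** bits - 1).astype(np.uint8), bits

# Example: one group of ReLU activations mapped onto a crossbar.
rng = np.random.default_rng(0)
acts = rng.exponential(scale=10.0, size=128)                # skewed, mostly small values
acts_q8 = np.clip(np.round(acts), 0, 255).astype(np.uint8)  # static 8-bit baseline
bits = group_bitwidth(acts_q8)
group_q, bits = requantize_group(acts_q8, bits)
print(f"streaming {bits} bits per activation instead of 8")
```

Because activations are streamed into the crossbars bit-serially, shrinking a group from 8 bits to, say, 4 bits roughly halves the crossbar activation cycles and A/D conversions for that group, which is the mechanism behind the reduction in crossbar activity and conversion count described above.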

