TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

05/03/2020
by Weitao Li, et al.

Resistive-random-access-memory (ReRAM) based processing-in-memory (R^2PIM) accelerators show promise in bridging the gap between Internet of Things devices' constrained resources and Convolutional/Deep Neural Networks' (CNNs/DNNs') prohibitive energy cost. Specifically, R^2PIM accelerators enhance energy efficiency by eliminating the cost of weight movements and improving computational density through ReRAM's high density. However, energy efficiency is still limited by the dominant cost of input and partial sum (Psum) movements and the cost of digital-to-analog (D/A) and analog-to-digital (A/D) interfaces. In this work, we identify three energy-saving opportunities in R^2PIM accelerators: analog data locality, time-domain interfacing, and input access reduction, and propose an innovative R^2PIM accelerator called TIMELY, with three key contributions: (1) TIMELY adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance data locality, minimizing the energy overheads of both input and Psum movements; (2) TIMELY largely reduces both the energy of each D/A (and A/D) conversion and the total number of conversions, by using time-domain interfaces (TDIs) and the employed ALBs, respectively; (3) we develop an only-once input read (O^2IR) mapping method to further decrease the energy of input accesses and the number of D/A conversions. The evaluation with more than 10 CNN/DNN models and various chip configurations shows that TIMELY outperforms the baseline R^2PIM accelerator, PRIME, by one order of magnitude in energy efficiency while maintaining better computational density (up to 31.2×) and throughput (up to 736.6×). Furthermore, comprehensive studies are performed to evaluate the effectiveness of the proposed ALB, TDI, and O^2IR innovations in terms of energy savings and area reduction.
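The abstract attributes TIMELY's gains to cheaper/fewer data movements (via ALBs), cheaper interfaces (via TDIs), and fewer input reads and D/A conversions (via O^2IR). The sketch below is a hypothetical first-order energy model, not taken from the paper, that only illustrates how those four cost components compose; every per-operation energy constant, access count, and reduction factor is a placeholder assumption for illustration.

```python
# Hypothetical first-order energy model contrasting a PRIME-like R^2PIM baseline
# with a TIMELY-like design. All constants are illustrative placeholders, NOT
# values reported in the paper.

def layer_energy(n_inputs, n_psums, n_dac, n_adc,
                 e_input_move, e_psum_move, e_dac, e_adc):
    """Sum the four components the abstract identifies as dominant:
    input movement, Psum movement, D/A conversion, and A/D conversion."""
    return (n_inputs * e_input_move
            + n_psums * e_psum_move
            + n_dac * e_dac
            + n_adc * e_adc)

# Placeholder access counts for one layer (illustrative only).
n_inputs, n_psums = 1_000_000, 4_000_000
n_dac, n_adc = 1_000_000, 4_000_000

# Baseline: inputs/Psums move through digital buffers; voltage-domain interfaces.
baseline = layer_energy(n_inputs, n_psums, n_dac, n_adc,
                        e_input_move=2.0, e_psum_move=2.0, e_dac=1.5, e_adc=8.0)

# TIMELY-like: ALBs keep data local in the analog domain (cheaper and fewer moves),
# TDIs lower per-conversion energy, and O^2IR-style mapping reads each input once.
# The reduction factors below are assumed for illustration.
timely = layer_energy(n_inputs // 4, n_psums // 8, n_dac // 4, n_adc // 8,
                      e_input_move=0.5, e_psum_move=0.5, e_dac=0.4, e_adc=1.0)

print(f"baseline: {baseline:.2e}  timely-like: {timely:.2e}  "
      f"ratio: {baseline / timely:.1f}x")
```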
