A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

01/04/2022
by Angelo Garofalo, et al.

Deployment of modern TinyML tasks on small, battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference and also serves as on-chip storage for DNN weights. However, the functional flexibility limitations of IMC and their impact on performance, energy, and area efficiency are not yet fully understood at the system level. To target practical end-to-end IoT applications, IMC arrays must be enclosed in heterogeneous programmable systems, introducing new system-level challenges that we aim to address in this work. We present a heterogeneous, tightly coupled clustered architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators. We benchmark the system on a highly heterogeneous workload, the Bottleneck layer of MobileNetV2, showing 11.5x performance and 9.5x energy efficiency improvements compared with highly optimized parallel execution on the cores. Furthermore, we explore the IMC array resources required for end-to-end inference of a full mobile-grade DNN (MobileNetV2) by scaling up our heterogeneous architecture to a multi-array accelerator. Our results show that, for end-to-end inference of MobileNetV2, our solution achieves one order of magnitude lower execution latency than existing programmable architectures and two orders of magnitude lower latency than state-of-the-art heterogeneous solutions integrating analog in-memory computing cores.
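The MobileNetV2 Bottleneck layer mentioned above is heterogeneous because it mixes pointwise (1x1) convolutions, which map naturally onto IMC matrix-vector operations, with a depthwise 3x3 convolution that is better handled by digital cores or accelerators. The following NumPy sketch illustrates the structure of that workload only; the tensor shapes, the expansion factor, and the helper names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a MobileNetV2 "Bottleneck" (inverted residual) block.
# Shapes and the expansion factor are assumptions for illustration.
import numpy as np

def conv1x1(x, w):
    # x: (H, W, Cin), w: (Cin, Cout) -> pointwise convolution
    return np.einsum('hwc,cd->hwd', x, w)

def dwconv3x3(x, w):
    # x: (H, W, C), w: (3, 3, C) -> depthwise 3x3 conv, stride 1, zero padding
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i+3, j:j+3] * w, axis=(0, 1))
    return out

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def bottleneck(x, w_expand, w_dw, w_project):
    # Expansion (1x1) -> depthwise (3x3) -> projection (1x1), residual if shapes match.
    y = relu6(conv1x1(x, w_expand))   # pointwise: amenable to IMC matrix-vector products
    y = relu6(dwconv3x3(y, w_dw))     # depthwise: better suited to digital cores/accelerators
    y = conv1x1(y, w_project)         # linear projection back to the input channel count
    return x + y if x.shape == y.shape else y

# Toy usage: 16x16 feature map, 16 channels, expansion factor 6 (all assumed values)
H = W = 16; C = 16; expansion = 6
x = np.random.randn(H, W, C).astype(np.float32)
w_e = 0.1 * np.random.randn(C, C * expansion).astype(np.float32)
w_d = 0.1 * np.random.randn(3, 3, C * expansion).astype(np.float32)
w_p = 0.1 * np.random.randn(C * expansion, C).astype(np.float32)
print(bottleneck(x, w_e, w_d, w_p).shape)   # (16, 16, 16)
```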


