Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

09/19/2022
by   Geraldo F. Oliveira, et al.
33

Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for NN models depends on a model's distinct attributes, due to the inherent architectural design choices.

READ FULL TEXT

page 1

page 12

research
04/11/2019

Accelerating Bulk Bit-Wise X(N)OR Operation in Processing-in-DRAM Platform

With Von-Neumann computing architectures struggling to address computati...
research
03/01/2021

Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

As the need for edge computing grows, many modern consumer devices now c...
research
11/10/2022

NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators

Resistive Random-Access Memory (RRAM) is well-suited to accelerate neura...
research
04/16/2017

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performan...
research
10/22/2020

Position-Agnostic Multi-Microphone Speech Dereverberation

Neural networks (NNs) have been widely applied in speech processing task...
research
10/17/2021

Exploring Deep Neural Networks on Edge TPU

This paper explores the performance of Google's Edge TPU on feed forward...
research
09/02/2021

NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories

Repeated off-chip memory accesses to DRAM drive up operating power for d...

Please sign up or login with your details

Forgot password? Click here to reset