Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

05/09/2021
by   Juan Gomez-Luna, et al.
0

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of main memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new 3D-stacked memory technologies that integrate memory with a logic layer where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM, a benchmark suite of 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).

READ FULL TEXT

page 14

page 22

page 31

page 32

page 35

page 36

page 37

page 38

research
10/04/2021

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph process...
research
06/03/2022

A Co-design view of Compute in-Memory with Non-Volatile Elements for Neural Networks

Deep Learning neural networks are pervasive, but traditional computer ar...
research
04/07/2019

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

We introduce BriskStream, an in-memory data stream processing system (DS...
research
04/02/2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Pr...
research
06/27/2023

Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

Our ISCA 2015 paper provides a new programmable processing-in-memory (PI...
research
04/03/2023

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

Processing-in-memory (PIM) promises to alleviate the data movement bottl...
research
07/16/2022

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Training machine learning (ML) algorithms is a computationally intensive...

Please sign up or login with your details

Forgot password? Click here to reset