Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

10/04/2021
by   Juan Gomez-Luna, et al.
0

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new technologies that integrate memory with a logic layer, where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip. This paper presents key takeaways from the first comprehensive analysis of the first publicly-available real-world PIM architecture. We provide four key takeaways about the UPMEM PIM architecture, which stem from our study. More insights about suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems are available in arXiv:2105.03814

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/09/2021

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Many modern workloads, such as neural networks, databases, and graph pro...
research
07/26/2022

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

Applications with low data reuse and frequent irregular memory accesses,...
research
12/13/2022

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

Partitioning applications between NDP and host CPU cores causes inter-se...
research
07/16/2022

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Training machine learning (ML) algorithms is a computationally intensive...
research
04/02/2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Pr...
research
06/27/2023

Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

Our ISCA 2015 paper provides a new programmable processing-in-memory (PI...
research
03/06/2020

Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures

The hardware transactional memory (HTM) implementations in commercially ...

Please sign up or login with your details

Forgot password? Click here to reset