Tearing Down the Memory Wall

08/24/2020
by Zaid Qureshi, et al.

We present a vision for the Erudite architecture, which redefines the compute and memory abstractions so that memory bandwidth and capacity become first-class citizens alongside compute throughput. In this architecture, we envision coupling a high-density, massively parallel memory technology such as Flash with programmable near-data accelerators, such as the streaming multiprocessors in modern GPUs. Each accelerator has a local pool of storage-class memory that it can access at high throughput by issuing very large numbers of overlapping requests, which helps to tolerate the long access latency. The accelerators can also communicate with each other and with remote memory through a high-throughput, low-latency interconnect. As a result, systems based on the Erudite architecture scale compute and memory bandwidth at the same rate, tearing down the notorious memory wall that has plagued computer architecture for generations. In this paper, we present the motivation, rationale, design, benefits, and research challenges for Erudite.
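The idea that many overlapping requests hide long access latency follows from Little's law: the number of requests that must be in flight equals the request rate multiplied by the time each request takes. A minimal Python sketch, assuming fixed-size requests and steady-state operation. The function name and every number used with it are hypothetical illustrations, not figures from the paper:

```python
def outstanding_requests(bandwidth_bytes_per_s: int, latency_ns: int, request_bytes: int) -> int:
    """Little's law: requests in flight = request rate x time each request takes.

    The request rate is bandwidth / request size (requests per second), and the
    time each request takes is the access latency. Integer arithmetic keeps
    the ceiling exact.
    """
    numerator = bandwidth_bytes_per_s * latency_ns
    denominator = 1_000_000_000 * request_bytes  # ns -> s, and bytes -> requests
    return -(-numerator // denominator)  # ceiling division


# Hypothetical targets: 100 GB/s per accelerator with 4 KiB requests.
print(outstanding_requests(100_000_000_000, 50_000, 4096))  # assumed 50 us flash-like latency
print(outstanding_requests(100_000_000_000, 100, 4096))     # assumed 100 ns DRAM-like latency
```

Under these assumptions, the flash-like case needs 1221 requests in flight, while the DRAM-like case needs only 3. This gap is why a Flash-backed accelerator must sustain massive request-level parallelism to reach the same bandwidth.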

