Simple DRAM and Virtual Memory Abstractions to Enable Highly Efficient Memory Systems

05/20/2016
by Vivek Seshadri, et al.

In most modern systems, the memory subsystem is managed and accessed at multiple different granularities at various resources. We observe that such multi-granularity management results in significant inefficiency in the memory subsystem. Specifically, we observe that 1) page-granularity virtual memory unnecessarily triggers large memory operations, and 2) the existing cache-line-granularity memory interface is inefficient for performing bulk data operations and operations that exhibit poor spatial locality. To address these problems, we present a series of techniques in this thesis. First, we propose page overlays, a framework that augments the existing virtual memory framework with the ability to track a new version of a subset of cache lines within each virtual page. We show that this extension is powerful by demonstrating its benefits on a number of applications. Second, we show that DRAM can be used to perform more complex operations than just storing data. We propose RowClone, a mechanism to perform bulk data copy and initialization completely inside DRAM, and Buddy RAM, a mechanism to perform bulk bitwise operations using DRAM. Both of these techniques achieve an order-of-magnitude improvement in the efficiency of the respective operations. Third, we propose Gather-Scatter DRAM (GS-DRAM), a technique that exploits DRAM organization to effectively gather/scatter values with power-of-2 strided access patterns. For these access patterns, GS-DRAM achieves near-ideal bandwidth and cache utilization, without increasing the latency of fetching data from memory. Finally, we propose the Dirty-Block Index (DBI), a new way of tracking dirty blocks. In addition to improving the efficiency of bulk data coherence, DBI has several applications, including high-performance memory scheduling, efficient cache lookup bypassing, and enabling heterogeneous ECC.
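The page-overlay idea in the abstract (tracking a new version of only some cache lines within a page) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the thesis's hardware design: the 4 KB page size, the 64-byte cache line, the per-page bit vector, and all names (`OverlayPage`, `overlay_bits`, `bytes_copied`) are hypothetical and chosen only for illustration.

```python
LINE_SIZE = 64                            # bytes per cache line (assumed)
PAGE_SIZE = 4096                          # bytes per page (assumed)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # 64 lines per page


class OverlayPage:
    """Hypothetical model: a read-only base page plus per-line newer versions."""

    def __init__(self, base: bytes):
        assert len(base) == PAGE_SIZE
        self.base = base          # shared base page, never modified
        self.overlay_bits = 0     # bit i set => line i has a newer version
        self.overlay = {}         # line index -> bytearray holding the newer version

    def write(self, offset: int, data: bytes) -> None:
        """Write inside one cache line; only that line is copied, not the page."""
        line, start = divmod(offset, LINE_SIZE)
        assert start + len(data) <= LINE_SIZE, "sketch handles single-line writes only"
        if not (self.overlay_bits >> line) & 1:
            lo = line * LINE_SIZE
            self.overlay[line] = bytearray(self.base[lo:lo + LINE_SIZE])
            self.overlay_bits |= 1 << line
        self.overlay[line][start:start + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        """Read inside one cache line, preferring the overlay version if present."""
        line, start = divmod(offset, LINE_SIZE)
        assert start + length <= LINE_SIZE, "sketch handles single-line reads only"
        if (self.overlay_bits >> line) & 1:
            return bytes(self.overlay[line][start:start + length])
        return self.base[offset:offset + length]

    def bytes_copied(self) -> int:
        """Bytes duplicated so far, at cache-line rather than page granularity."""
        return len(self.overlay) * LINE_SIZE
```

In this model, a two-byte write to a shared page duplicates one 64-byte line instead of the full 4096-byte page, which is the kind of saving that motivates tracking versions below page granularity.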

research
05/07/2018

RowClone: Accelerating Data Movement and Initialization Using DRAM

In existing systems, to perform any bulk data movement operation (copy o...
research
05/29/2022

TransforMAP: Transformer for Memory Access Prediction

Data Prefetching is a technique that can hide memory latency by fetching...
research
12/03/2019

An Off-Chip Attack on Hardware Enclaves via the Memory Bus

This paper shows how an attacker can break the confidentiality of a hard...
research
01/13/2021

EXMA: A Genomics Accelerator for Exact-Matching

Genomics is the foundation of precision medicine, global food security a...
research
01/26/2021

The Granularity Gap Problem: A Hurdle for Applying Approximate Memory to Complex Data Layout

The main memory access latency has not much improved for more than two d...
research
08/28/2023

Scalable and Configurable Tracking for Any Rowhammer Threshold

The Rowhammer vulnerability continues to get worse, with the Rowhammer T...
research
01/03/2023

A Theory of I/O-Efficient Sparse Neural Network Inference

As the accuracy of machine learning models increases at a fast rate, so ...
