Simple DRAM and Virtual Memory Abstractions to Enable Highly Efficient Memory Systems

by Vivek Seshadri, et al.

In most modern systems, the memory subsystem is managed and accessed at multiple different granularities across various resources. We observe that such multi-granularity management results in significant inefficiency in the memory subsystem. Specifically, we observe that 1) page-granularity virtual memory unnecessarily triggers large memory operations, and 2) the existing cache-line-granularity memory interface is inefficient for performing bulk data operations and operations that exhibit poor spatial locality. To address these problems, we present a series of techniques in this thesis. First, we propose page overlays, a framework that augments the existing virtual memory framework with the ability to track a new version of a subset of cache lines within each virtual page. We show that this extension is powerful by demonstrating its benefits on a number of applications. Second, we show that DRAM can be used to perform more complex operations than just storing data. We propose RowClone, a mechanism to perform bulk data copy and initialization completely inside DRAM, and Buddy RAM, a mechanism to perform bulk bitwise operations using DRAM. Both of these techniques achieve an order-of-magnitude improvement in the efficiency of the respective operations. Third, we propose Gather-Scatter DRAM (GS-DRAM), a technique that exploits DRAM organization to effectively gather/scatter values with power-of-2 strided access patterns. For these access patterns, GS-DRAM achieves near-ideal bandwidth and cache utilization, without increasing the latency of fetching data from memory. Finally, we propose the Dirty-Block Index (DBI), a new way of tracking dirty blocks. In addition to improving the efficiency of bulk data coherence, DBI has several applications, including high-performance memory scheduling, efficient cache lookup bypassing, and enabling heterogeneous ECC.
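As a rough illustration of why a cache-line-granularity interface handles strided accesses poorly, the sketch below estimates what fraction of each fetched cache line a strided scan actually uses in a conventional memory system. The 64-byte line size, the 8-byte value size, and the function name are assumptions chosen for illustration, not parameters taken from the thesis, and the code models the baseline inefficiency only, not GS-DRAM itself.

```python
def cache_line_utilization(stride_values, value_bytes=8, line_bytes=64):
    """Fraction of each fetched cache line used by a strided scan.

    Assumes (hypothetically) that the scan starts at a line-aligned address
    and that every touched line is fetched in full.
    """
    values_per_line = line_bytes // value_bytes
    if stride_values >= values_per_line:
        # At most one useful value lands in each fetched line.
        return 1 / values_per_line
    # Ceiling division: number of scan elements that fall within one line.
    useful_values = -(-values_per_line // stride_values)
    return useful_values / values_per_line


if __name__ == "__main__":
    for stride in (1, 2, 4, 8, 16):
        print(stride, cache_line_utilization(stride))
```

Under these assumptions, utilization halves each time the stride doubles until only one value per line is useful; this is the gap that a gather/scatter mechanism for power-of-2 strides aims to close.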




