FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching

09/17/2020
by Yaohua Wang, et al.

DRAM main memory is a performance bottleneck for many applications due to its high access latency. In-DRAM caches work to mitigate this latency by augmenting regular-latency DRAM with small-but-fast regions of DRAM that serve as a cache for the data held in the regular-latency region of DRAM. While an effective in-DRAM cache can allow a large fraction of memory requests to be served from a fast DRAM region, the latency savings are often hindered by inefficient mechanisms for relocating copies of data into and out of the fast regions. Existing in-DRAM caches have two sources of inefficiency: (1) the data relocation granularity is an entire multi-kilobyte row of DRAM; and (2) because the relocation latency increases with the physical distance between the slow and fast regions, multiple fast regions are physically interleaved among slow regions to reduce the relocation latency, resulting in increased hardware area and manufacturing complexity. We propose a new substrate, FIGARO, that uses existing shared global buffers among subarrays within a DRAM bank to provide support for in-DRAM data relocation across subarrays at the granularity of a single cache block. FIGARO has a distance-independent latency within a DRAM bank and avoids complex modifications to DRAM. Using FIGARO, we design a fine-grained in-DRAM cache called FIGCache. The key idea of FIGCache is to cache only small, frequently-accessed portions of different DRAM rows in a designated region of DRAM. By caching only the parts of each row that are expected to be accessed in the near future, we can pack more of the frequently-accessed data into FIGCache, and can benefit from additional row hits in DRAM. Our evaluations show that FIGCache improves the average performance of a system using DDR4 DRAM by 16.3% and reduces average DRAM energy consumption by 7.8%, compared to a conventional system without in-DRAM caching.
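
To make the caching idea concrete, below is a minimal, hypothetical sketch (in Python) of the block-granularity policy the abstract describes: per-block access counters decide when a single cache block, rather than an entire multi-kilobyte row, is relocated into the designated fast region. All names and parameters here (block size, hotness threshold, fast-region capacity) are illustrative assumptions, not values or code from the paper.

    # Minimal, illustrative model (assumed, not from the paper) of a
    # FIGCache-style policy: individual hot cache blocks are relocated
    # into a designated fast region, instead of caching whole rows.
    from collections import OrderedDict

    BLOCK_BYTES   = 64    # assumed cache-block (relocation) granularity
    ROW_BLOCKS    = 128   # assumed 8 KB row = 128 blocks; a row-granularity
                          # cache would have to move all 128 at once
    CACHE_BLOCKS  = 1024  # assumed fast-region capacity, in blocks
    HOT_THRESHOLD = 4     # assumed access count before a block is relocated

    class FIGCacheModel:
        """Tracks per-block access counts; caches single hot blocks (LRU)."""

        def __init__(self):
            self.counts = {}            # (row, block) -> access count
            self.cache = OrderedDict()  # cached (row, block) keys, LRU order

        def access(self, row, block):
            key = (row, block)
            if key in self.cache:
                self.cache.move_to_end(key)      # refresh LRU position
                return "fast-region hit"
            self.counts[key] = self.counts.get(key, 0) + 1
            if self.counts[key] >= HOT_THRESHOLD:
                # Relocate ONE block into the fast region; in the paper,
                # FIGARO's shared global buffers make this an in-bank copy.
                if len(self.cache) >= CACHE_BLOCKS:
                    self.cache.popitem(last=False)  # evict LRU block
                self.cache[key] = True
                return "slow-region hit; block relocated"
            return "slow-region hit"

    if __name__ == "__main__":
        c = FIGCacheModel()
        for i in range(5):
            print(i, c.access(row=7, block=3))  # 5th access hits fast region

Because only hot blocks occupy fast-region capacity, this policy can hold frequently-accessed data from many more distinct rows than a row-granularity cache of the same size, which is the packing benefit the abstract highlights.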

Related research

Sectored DRAM: An Energy-Efficient High-Throughput and Practical Fine-Grained DRAM Architecture (07/27/2022)
There are two major sources of inefficiency in computing systems that us...

Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation (04/10/2017)
Putting the DRAM on the same package with a processor enables several ti...

NVMM cache design: Logging vs. Paging (05/03/2023)
Modern NVMM is closing the gap between DRAM and persistent storage, both...

The Granularity Gap Problem: A Hurdle for Applying Approximate Memory to Complex Data Layout (01/26/2021)
The main memory access latency has not much improved for more than two d...

Gemini: Reducing DRAM Cache Hit Latency by Hybrid Mappings (06/03/2018)
Die-stacked DRAM caches are increasingly advocated to bridge the perform...

Improving DRAM Performance by Parallelizing Refreshes with Accesses (12/21/2017)
Modern DRAM cells are periodically refreshed to prevent data loss due to...

A Migratory Near Memory Processing Architecture Applied to Big Data Problems (03/20/2020)
Servers produced by mainstream vendors are inefficient in processing Big...
