FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads

03/05/2019
by Jie Zhang, et al.

In this work, we propose FUSE, a novel GPU cache system that integrates spin-transfer torque magnetic random-access memory (STT-MRAM) into the on-chip L1D cache. FUSE can minimize the number of outgoing memory accesses over the interconnection network of the GPU's multiprocessors, which in turn can considerably improve the level of massive computing parallelism in GPUs. Specifically, FUSE predicts the read-level of GPU memory accesses by extracting GPU runtime information and places write-once-read-multiple (WORM) data blocks into the STT-MRAM, while accommodating write-multiple data blocks over a small portion of SRAM in the L1D cache. To further reduce off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating full associativity with a limited number of tag comparators and I/O peripherals. Our evaluation results show that, in comparison to a traditional GPU cache, our proposed heterogeneous cache reduces the number of outgoing memory references by 32% across the interconnection network, thereby improving the overall performance by 217% and reducing the energy cost by 53%.
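The placement policy described above can be illustrated with a toy model. The sketch below is not the paper's implementation; it is a minimal, hypothetical simulation of a hybrid L1D cache that steers blocks predicted as WORM into a large STT-MRAM partition and write-multiple blocks into a small SRAM partition, migrating a block to SRAM once a second write reveals the prediction was wrong. All class and field names are illustrative assumptions.

```python
# Hedged sketch (not the paper's design): a toy hybrid L1D cache that
# places predicted write-once-read-multiple (WORM) blocks in a large
# STT-MRAM partition and write-multiple blocks in a small SRAM partition.
from collections import OrderedDict

class HybridL1D:
    def __init__(self, sram_lines=8, mram_lines=64):
        # LRU-ordered partitions: OrderedDict maps block address -> data
        self.sram = OrderedDict()          # write-multiple blocks
        self.mram = OrderedDict()          # WORM blocks
        self.sram_lines = sram_lines
        self.mram_lines = mram_lines
        self.write_counts = {}             # per-block write statistics

    def _insert(self, part, cap, addr, data):
        part[addr] = data
        if len(part) > cap:
            part.popitem(last=False)       # evict the LRU block

    def access(self, addr, is_write, data=None):
        """Return True on a hit. A second write to a block resident in
        STT-MRAM migrates it to SRAM (its WORM prediction was wrong)."""
        self.write_counts[addr] = self.write_counts.get(addr, 0) + int(is_write)
        for part in (self.sram, self.mram):
            if addr in part:
                part.move_to_end(addr)     # refresh LRU position
                if is_write and part is self.mram and self.write_counts[addr] > 1:
                    # block turned out to be write-multiple: move to SRAM
                    self._insert(self.sram, self.sram_lines,
                                 addr, self.mram.pop(addr))
                return True
        # miss: predict WORM if the block has been written at most once
        if self.write_counts[addr] <= 1:
            self._insert(self.mram, self.mram_lines, addr, data)
        else:
            self._insert(self.sram, self.sram_lines, addr, data)
        return False
```

The asymmetric partition sizes reflect the density advantage of STT-MRAM over SRAM; a real predictor would use GPU runtime information rather than a simple write counter.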


