Reuse Cache for Heterogeneous CPU-GPU Systems

07/28/2021
by Tejas Shah, et al.

It is generally observed that the fraction of live lines in shared last-level caches (SLLCs) is very small for chip multiprocessors (CMPs). This can be tackled using promotion-based replacement policies such as re-reference interval prediction (RRIP) instead of LRU, dead-block predictors, or reuse-based cache allocation schemes. In GPU systems, similar LLC issues are alleviated using various cache bypassing techniques. These issues are worsened in heterogeneous CPU-GPU systems because the two processors have different data access patterns and frequencies. GPUs generally work on streaming data and issue memory accesses from many more threads than CPUs do. As a result, most traditional cache replacement and allocation policies prove ineffective: the higher number of cache accesses in GPU applications leads to a larger allocation of GPU cache lines despite their minimal reuse. In this work, we implement the Reuse Cache approach for heterogeneous CPU-GPU systems. The reuse cache is a decoupled tag/data SLLC designed to store data only for lines that are accessed more than once. This design is based on the observation that most lines inserted into the LLC are never reused before being replaced. We find that the reuse cache performs within 0.5% of a statically partitioned LLC, while decreasing the area cost of the LLC by an average of 40%.

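To make the decoupled tag/data idea concrete, below is a minimal sketch of a reuse-cache-style structure, not the authors' implementation: it assumes fully associative tag and data arrays with LRU ordering, where the tag array is larger than the data array and a block only earns data storage on its second access. The class and method names are illustrative only.

```python
# Toy model of a reuse cache: tags track all recently seen blocks,
# but data storage is granted only after a block shows reuse.
# Assumptions (not from the paper): fully associative arrays, LRU
# replacement in each array, illustrative sizes.

from collections import OrderedDict


class ReuseCache:
    def __init__(self, num_tags=8, num_data=4):
        self.tags = OrderedDict()   # block address -> None; which blocks have been seen
        self.data = OrderedDict()   # block address -> cached block (subset of tagged blocks)
        self.num_tags = num_tags
        self.num_data = num_data

    def access(self, addr):
        """Return 'data-hit', 'tag-hit' (reuse detected, data allocated now), or 'miss'."""
        if addr in self.data:                 # data hit: block has already proven reuse
            self.data.move_to_end(addr)
            self.tags.move_to_end(addr)
            return "data-hit"
        if addr in self.tags:                 # tag hit without data: first reuse observed
            self.tags.move_to_end(addr)
            self._allocate_data(addr)         # only now does the block occupy data storage
            return "tag-hit"
        self._allocate_tag(addr)              # cold miss: track the tag only, bypass data
        return "miss"

    def _allocate_tag(self, addr):
        if len(self.tags) >= self.num_tags:
            victim, _ = self.tags.popitem(last=False)   # evict LRU tag
            self.data.pop(victim, None)                 # drop its data, if any
        self.tags[addr] = None

    def _allocate_data(self, addr):
        if len(self.data) >= self.num_data:
            self.data.popitem(last=False)               # evict LRU data block
        self.data[addr] = f"block@{addr:#x}"


if __name__ == "__main__":
    llc = ReuseCache()
    # A streaming (GPU-like) pattern never earns data storage...
    print([llc.access(a) for a in range(0x100, 0x108)])       # all 'miss'
    # ...while a reused (CPU-like) working set does.
    print([llc.access(a) for a in (0x200, 0x201, 0x200, 0x201)])  # miss, miss, tag-hit, tag-hit
```

In this toy model, streaming GPU accesses consume only tag entries and never displace reused CPU data, which is the intuition behind the area and interference savings described above.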