DSPatch: Dual Spatial Pattern Prefetcher

10/07/2019
by Rahul Bera, et al.

High main memory latency continues to limit the performance of modern high-performance out-of-order cores. While DRAM latency has remained nearly the same over many generations, DRAM bandwidth has grown significantly due to higher frequencies, newer architectures (DDR4, LPDDR4, GDDR5) and 3D-stacked memory packaging (HBM). Current state-of-the-art prefetchers struggle to extract higher performance when higher DRAM bandwidth is available. Prefetchers need the ability to adapt dynamically to available bandwidth, boosting prefetch count and prefetch coverage when headroom exists and throttling down to achieve high accuracy when bandwidth utilization is close to peak. To this end, we present the Dual Spatial Pattern Prefetcher (DSPatch), which can be used as a standalone prefetcher or as a lightweight adjunct spatial prefetcher to the state-of-the-art delta-based Signature Pattern Prefetcher (SPP). DSPatch builds on a novel and intuitive use of modulated spatial bit-patterns. The key idea is to: (1) represent program accesses on a physical page as a bit-pattern anchored to the first "trigger" access, (2) learn two spatial access bit-patterns, one biased towards coverage and another biased towards accuracy, and (3) select one bit-pattern at run-time based on the DRAM bandwidth utilization to generate prefetches. Across a diverse set of workloads, using only 3.6KB of storage, DSPatch improves performance over an aggressive baseline with a PC-based stride prefetcher at the L1 cache and the SPP prefetcher at the L2 cache by 6% on average, with larger gains on memory-intensive workloads and up to 26% in the best case. The performance of DSPatch+SPP also scales with increasing DRAM bandwidth, with the improvement growing from 6% to 10%.
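The three steps of the key idea can be illustrated with a minimal sketch. It is not the authors' hardware design: the 64-line page (4 KB page, 64 B lines), the use of bitwise OR for the coverage-biased pattern and bitwise AND for the accuracy-biased pattern, the `DualPatternEntry` class, and the 0.75 utilization threshold are all assumptions made for illustration.

```python
LINES_PER_PAGE = 64                      # assumption: 4 KB page, 64 B cache lines
MASK = (1 << LINES_PER_PAGE) - 1


def rotate_right(pattern, amount):
    """Rotate a LINES_PER_PAGE-bit pattern right by `amount` bits."""
    amount %= LINES_PER_PAGE
    return ((pattern >> amount) | (pattern << (LINES_PER_PAGE - amount))) & MASK


def rotate_left(pattern, amount):
    """Rotate a LINES_PER_PAGE-bit pattern left by `amount` bits."""
    return rotate_right(pattern, LINES_PER_PAGE - amount % LINES_PER_PAGE)


def anchor_pattern(line_offsets):
    """Step 1: build a page bit-pattern and anchor it to the trigger access.

    `line_offsets` lists the cache-line offsets touched in one page, in
    program order; the first one is the trigger. After anchoring, the
    trigger always sits at bit 0.
    """
    trigger = line_offsets[0]
    raw = 0
    for offset in line_offsets:
        raw |= 1 << offset
    return rotate_right(raw, trigger)


class DualPatternEntry:
    """Step 2: keep one coverage-biased and one accuracy-biased pattern."""

    def __init__(self):
        self.coverage = 0        # grows via OR (assumed modulation)
        self.accuracy = None     # shrinks via AND (assumed modulation)

    def learn(self, anchored):
        self.coverage |= anchored
        if self.accuracy is None:
            self.accuracy = anchored
        else:
            self.accuracy &= anchored

    def select(self, bandwidth_utilization, high_threshold=0.75):
        """Step 3: pick a pattern from DRAM bandwidth utilization (0.0-1.0)."""
        if bandwidth_utilization >= high_threshold and self.accuracy is not None:
            return self.accuracy
        return self.coverage


def prefetch_offsets(pattern, trigger):
    """Map an anchored pattern back onto a page, given a new trigger offset."""
    placed = rotate_left(pattern, trigger)
    return [i for i in range(LINES_PER_PAGE)
            if (placed >> i) & 1 and i != trigger]


entry = DualPatternEntry()
entry.learn(anchor_pattern([5, 6, 8]))     # anchored bits {0, 1, 3}
entry.learn(anchor_pattern([10, 11, 12]))  # anchored bits {0, 1, 2}

low_bw = prefetch_offsets(entry.select(0.2), trigger=20)
high_bw = prefetch_offsets(entry.select(0.9), trigger=20)
```

With plenty of bandwidth headroom the sketch prefetches every line seen in either training visit relative to the trigger; near peak utilization it prefetches only the lines common to both visits.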


