Reducing Load Latency with Cache Level Prediction

03/27/2021
by   Majid Jalili, et al.

High load latency, which results from deep cache hierarchies and relatively slow main memory, is an important limiter of single-thread performance. Data prefetching helps reduce this latency by fetching data up the hierarchy before load instructions request it. However, data prefetching has been shown to be imperfect in many situations. We propose cache-level prediction to complement prefetchers. Our method predicts which level of the memory hierarchy a load will access, allowing memory loads to start earlier and thereby saving many cycles. The predictor provides high prediction accuracy at the cost of just one cycle of added latency on L1 misses. Experimental results show a speedup of 7.8% on generic, graph, and HPC applications over a baseline with aggressive prefetchers.
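To make the idea concrete, the following is a minimal sketch of why knowing the target level in advance can save cycles. It is not the authors' predictor: the per-level latencies, the fallback behavior on a wrong prediction, and the names `sequential_latency`, `predicted_latency`, and `LastLevelPredictor` are all hypothetical, chosen only for illustration. The one-cycle penalty on L1 misses is the only figure taken from the abstract.

```python
# Illustrative sketch only. The latencies and the predictor design below are
# assumptions, not values or mechanisms taken from the paper.
LEVELS = ["L1", "L2", "L3", "MEM"]
# Hypothetical lookup latency of each level, in cycles.
LATENCY = {"L1": 4, "L2": 12, "L3": 40, "MEM": 200}
PREDICTOR_PENALTY = 1  # one cycle added to L1 misses, as the abstract states


def sequential_latency(actual):
    """Cycles when each level is probed in turn until the data is found."""
    idx = LEVELS.index(actual)
    return sum(LATENCY[level] for level in LEVELS[: idx + 1])


def predicted_latency(actual, predicted):
    """Cycles when an L1 miss is sent straight to the predicted level.

    Assumption: a wrong prediction falls back to the full sequential walk
    and still pays the predictor penalty.
    """
    if actual == "L1":
        return LATENCY["L1"]  # L1 hit: the predictor is never consulted
    if predicted == actual:
        return LATENCY["L1"] + PREDICTOR_PENALTY + LATENCY[actual]
    return sequential_latency(actual) + PREDICTOR_PENALTY


class LastLevelPredictor:
    """Toy predictor: remembers the level each load PC was last served from."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        # Default guess for an unseen load: the level right below L1.
        return self.table.get(pc, "L2")

    def update(self, pc, actual):
        self.table[pc] = actual
```

Under these assumed latencies, a load served from main memory takes 256 cycles when every level is probed in turn and 205 cycles when it is correctly predicted to go to memory. A wrong prediction costs 257 cycles in this model, which is why prediction accuracy matters.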


