MANA: Microarchitecting an Instruction Prefetcher

02/02/2021
by Ali Ansari, et al.

L1 instruction (L1-I) cache misses are a performance bottleneck. Sequential prefetchers are a simple way to mitigate this problem; however, prior work has shown that they leave considerable potential uncovered. This observation has motivated many researchers to develop more advanced instruction prefetchers. In 2011, Proactive Instruction Fetch (PIF) showed that a hardware prefetcher could effectively eliminate all instruction-cache misses. However, its enormous storage cost makes it impractical. Consequently, reducing storage cost has been the main research focus in instruction prefetching over the past decade. Several instruction prefetchers, including RDIP and Shotgun, were proposed to offer PIF-level performance with significantly lower storage overhead. However, our findings show that there is a considerable performance gap between these proposals and PIF. Although these proposals use different prefetching mechanisms, the gap stems largely from insufficient storage rather than from the mechanism itself. Prior proposals suffer from one or both of the following shortcomings: (1) they need a large number of metadata records to cover the potential, and (2) each record has a high storage cost. The first shortcoming causes metadata misses, and the second prevents the prefetcher from storing enough records within reasonably sized storage.


