Fetch-Directed Instruction Prefetching Revisited

06/24/2020
by Truls Asheim, et al.

Prior work has observed that fetch-directed instruction prefetching (FDIP) is highly effective at covering instruction cache misses. The key to FDIP's effectiveness is having a branch target buffer (BTB) large enough to accommodate the application's branch working set. In this work, we introduce several optimizations that significantly extend the reach of the BTB within the available storage budget. Our optimizations target nearly every source of storage overhead in each BTB entry, namely the tag, target address, and size fields. We observe that while most dynamic branch instances have short offsets, a large number of branches have longer offsets or require full target addresses. Based on this insight, we break up the BTB into multiple smaller BTBs, each storing offsets of a different length. This enables a dramatic reduction in storage for target addresses. We further compress tags to 16 bits and avoid the basic-block-oriented BTB advocated in prior FDIP variants. The latter optimization eliminates the need to store the basic block size in each BTB entry. Our final design, called FDIP-X, uses an ensemble of 4 BTBs and always outperforms conventional FDIP with a unified basic-block-oriented BTB for equal storage budgets.
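The partitioning idea can be illustrated with a minimal sketch: a branch is routed to the smallest of four BTBs whose target field can hold its signed offset, and the last BTB holds full targets. Only the 16-bit tag and the count of 4 BTBs come from the abstract. The offset widths (8, 13, and 23 bits), the 46-bit full target, and the names `OFFSET_WIDTHS`, `pick_partition`, and `entry_bits` are assumptions made for illustration, not the paper's configuration.

```python
TAG_BITS = 16  # compressed tag width stated in the abstract

# Hypothetical partition widths (signed offset bits); None marks the
# partition that stores a full target address. Not taken from the paper.
OFFSET_WIDTHS = (8, 13, 23, None)
FULL_TARGET_BITS = 46  # assumed number of address bits kept for a full target


def signed_bits_needed(offset: int) -> int:
    """Smallest two's-complement width that can represent `offset`."""
    if offset >= 0:
        return offset.bit_length() + 1
    return (-offset - 1).bit_length() + 1


def pick_partition(branch_pc: int, target: int) -> int:
    """Return the index of the smallest partition whose target field fits."""
    need = signed_bits_needed(target - branch_pc)
    for index, width in enumerate(OFFSET_WIDTHS):
        if width is None or need <= width:
            return index
    raise AssertionError("unreachable: the last partition accepts any target")


def entry_bits(index: int) -> int:
    """Tag plus target-field bits for one entry of the given partition."""
    width = OFFSET_WIDTHS[index]
    return TAG_BITS + (FULL_TARGET_BITS if width is None else width)


if __name__ == "__main__":
    # A short forward branch lands in the narrowest partition.
    short = pick_partition(0x1000, 0x1040)
    # A distant target falls through to the full-address partition.
    far = pick_partition(0x1000, 0x1000 + (1 << 30))
    print(short, entry_bits(short))
    print(far, entry_bits(far))
```

Under these assumed widths, an entry in the narrowest partition needs far fewer bits than one holding a full target, which is the source of the storage savings the abstract describes. The sketch omits the size field entirely, mirroring the abstract's point that a non-basic-block-oriented BTB does not need it.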


research
01/10/2023

A Storage-Effective BTB Organization for Servers

Many contemporary applications feature multi-megabyte instruction footpr...
research
06/08/2021

Micro BTB: A High Performance and Lightweight Last-Level Branch Target Buffer for Servers

High-performance branch target buffers (BTBs) and the L1I cache are key ...
research
02/02/2021

MANA: Microarchitecting an Instruction Prefetcher

L1 instruction (L1-I) cache misses are a source of performance bottlenec...
research
05/15/2023

By-Software Branch Prediction in Loops

Load-Dependent Branches (LDB) often do not exhibit regular patterns in t...
research
01/13/2021

Distributed storage algorithms with optimal tradeoffs

One of the primary objectives of a distributed storage system is to reli...
research
09/02/2019

Touché: Towards Ideal and Efficient Cache Compression By Mitigating Tag Area Overheads

Compression is seen as a simple technique to increase the effective cach...
research
02/26/2023

Large-Block Modular Addition Checksum Algorithms

Checksum algorithms are widely employed due to their use of a simple alg...
