Micro BTB: A High Performance and Lightweight Last-Level Branch Target Buffer for Servers

06/08/2021
by Vishal Gupta, et al.

High-performance branch target buffers (BTBs) and the L1I cache are key to a high-performance front-end. Modern branch predictors are highly accurate, but as code footprints grow in modern server workloads, BTB and L1I misses remain frequent. Recent industry trends show the use of large BTBs (hundreds of KB per core) that provide performance close to that of an ideal BTB, along with a decoupled front-end that provides efficient fetch-directed L1I instruction prefetching. On the other hand, techniques proposed by academia, such as BTB prefetching and using the retire-order stream for learning, fail to provide significant performance gains on modern processor cores, which are deeper and wider. We solve the problem fundamentally by increasing the storage density of the last-level BTB. We observe that not all branch instructions require a full branch target address. Instead, we can store the branch target as a branch offset relative to the branch instruction. Using branch offsets enables the BTB to store multiple branches per entry. This cuts the BTB storage in half, but we observe that it increases skewness in the BTB. We propose a skewed-indexed and compressed last-level BTB design called MicroBTB (MBTB) that stores multiple branches per BTB entry. We evaluate MBTB on 100 industry-provided server workloads. A 4K-entry MBTB provides a 17.61% performance improvement compared to an 8K-entry baseline BTB design with a storage savings of 47.5KB per core.
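
The offset idea in the abstract can be illustrated with a minimal sketch. This is not the MBTB design itself: the function names, the example addresses, and the 12-bit field width are all hypothetical choices made here for illustration. The sketch only shows that a nearby target fits in a short signed offset field, while a distant target does not and would still need a full address.

```python
from typing import Optional


def signed_bits(value: int) -> int:
    """Minimum two's-complement width that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def encode_offset(pc: int, target: int, width: int) -> Optional[int]:
    """Return the target as a `width`-bit offset field, or None if it does not fit.

    `width` is a hypothetical field size, not a value taken from the paper.
    """
    offset = target - pc
    if signed_bits(offset) > width:
        return None  # too far away: a full target address would be needed
    return offset & ((1 << width) - 1)  # keep the low `width` bits


def decode_target(pc: int, field: int, width: int) -> int:
    """Sign-extend the stored field and add it back to the branch address."""
    sign_bit = 1 << (width - 1)
    offset = (field ^ sign_bit) - sign_bit
    return pc + offset


# Hypothetical addresses: a short backward branch and a far jump.
branch_pc = 0x401000
near_target = 0x400FF0
far_target = 0x7F0000000000

near_field = encode_offset(branch_pc, near_target, width=12)
far_field = encode_offset(branch_pc, far_target, width=12)
```

Here `near_field` round-trips through `decode_target` back to `near_target`, while `far_field` is `None`. Under these assumptions, every branch that fits the short field frees bits that a denser entry layout could give to additional branches.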

