A Storage-Effective BTB Organization for Servers

01/10/2023
by Truls Asheim, et al.

Many contemporary applications feature multi-megabyte instruction footprints that overwhelm the capacity of branch target buffers (BTBs) and instruction caches (L1-I), causing frequent front-end stalls that inevitably hurt performance. BTB capacity is crucial for performance, as a sufficiently large BTB enables the front-end to accurately resolve the upcoming execution path and steer instruction fetch appropriately. Moreover, it enables highly effective fetch-directed instruction prefetching that can eliminate a large portion of L1-I misses. For these reasons, commercial processors allocate vast amounts of storage capacity to BTBs. This work aims to reduce BTB storage requirements by optimizing the organization of BTB entries. Our key insight is that storing branch target offsets, instead of full or compressed targets, can drastically reduce BTB storage cost, as the vast majority of dynamic branches have short offsets requiring just a handful of bits to encode. Based on this insight, we size the ways of a set-associative BTB to hold different numbers of target offset bits, such that each way stores offsets within a particular range. Doing so enables a dramatic reduction in storage for target addresses. Our final design, called BTB-X, uses an 8-way set-associative BTB whose differently sized ways enable it to track about 2.24x more branches than a conventional BTB and 1.3x more branches than a storage-optimized state-of-the-art BTB organization, called PDede, with the same storage budget.
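To make the offset-width idea concrete, here is a minimal Python sketch of how a branch could be matched to a way in a BTB whose ways store offset fields of different widths. It is an illustration under stated assumptions, not the BTB-X design: the 48-bit address width, the 4-byte instruction alignment, the per-way widths in `OFFSET_WAY_BITS`, the single full-target fallback way, and all function names are hypothetical choices made for this example. The storage helper counts target-field bits only (tags and other metadata are ignored), so its output is not comparable to the 2.24x figure reported above.

```python
# Sketch: matching branches to the ways of a BTB with differently sized ways.
# All sizes below are HYPOTHETICAL illustration values, not the BTB-X configuration.

ADDR_BITS = 48   # hypothetical virtual-address width
ALIGN_BITS = 2   # hypothetical: 4-byte-aligned instructions, so 2 low bits are implicit

# Hypothetical target-offset widths (bits) for ways 0..6, narrowest first.
OFFSET_WAY_BITS = [4, 5, 7, 9, 11, 19, 25]
# Hypothetical last way: stores a full (aligned) target for branches that fit nowhere else.
FULL_TARGET_BITS = ADDR_BITS - ALIGN_BITS
NUM_WAYS = len(OFFSET_WAY_BITS) + 1


def offset_bits(pc: int, target: int) -> int:
    """Bits needed to encode (target - pc) as a signed two's-complement offset."""
    delta = (target - pc) >> ALIGN_BITS  # drop the always-zero alignment bits
    magnitude = delta if delta >= 0 else ~delta
    return magnitude.bit_length() + 1  # +1 for the sign bit


def pick_way(pc: int, target: int) -> int:
    """Return the narrowest way whose offset field can hold this branch's target."""
    need = offset_bits(pc, target)
    for way, width in enumerate(OFFSET_WAY_BITS):
        if need <= width:
            return way
    return NUM_WAYS - 1  # fall back to the full-target way


def target_bits_per_set(sized_ways: bool) -> int:
    """Target-field storage per set; tags and other metadata are ignored."""
    if sized_ways:
        return sum(OFFSET_WAY_BITS) + FULL_TARGET_BITS
    return NUM_WAYS * FULL_TARGET_BITS  # conventional BTB: full target in every way


if __name__ == "__main__":
    pc = 0x400000
    for target in (0x400010, 0x3FFFF0, 0x400400, 0x7F0000000000):
        print(hex(target), offset_bits(pc, target), pick_way(pc, target))
    print(target_bits_per_set(True), target_bits_per_set(False))
```

Short forward and backward jumps need only a few bits and land in the narrowest ways, while a far-away target falls through to the full-target way. Always choosing the narrowest fitting way is a simplification made here: any way at least as wide could hold the entry, and a real design also has to decide what to do when that way is already occupied in the set.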

research
06/24/2020

Fetch-Directed Instruction Prefetching Revisited

Prior work has observed that fetch-directed prefetching (FDIP) is highly...
research
02/02/2021

MANA: Microarchitecting an Instruction Prefetcher

L1 instruction (L1-I) cache misses are a source of performance bottlenec...
research
06/08/2021

Micro BTB: A High Performance and Lightweight Last-Level Branch Target Buffer for Servers

High-performance branch target buffers (BTBs) and the L1I cache are key ...
research
07/14/2017

Variable Instruction Fetch Rate to Reduce Control Dependent Penalties

In order to overcome the branch execution penalties of hard-to-predict i...
research
05/15/2023

By-Software Branch Prediction in Loops

Load-Dependent Branches (LDB) often do not exhibit regular patterns in t...
research
06/29/2020

SeMPE: Secure Multi Path Execution Architecture for Removing Conditional Branch Side Channels

One of the most prevalent source of side channel vulnerabilities is the ...
research
01/15/2022

Calipers: A Criticality-aware Framework for Modeling Processor Performance

Computer architecture design space is vast and complex. Tools are needed...
