uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures

by Andreas Abel, et al.

Performance models that statically predict the steady-state throughput of basic blocks on particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide optimizing compilers and aid manual software optimization. However, their utility heavily depends on the accuracy of their predictions. The average error of existing models compared to measurements on the actual hardware has been shown to lie between 9% and 36%. But how good is this? To answer this question, we propose an extremely simple analytical throughput model that may serve as a baseline. Surprisingly, this model is already competitive with the state of the art, indicating that there is significant potential for improvement. To explore this potential, we develop a simulation-based throughput predictor. To this end, we propose a detailed parametric pipeline model that supports all Intel Core microarchitecture generations released between 2011 and 2021. We evaluate our predictor on an improved version of the BHive benchmark suite and show that its predictions are usually within 1% of measurement results, improving upon prior models by roughly an order of magnitude. The experimental evaluation also demonstrates that several microarchitectural details considered to be rather insignificant in previous work are in fact essential for accurate prediction. Our throughput predictor is available as open source.

