uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures

10/10/2018
by Andreas Abel, et al.

Modern microarchitectures are among the world's most complex man-made systems. As a consequence, it is increasingly difficult to predict and explain, let alone optimize, the performance of software running on such microarchitectures. Performance predictions and optimizations require faithful models of microarchitectural behavior, which are, unfortunately, seldom available. In this paper, we present the design and implementation of a tool to construct faithful models of the latency, throughput, and port usage of x86 instructions. To this end, we first discuss common notions of instruction throughput and port usage, and introduce a more precise definition of latency that, in contrast to previous definitions, considers dependencies between different pairs of input and output operands. We then develop novel algorithms that infer latency, throughput, and port usage from automatically generated microbenchmarks and that are more accurate and precise than existing work. To facilitate the rapid construction of optimizing compilers and tools for performance prediction, the output of our tool is provided in a machine-readable format. We provide experimental results for processors of all generations of Intel's Core architecture, i.e., from Nehalem to Coffee Lake, and discuss various cases where the output of our tool differs considerably from prior work.
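To make the per-operand-pair notion of latency concrete, the following is a minimal sketch, not the tool's actual data model or output format. The instruction, the operand names (`src1`, `src2`, `dst`, `flags`), and all cycle counts are hypothetical placeholders chosen for illustration. Collapsing to the maximum is only one assumed way a single-number definition might summarize an instruction.

```python
from typing import Dict, Tuple

# Latency stored per (input operand, output operand) pair.
# Hypothetical instruction; all cycle counts are made-up placeholders,
# not measurements from any real processor.
LatencyMap = Dict[Tuple[str, str], int]

example_latency: LatencyMap = {
    ("src1", "dst"): 1,
    ("src2", "dst"): 3,
    ("src1", "flags"): 1,
    ("src2", "flags"): 3,
}


def single_number_latency(lat: LatencyMap) -> int:
    """Collapse the map to one value (assumption: take the maximum)."""
    return max(lat.values())


def pair_latency(lat: LatencyMap, input_op: str, output_op: str) -> int:
    """Latency of the dependency from one input to one output."""
    return lat[(input_op, output_op)]


def overestimate(lat: LatencyMap, input_op: str, output_op: str) -> int:
    """Cycles by which the single number overstates a given dependency."""
    return single_number_latency(lat) - pair_latency(lat, input_op, output_op)


if __name__ == "__main__":
    for (inp, out), cycles in sorted(example_latency.items()):
        extra = overestimate(example_latency, inp, out)
        print(f"{inp} -> {out}: {cycles} cycle(s), single number is off by {extra}")
```

In this sketch, a dependency chain that runs only through `src1` would be modeled with the pair-specific value, whereas a single-number model would charge every chain the worst-case value.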

