Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

10/01/2019
by   Jan Laukemann, et al.
0

Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the the in-core performance behavior of instructions and their dependencies. While an instruction throughput prediction sets a lower bound to the kernel runtime, the critical path defines an upper bound. Such predictions are an essential part of analytic (i.e., white-box) performance models like the Roofline and ECM model. They enable a better understanding of the performance-relevant interactions between hardware architecture and loop code. The Open Source Architecture Code Analyzer (OSACA) is a static analysis tool for predicting the execution time of sequential loops. It previously supported only x86 (Intel and AMD) architectures and simple, optimistic full-throughput execution. We have heavily extended OSACA to support ARM instructions and critical path prediction including the detection of loop-carried dependencies, which turns it into a versatile cross-architecture modeling tool. We show runtime predictions for code on Intel Cascade Lake, AMD Zen, and Marvell Vulcan micro-architectures based on machine models from available documentation and semi-automatic benchmarking. The predictions are compared with actual measurements.

READ FULL TEXT

page 1

page 3

research
09/04/2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

An accurate prediction of scheduling and execution of instruction stream...
research
07/29/2021

uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures

Performance models that statically predict the steady-state throughput o...
research
12/21/2020

From micro-OPs to abstract resources: constructing a simpler CPU performance model through microbenchmarking

In a super-scalar architecture, the scheduler dynamically assigns micro-...
research
10/10/2018

uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures

Modern microarchitectures are some of the world's most complex man-made ...
research
07/01/2019

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors

We describe a universal modeling approach for predicting single- and mul...
research
01/03/2023

Autovesk: Automatic vectorization of unstructured static kernels by graph transformations

Leveraging the SIMD capability of modern CPU architectures is mandatory ...
research
01/13/2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Achieving optimal program performance requires deep insight into the int...

Please sign up or login with your details

Forgot password? Click here to reset