Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

09/04/2018
by   Jan Laukemann, et al.
0

An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an indispensable component of analytical performance models, such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a deep understanding of the performance-relevant interactions between hardware architecture and loop code. We present the Open Source Architecture Code Analyzer (OSACA), a static analysis tool for predicting the execution time of sequential loops comprising x86 instructions under the assumption of an infinite first-level cache and perfect out-of-order scheduling. We show the process of building a machine model from available documentation and semi-automatic benchmarking, and carry it out for the latest Intel Skylake and AMD Zen micro-architectures. To validate the constructed models, we apply them to several assembly kernels and compare runtime predictions with actual measurements. Finally we give an outlook on how the method may be generalized to new architectures.

READ FULL TEXT
research
10/01/2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Useful models of loop kernel runtimes on out-of-order architectures requ...
research
01/13/2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Achieving optimal program performance requires deep insight into the int...
research
12/21/2020

From micro-OPs to abstract resources: constructing a simpler CPU performance model through microbenchmarking

In a super-scalar architecture, the scheduler dynamically assigns micro-...
research
07/01/2019

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors

We describe a universal modeling approach for predicting single- and mul...
research
10/31/2018

OpenCL Performance Prediction using Architecture-Independent Features

OpenCL is an attractive model for heterogeneous high-performance computi...
research
11/08/2019

nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems

We present nanoBench, a tool for evaluating small microbenchmarks using ...
research
04/21/2020

PMEvo: Portable Inference of Port Mappings for Out-of-Order Processors by Evolutionary Optimization

Achieving peak performance in a computer system requires optimizations i...

Please sign up or login with your details

Forgot password? Click here to reset