Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors

07/01/2019
by Johannes Hofmann, et al.

We describe a universal modeling approach for predicting single- and multicore runtime of steady-state loops on server processors. To this end, we strictly differentiate between application and machine models: An application model comprises the loop code, problem sizes, and other runtime parameters, while a machine model is an abstraction of all performance-relevant properties of a CPU. We introduce a generic method for determining machine models and present results for relevant server-processor architectures by Intel, AMD, IBM, and Marvell/Cavium. Considering this wide range of architectures, the set of features required for adequate performance modeling is surprisingly small. To validate our approach, we compare performance predictions to empirical data for an OpenMP-parallel preconditioned CG algorithm, which includes compute- and memory-bound kernels. Both single- and multicore analysis shows that the model exhibits average and maximum relative errors of 5% and 10%, respectively. Deviations from the model and insights gained are discussed in detail.
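To make the split between application and machine model concrete, here is a minimal sketch. It assumes a simplified, non-overlapping ECM-style (Execution-Cache-Memory) runtime model, which is one common analytic model for steady-state loops and not necessarily the exact formulation used by the authors. All class names, function names, and numbers are hypothetical. The machine model holds only hardware properties (clock, per-cache-line transfer costs), the application model holds only loop properties (in-core cycles, data volume, problem size), and multicore runtime is predicted by scaling linearly until the memory interface becomes the bottleneck.

```python
"""Hypothetical sketch: separate application and machine models, then
predict single- and multicore loop runtime with a simplified ECM-style model.
Not the authors' implementation; all names and numbers are illustrative."""
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class MachineModel:
    # Hypothetical machine parameters: core clock and the cost, in core
    # cycles, of moving one 64-byte cache line between adjacent levels.
    clock_hz: float
    cy_per_cl_l1_l2: float
    cy_per_cl_l2_l3: float
    cy_per_cl_l3_mem: float


@dataclass(frozen=True)
class ApplicationModel:
    # Hypothetical loop description per "unit of work" (e.g. 8 iterations of
    # a double-precision loop, i.e. one cache line per data stream).
    t_ol: float         # in-core cycles that overlap with data transfers
    t_nol: float        # in-core cycles that do not overlap (e.g. L1 loads/stores)
    cl_per_unit: float  # cache lines moved per unit of work at every level
    units: int          # problem size expressed in units of work


def ecm_cycles_single_core(app: ApplicationModel, m: MachineModel) -> float:
    """Cycles per unit of work on one core with data coming from memory.
    Assumes no overlap among transfers between cache levels."""
    t_data = app.cl_per_unit * (
        m.cy_per_cl_l1_l2 + m.cy_per_cl_l2_l3 + m.cy_per_cl_l3_mem
    )
    return max(app.t_ol, app.t_nol + t_data)


def memory_bottleneck_cycles(app: ApplicationModel, m: MachineModel) -> float:
    """Cycles per unit of work imposed by the shared memory interface."""
    return app.cl_per_unit * m.cy_per_cl_l3_mem


def predicted_runtime(app: ApplicationModel, m: MachineModel, cores: int) -> float:
    """Predicted runtime in seconds: linear scaling across cores until the
    memory bottleneck is reached, then flat."""
    per_unit = max(
        ecm_cycles_single_core(app, m) / cores,
        memory_bottleneck_cycles(app, m),
    )
    return app.units * per_unit / m.clock_hz


def saturation_point(app: ApplicationModel, m: MachineModel) -> int:
    """Smallest core count at which the memory bottleneck is reached."""
    return math.ceil(
        ecm_cycles_single_core(app, m) / memory_bottleneck_cycles(app, m)
    )


if __name__ == "__main__":
    machine = MachineModel(clock_hz=2.0e9, cy_per_cl_l1_l2=2.0,
                           cy_per_cl_l2_l3=4.0, cy_per_cl_l3_mem=6.0)
    loop = ApplicationModel(t_ol=4.0, t_nol=6.0, cl_per_unit=4.0,
                            units=1_000_000)
    for n in (1, 2, 3, 4):
        print(n, predicted_runtime(loop, machine, n))
    print("saturates at", saturation_point(loop, machine), "cores")
```

With these made-up parameters, one core needs max(4, 6 + 4 × 12) = 54 cycles per unit of work, the memory interface allows at most one unit every 24 cycles, and the loop therefore saturates at three cores. The point of the separation is that the same `ApplicationModel` can be re-evaluated against a different `MachineModel` (another vendor's CPU) without touching the loop description.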

research
09/04/2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

An accurate prediction of scheduling and execution of instruction stream...
research
05/28/2019

Energy Efficiency Features of the Intel Skylake-SP Processor and Their Impact on Performance

The overwhelming majority of High Performance Computing (HPC) systems an...
research
10/01/2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Useful models of loop kernel runtimes on out-of-order architectures requ...
research
02/09/2020

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

Hardware platforms in high performance computing are constantly getting ...
research
08/16/2018

Novel Model-based Methods for Performance Optimization of Multithreaded 2D Discrete Fourier Transform on Multicore Processors

In this paper, we use multithreaded fast Fourier transforms provided in ...
research
01/13/2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Achieving optimal program performance requires deep insight into the int...
research
01/13/2023

PMFault: Faulting and Bricking Server CPUs through Management Interfaces

Apart from the actual CPU, modern server motherboards contain other auxi...
