Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

01/13/2017
by   Julian Hammer, et al.
0

Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline Model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describes the single-core performance and saturation behavior on a multicore chip. We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis allows to predict optimal spatial blocking factors for loop nests. Together with the models it enables an ab-initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil we demonstrate the usefulness and predictive power of the Kerncraft tool.

READ FULL TEXT

page 7

page 18

research
09/04/2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

An accurate prediction of scheduling and execution of instruction stream...
research
02/01/2018

PCOT: Cache Oblivious Tiling of Polyhedral Programs

This paper studies two variants of tiling: iteration space tiling (or lo...
research
01/16/2019

Analytic Performance Modeling and Analysis of Detailed Neuron Simulations

Big science initiatives are trying to reconstruct and model the brain by...
research
10/01/2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Useful models of loop kernel runtimes on out-of-order architectures requ...
research
10/31/2020

An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs

Complex applications running on multicore processors show a rich perform...
research
12/23/2020

Software Pipelining for Quantum Loop Programs

We propose a method for performing software pipelining on quantum for-lo...
research
07/01/2019

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors

We describe a universal modeling approach for predicting single- and mul...

Please sign up or login with your details

Forgot password? Click here to reset