Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths

03/01/2021
by   Edward Hutter, et al.

The prohibitive expense of automatic performance tuning at scale has largely limited the use of autotuning to libraries for shared-memory and GPU architectures. We introduce a framework for approximate autotuning that achieves a desired confidence in each algorithm configuration's performance by constructing confidence intervals to describe the performance of individual kernels (subroutines of benchmarked programs). Once a kernel's performance is deemed sufficiently predictable for a set of inputs, subsequent invocations are avoided and replaced with a predictive model of the execution time. We then leverage online execution path analysis to coordinate selective kernel execution and propagate each kernel's statistical profile. This strategy is effective in the presence of frequently recurring computation and communication kernels, which are characteristic of algorithms in numerical linear algebra. We encapsulate this framework as part of a new profiling tool, Critter, that automates kernel execution decisions and propagates statistical profiles along critical paths of execution. We evaluate the performance prediction accuracy obtained by our selective execution methods using state-of-the-art distributed-memory implementations of Cholesky and QR factorization on Stampede2, and demonstrate speed-ups of up to 7.1x with 98% accuracy.
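The selective-execution idea in the abstract can be illustrated with a minimal sketch. It is not Critter's API or the authors' statistical model: the class `KernelProfile`, the function `run_or_predict`, the normal-approximation interval (`z=1.96`), the 5% relative tolerance, and the minimum sample count are all assumptions chosen for illustration. The sketch keeps a running mean and variance per kernel, and once the confidence interval is tight relative to the mean, it returns the predicted time instead of running the kernel.

```python
import math


class KernelProfile:
    """Running timing statistics for one kernel signature (hypothetical helper)."""

    def __init__(self, tolerance=0.05, z=1.96, min_samples=3):
        self.tolerance = tolerance      # assumed: max relative CI half-width
        self.z = z                      # assumed: normal quantile (~95%)
        self.min_samples = min_samples  # assumed: samples needed before skipping
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0                  # sum of squared deviations from the mean

    def record(self, seconds):
        # Welford's online update of mean and squared deviations.
        self.count += 1
        delta = seconds - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (seconds - self.mean)

    def half_width(self):
        # Half-width of the confidence interval on the mean execution time.
        if self.count < 2:
            return math.inf
        variance = self._m2 / (self.count - 1)
        return self.z * math.sqrt(variance / self.count)

    def is_predictable(self):
        if self.count < self.min_samples or self.mean <= 0.0:
            return False
        return self.half_width() / self.mean <= self.tolerance


def run_or_predict(profile, kernel, timer):
    """Return (seconds, executed): run and time the kernel unless it is predictable."""
    if profile.is_predictable():
        return profile.mean, False      # skip execution, use the modeled time
    start = timer()
    kernel()
    elapsed = timer() - start
    profile.record(elapsed)
    return elapsed, True
```

In use, one profile would be kept per kernel and input signature, with `time.perf_counter` passed as `timer`. The sketch leaves out the part the abstract describes as execution path analysis, that is, coordinating these decisions across processes and propagating profiles along critical paths.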

research
04/17/2020

GEVO: GPU Code Optimization using Evolutionary Computation

GPUs are a key enabler of the revolution in machine learning and high pe...
research
07/21/2020

Quantifying Performance Changes with Effect Size Confidence Intervals

Measuring performance and quantifying a performance change are core eval...
research
01/19/2017

GPGPU Performance Estimation with Core and Memory Frequency Scaling

Graphics Processing Units (GPUs) support dynamic voltage and frequency s...
research
01/11/2023

Adaptive Data Path Selection for Durable Transaction in GPU Persistent Memory

The new non-volatile memory technology relies on data recoverability to ...
research
04/21/2019

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

The ability to model, analyze, and predict execution time of computation...
research
07/05/2022

FLOPs as a Discriminant for Dense Linear Algebra Algorithms

Expressions that involve matrices and vectors, known as linear algebra e...
