Automated Parallel Kernel Extraction from Dynamic Application Traces

01/27/2020
by   Richard Uhrie, et al.
0

Modern program runtime is dominated by segments of repeating code called kernels. Kernels are accelerated by increasing memory locality, increasing data-parallelism, and exploiting producer-consumer parallelism among kernels - which requires hardware specialized for a particular class of kernels. Programming this hardware can be difficult, requiring that the kernels be identified and annotated in the code or translated to a domain-specific language. This paper describes a technique to automatically localize parallel kernels from a dynamic application trace, facilitating further code optimization. Dynamic trace collection is fast and compact. With optimization, it only incurs a time-dilation of a factor on nine and file-size of one megabyte per second, addressing a significant criticism of this approach. Kernel extraction is accurate and performed in linear time within logarithmic memory, detecting a wide range of kernels. This approach was validated across 16 libraries, comprised of 10,507 kernels instances. To validate the accuracy of our detected kernels, five test programs were written that spans traditional kernel definitions and were certified to contain all the kernels that were expected.

READ FULL TEXT

page 5

page 9

page 10

page 11

page 14

research
11/20/2012

Multicore Dynamic Kernel Modules Attachment Technique for Kernel Performance Enhancement

Traditional monolithic kernels dominated kernel structures for long time...
research
08/30/2020

Performance portability through machine learning guided kernel selection in SYCL libraries

Automatically tuning parallel compute kernels allows libraries and frame...
research
07/01/2022

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

While parallelism remains the main source of performance, architectural ...
research
10/05/2020

Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels

Measurements of absolute runtime are useful as a summary of performance ...
research
03/16/2019

MultiK: A Framework for Orchestrating Multiple Specialized Kernels

We present, MultiK, a Linux-based framework 1 that reduces the attack su...
research
10/18/2019

A Benchmark Set of Highly-efficient CUDA and OpenCL Kernels and its Dynamic Autotuning with Kernel Tuning Toolkit

Autotuning of performance-relevant source-code parameters allows to auto...
research
11/05/2019

KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs

In this paper we present KLARAPTOR (Kernel LAunch parameters RAtional Pr...

Please sign up or login with your details

Forgot password? Click here to reset