Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

09/14/2021
by Keren Zhou, et al.

To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.
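The idea of attributing metrics to calling contexts that span both CPUs and GPUs can be pictured with a small calling context tree. The following is only a minimal sketch under stated assumptions, not HPCToolkit's implementation: the `CCTNode` class, the `attribute` and `inclusive` helpers, the frame names, and the metric names are all hypothetical and chosen for illustration. It shows a CPU call path ending at a kernel launch being extended with GPU-side frames, so that a GPU metric lands under the CPU context that launched the kernel.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One frame in a hypothetical calling context tree (CCT)."""
    frame: str
    metrics: dict = field(default_factory=dict)   # exclusive metric values
    children: dict = field(default_factory=dict)  # frame name -> CCTNode

    def child(self, frame):
        # Reuse the existing child for this frame, or create it on first use.
        if frame not in self.children:
            self.children[frame] = CCTNode(frame)
        return self.children[frame]


def attribute(root, cpu_path, gpu_path, metric, value):
    """Add `value` to the node reached by the CPU path followed by the GPU path."""
    node = root
    for frame in list(cpu_path) + list(gpu_path):
        node = node.child(frame)
    node.metrics[metric] = node.metrics.get(metric, 0.0) + value
    return node


def inclusive(node, metric):
    """Sum a metric over a node and all of its descendants."""
    own = node.metrics.get(metric, 0.0)
    return own + sum(inclusive(c, metric) for c in node.children.values())


# Illustrative data only: frame names and timings are made up.
root = CCTNode("<program root>")
launch = ["main", "solve", "cudaLaunchKernel"]
attribute(root, launch, ["kernel_a"], "gpu_time_s", 0.5)
attribute(root, launch, ["kernel_a", "device_helper"], "gpu_time_s", 0.25)
attribute(root, ["main", "io"], [], "cpu_time_s", 0.125)

solve = root.child("main").child("solve")
print(inclusive(solve, "gpu_time_s"))
```

Because GPU frames hang beneath the CPU frame that launched them, an inclusive sum at any CPU frame (here `solve`) covers the GPU work it caused, which is the kind of whole-application view the abstract describes.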


