DeepAI AI Chat
Log In Sign Up

Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

by   Keren Zhou, et al.

To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.


page 1

page 3

page 4

page 6

page 9

page 10

page 11

page 14


Preparing for Performance Analysis at Exascale

Performance tools for emerging heterogeneous exascale platforms must add...

GPA: A GPU Performance Advisor Based on Instruction Sampling

Developing efficient GPU kernels can be difficult because of the complex...

C-for-Metal: High Performance SIMD Programming on Intel GPUs

The SIMT execution model is commonly used for general GPU development. C...

Daisen: A Framework for Visualizing Detailed GPU Execution

Graphics Processing Units (GPUs) have been widely used to accelerate art...

On the performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia

This report presents a comprehensive analysis of the performance of GPU ...

Enabling GPU Accelerated Computing in the SUNDIALS Time Integration Library

As part of the Exascale Computing Project (ECP), a recent focus of devel...

Specifying and Testing GPU Workgroup Progress Models

As GPU availability has increased and programming support has matured, a...