Hierarchical Roofline Performance Analysis for Deep Learning Applications

09/11/2020
by Charlene Yang, et al.

This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the Empirical Roofline Toolkit to support a broader range of data precisions as well as Tensor Cores, and introduces an Nsight Compute-based method for accurately collecting application performance information. This methodology allows automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy on NVIDIA GPUs, and it is validated on a complex deep learning application used for climate image segmentation. We use two versions of the code, one in TensorFlow and one in PyTorch, to demonstrate the use and effectiveness of this methodology. We highlight how the application utilizes the compute and memory capabilities of the GPU and how the implementation and performance differ between the two deep learning frameworks.
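As a rough illustration of what a hierarchical Roofline computes, the sketch below applies the standard Roofline bound, min(peak compute, arithmetic intensity × bandwidth), once per memory level. It is not the paper's tooling: the names `MemoryLevel` and `hierarchical_roofline` are hypothetical, and every number in the usage example is a made-up placeholder rather than a measurement from any GPU or from the paper.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical helper type)."""
    name: str
    bandwidth_gbs: float  # sustained bandwidth in GB/s


def hierarchical_roofline(peak_gflops, levels, flops, bytes_per_level):
    """Return {level name: (arithmetic intensity, attainable GFLOP/s)}.

    peak_gflops     -- peak compute throughput in GFLOP/s
    levels          -- list of MemoryLevel
    flops           -- floating-point operations executed by the kernel
    bytes_per_level -- {level name: bytes moved at that level}
    """
    result = {}
    for level in levels:
        # Arithmetic intensity at this level, in FLOPs per byte.
        ai = flops / bytes_per_level[level.name]
        # Roofline bound: limited by compute or by this level's bandwidth.
        bound = min(peak_gflops, ai * level.bandwidth_gbs)
        result[level.name] = (ai, bound)
    return result


# Placeholder machine and kernel numbers, chosen only to keep the arithmetic simple.
levels = [
    MemoryLevel("L1", 2000.0),
    MemoryLevel("L2", 500.0),
    MemoryLevel("HBM", 100.0),
]
bytes_moved = {"L1": 4e9, "L2": 2e9, "HBM": 0.5e9}
roofline = hierarchical_roofline(1000.0, levels, 1e9, bytes_moved)

for name, (ai, bound) in roofline.items():
    print(f"{name}: AI = {ai:.2f} FLOP/byte, bound = {bound:.0f} GFLOP/s")
```

In a workflow like the one the abstract describes, the peak and bandwidth values would come from machine characterization and the FLOP and byte counts from application profiling; here they are simply passed in as arguments.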

research
09/05/2020

Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs

This paper surveys a range of methods to collect necessary performance d...
research
09/09/2020

Time-Based Roofline for Deep Learning Performance Analysis

Deep learning applications are usually very compute-intensive and requir...
research
07/11/2019

Profiling based Out-of-core Hybrid Method for Large Neural Networks

GPUs are widely used to accelerate deep learning with neural networks (NNs). On the ...
research
12/21/2017

Wolf in Sheep's Clothing - The Downscaling Attack Against Deep Learning Applications

This paper considers security risks buried in the data processing pipeli...
research
11/13/2018

FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs

In recent years, there is a surge on machine learning applications in in...
research
01/01/2023

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

New architecture GPUs like A100 are now equipped with multi-instance GPU...
research
10/03/2018

Exascale Deep Learning for Climate Analytics

We extract pixel-level masks of extreme weather patterns using variants ...
