Hierarchical Roofline Performance Analysis for Deep Learning Applications
This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It extends the Empirical Roofline Toolkit to cover a wider range of data precisions as well as Tensor Cores, and it introduces an Nsight Compute-based method for accurately collecting application performance information. Together, these enable automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy on NVIDIA GPUs. The methodology is validated with a complex deep learning application used for climate image segmentation. We use two versions of the code, one in TensorFlow and one in PyTorch, to demonstrate the use and effectiveness of the methodology. We highlight how the application utilizes the compute and memory capabilities of the GPU and how implementation and performance differ between the two deep learning frameworks.
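As a rough illustration of what a hierarchical Roofline analysis computes, the sketch below applies the standard Roofline bound, min(peak compute, arithmetic intensity × bandwidth), once per memory level. It is not the paper's tooling: the function names are hypothetical, and every number is an illustrative placeholder rather than a measurement. In practice the ceilings would come from machine characterization (for example, the Empirical Roofline Toolkit) and the FLOP and byte counts from a profiler such as Nsight Compute.

```python
def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Standard Roofline bound: min(compute ceiling, AI * bandwidth ceiling)."""
    return min(peak_gflops, ai * bandwidth_gbs)


def hierarchical_roofline(flops, runtime_s, bytes_per_level,
                          peak_gflops, bandwidth_gbs):
    """Return arithmetic intensity and Roofline bound for each memory level.

    flops           -- total floating-point operations of the kernel
    runtime_s       -- kernel run time in seconds
    bytes_per_level -- dict: memory level -> bytes moved at that level
    peak_gflops     -- compute ceiling in GFLOP/s
    bandwidth_gbs   -- dict: memory level -> bandwidth ceiling in GB/s
    """
    achieved = flops / runtime_s / 1e9  # achieved GFLOP/s
    report = {}
    for level, nbytes in bytes_per_level.items():
        ai = flops / nbytes  # arithmetic intensity in FLOPs per byte
        bw = bandwidth_gbs[level]
        report[level] = {
            "ai": ai,
            "bound_gflops": attainable_gflops(ai, peak_gflops, bw),
            "achieved_gflops": achieved,
            # Memory-bound at this level if the bandwidth roof is below peak.
            "memory_bound": ai * bw < peak_gflops,
        }
    return report


# Placeholder inputs (assumptions for illustration only).
report = hierarchical_roofline(
    flops=2e9,
    runtime_s=0.01,
    bytes_per_level={"L1": 4e9, "L2": 2e9, "HBM": 1e9},
    peak_gflops=1000.0,
    bandwidth_gbs={"L1": 4000.0, "L2": 2000.0, "HBM": 400.0},
)

for level, row in report.items():
    print(level, row)
```

With these placeholder inputs the kernel sits under the compute ceiling at L1 and L2 but under the bandwidth ceiling at HBM, which is the kind of per-level distinction a hierarchical Roofline chart makes visible.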