- Hierarchical Roofline Performance Analysis for Deep Learning Applications
This paper presents a practical methodology for collecting performance d...
- Time-Based Roofline for Deep Learning Performance Analysis
Deep learning applications are usually very compute-intensive and requir...
- Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs
This paper surveys a range of methods to collect necessary performance d...
- 8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks
Performance optimization can be a daunting task especially as the hardwa...

Charlene Yang