
Analyzing Machine Learning Workloads Using a Detailed GPU Simulator

by Jonathan Lew, et al.

Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator to enable it to run PyTorch by running PTX kernels included in NVIDIA's cuDNN library. We use the resulting modified simulator, which we plan to make available publicly with this paper, to study some simple deep learning workloads. With our changes to GPGPU-Sim's functional simulation model, we find that the GPGPU-Sim performance model, running a cuDNN-enabled implementation of LeNet for MNIST, reports results within 30% of real hardware. Using GPGPU-Sim's AerialVision performance analysis tool, we observe that cuDNN API calls contain many varying phases and appear to include potentially inefficient microarchitecture behaviour such as DRAM partition bank camping, at least when executed on GPGPU-Sim's current performance model.



Modeling Deep Learning Accelerator Enabled GPUs

The efficacy of deep learning has resulted in its use in a growing numbe...

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Deep learning frameworks have been widely deployed on GPU servers for de...

Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads

We investigate the performance of the concurrency mechanisms available o...

Performance Analysis of Deep Learning Workloads on Leading-edge Systems

This work examines the performance of leading-edge systems designed for ...

RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads

Deep reinforcement learning (RL) has made groundbreaking advancements in...

Using Graph Neural Networks to model the performance of Deep Neural Networks

With the unprecedented proliferation of machine learning software, there...

Exploring Modern GPU Memory System Design Challenges through Accurate Modeling

This paper explores the impact of simulator accuracy on architecture des...