Analyzing Machine Learning Workloads Using a Detailed GPU Simulator

11/18/2018
by Jonathan Lew, et al.

Most deep neural networks deployed today are trained on GPUs via high-level frameworks such as TensorFlow and PyTorch. This paper describes changes we made to the GPGPU-Sim simulator that enable it to run PyTorch by executing the PTX kernels included in NVIDIA's cuDNN library. We use the resulting modified simulator, which we plan to make publicly available with this paper, to study some simple deep learning workloads. With our changes to GPGPU-Sim's functional simulation model, we find that GPGPU-Sim's performance model, running a cuDNN-enabled implementation of LeNet for MNIST, reports results within 30% of real hardware. Using GPGPU-Sim's AerialVision performance analysis tool, we observe that cuDNN API calls contain many varying phases and appear to include potentially inefficient microarchitectural behaviour, such as DRAM partition bank camping, at least when executed on GPGPU-Sim's current performance model.
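To make the workload concrete, below is a minimal sketch of the kind of program studied: LeNet trained on MNIST in PyTorch. The layer shapes and hyperparameters here are assumptions following the classic LeNet-5 layout, not necessarily the paper's exact configuration. On a CUDA device, the convolution layers dispatch to cuDNN kernels, which is the code path the modified simulator executes as PTX.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # padding=2 keeps 28x28 MNIST inputs at 28x28 after conv1
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 10x10 -> 5x5
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

# Convolutions run through cuDNN when a GPU (or, here, a simulated GPU) is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=64, shuffle=True)

model = LeNet().to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

model.train()
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    opt.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    opt.step()

Each forward and backward pass issues a sequence of cuDNN API calls (convolution, pooling, and their gradients), which is why per-call phase behaviour of the kind AerialVision exposes matters for understanding these workloads.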
