DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

04/02/2019
by Sangkug Lym, et al.

Training convolutional neural networks (CNNs) requires high compute throughput and high memory bandwidth. Convolution layers in particular account for most of the execution time of CNN training, and GPUs are commonly used to accelerate them. Optimizing GPU designs for efficient CNN training requires accurately modeling how performance improves as compute and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each level of the GPU memory hierarchy while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust across different CNNs and GPU architectures. We then show how the model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement.


