cuConv: A CUDA Implementation of Convolution for CNN Inference

03/30/2021
by Marc Jordà, et al.

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs and are therefore widely used in production for this purpose. State-of-the-art implementations, however, are inefficient for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced memory accesses without requiring prior data transformations. Our experiments show notable performance improvements across a range of common CNN forward-propagation convolution configurations, with speedups of up to 2.29x over the best convolution implementation in cuDNN, thereby filling a relevant gap in existing approaches.
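As a point of reference for the operation the abstract discusses, the following is a minimal CPU sketch of a direct forward convolution, meaning one computed straight from the input tensor with no prior data transformation. It is not the cuConv kernel and is not taken from the paper. The function name `conv2d_direct`, the NCHW input layout, the KCRS filter layout, stride 1, and the absence of padding are all assumptions made for illustration. In these layouts the width index `q` is the contiguous dimension, so a GPU mapping that assigns consecutive threads to consecutive `q` values would read adjacent input elements, which is the general condition for coalesced loads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not the paper's implementation).
// Direct forward convolution as used in CNNs (cross-correlation form):
//   input  layout NCHW: N images, C channels, H rows, W columns
//   filter layout KCRS: K filters, C channels, R rows, S columns
//   stride 1, no padding, so the output is N x K x P x Q with
//   P = H - R + 1 and Q = W - S + 1.
std::vector<float> conv2d_direct(const std::vector<float>& in,
                                 const std::vector<float>& flt,
                                 std::size_t N, std::size_t C,
                                 std::size_t H, std::size_t W,
                                 std::size_t K, std::size_t R, std::size_t S) {
    assert(in.size() == N * C * H * W);
    assert(flt.size() == K * C * R * S);
    assert(R >= 1 && S >= 1 && R <= H && S <= W);

    const std::size_t P = H - R + 1;
    const std::size_t Q = W - S + 1;
    std::vector<float> out(N * K * P * Q, 0.0f);

    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t p = 0; p < P; ++p)
                for (std::size_t q = 0; q < Q; ++q) {  // q is contiguous in memory
                    float acc = 0.0f;
                    for (std::size_t c = 0; c < C; ++c)
                        for (std::size_t r = 0; r < R; ++r)
                            for (std::size_t s = 0; s < S; ++s) {
                                const std::size_t i = ((n * C + c) * H + (p + r)) * W + (q + s);
                                const std::size_t f = ((k * C + c) * R + r) * S + s;
                                acc += in[i] * flt[f];
                            }
                    out[((n * K + k) * P + p) * Q + q] = acc;
                }
    return out;
}
```

A reference like this is mainly useful for checking the output of an optimized GPU kernel on small inputs; it makes no attempt at performance.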

research
04/02/2019

DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

Training convolutional neural networks (CNNs) requires intense compute t...
research
12/24/2014

Fast Convolutional Nets With fbfft: A GPU Performance Evaluation

We examine the performance profile of Convolutional Neural Network train...
research
01/27/2015

maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

This paper describes maxDNN, a computationally efficient convolution ker...
research
05/04/2018

Performance tuning for deep learning on a many-core processor (master thesis)

Convolutional neural networks (CNNs) are becoming very successful and po...
research
11/25/2020

Deep Convolutional Neural Networks: A survey of the foundations, selected improvements, and some current applications

Within the world of machine learning there exists a wide range of differ...
research
12/17/2019

Mitigate Parasitic Resistance in Resistive Crossbar-based Convolutional Neural Networks

Traditional computing hardware often encounters on-chip memory bottlenec...
research
12/09/2022

Towards a learning-based performance modeling for accelerating Deep Neural Networks

Emerging applications such as Deep Learning are often data-driven, thus ...