Kernel-Segregated Transpose Convolution Operation

09/08/2022
by Vijay Srinivas Tida, et al.

Transpose convolution is prominent in many deep learning applications. However, transpose convolution layers are computationally intensive because the input feature map is enlarged by inserting zeros after each element in every row and column. Convolving over this expanded feature map utilizes hardware resources poorly: the zeros sit at predefined positions, so many of the multiplications are unnecessary. To address these problems, we propose an algorithm-level optimization for efficient transpose convolution. Based on which kernel elements are activated at each output position, we segregate the original kernel into four sub-kernels. This scheme can reduce both memory requirements and unnecessary multiplications. On a flower dataset from the Kaggle website, the proposed method computed 3.09× faster on a Titan X GPU and 3.02× faster on an Intel Dual Core CPU. Furthermore, the optimization generalizes to existing devices without additional hardware requirements. We also evaluated the method on a simple deep learning model containing one transpose convolution layer: trained on the MNIST dataset with an Intel Dual Core CPU, it was 2.2× faster than the conventional implementation.
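The kernel segregation idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single-channel input, an upsampling factor of 2 with a zero inserted after each element, no padding, and cross-correlation (no kernel flip). The function names are hypothetical. Under these assumptions, an output position only ever meets non-zero inputs at kernel entries whose row and column parity match its own, so the kernel splits into four parity-based sub-kernels that are applied directly to the original input.

```python
import numpy as np


def upsample_with_zeros(x):
    """Insert a zero after each element in every row and column (assumed factor 2)."""
    h, w = x.shape
    u = np.zeros((2 * h, 2 * w), dtype=x.dtype)
    u[::2, ::2] = x
    return u


def correlate_valid(a, k):
    """Plain 'valid' cross-correlation of a 2-D array with a 2-D kernel."""
    n, m = k.shape
    oh, ow = a.shape[0] - n + 1, a.shape[1] - m + 1
    out = np.zeros((oh, ow), dtype=np.result_type(a, k))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + n, j:j + m] * k)
    return out


def conventional_transpose_conv(x, k):
    """Baseline: convolve the full kernel over the zero-expanded feature map."""
    return correlate_valid(upsample_with_zeros(x), k)


def segregated_transpose_conv(x, k):
    """Same result, computed with four sub-kernels on the original input.

    Assumes the kernel is no larger than the expanded feature map.
    """
    h, w = x.shape
    n, m = k.shape
    out = np.zeros((2 * h - n + 1, 2 * w - m + 1), dtype=np.result_type(x, k))
    for ri in (0, 1):          # row parity of the output position
        for rj in (0, 1):      # column parity of the output position
            sub = k[ri::2, rj::2]  # kernel entries that can hit non-zero inputs
            # Outputs with this parity read the input starting at offset (ri, rj).
            out[ri::2, rj::2] = correlate_valid(x[ri:, rj:], sub)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(-5, 5, size=(4, 4))
    k = rng.integers(-5, 5, size=(3, 3))
    same = np.array_equal(conventional_transpose_conv(x, k),
                          segregated_transpose_conv(x, k))
    print("outputs match:", same)
```

In this sketch the zero-expanded map is never built on the segregated path, and the four sub-kernels together hold exactly the entries of the original kernel, so each output uses on average about a quarter of the multiplications of the baseline. The speedups reported above come from the authors' own implementation, not from this code.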


