I/O Lower Bounds for Auto-tuning of Convolutions in CNNs

12/31/2020
by Xiaoyang Zhang, et al.

Convolution is the most time-consuming part of the computation in convolutional neural networks (CNNs), which have achieved great success in numerous applications. Due to complex data dependencies and the growing scale of models and data, convolution suffers from high data-movement (i.e., memory access) overhead. This work provides a comprehensive analysis and methodologies to minimize communication for convolution in CNNs. Through an in-depth analysis of recent I/O complexity theory under the red-blue pebble game model, we develop a general I/O lower bound theory for composite algorithms consisting of several different sub-computations. Based on the proposed theory, we establish data-movement lower bounds for two representative convolution algorithms in CNNs, namely direct convolution and the Winograd algorithm. Next, guided by these I/O lower bounds, we design near I/O-optimal dataflow strategies for the two algorithms by fully exploiting data reuse. Furthermore, to push performance beyond the near I/O-optimal dataflow strategies, we propose an aggressive auto-tuning design based on the I/O lower bounds, which searches for the optimal parameter configuration of direct convolution and the Winograd algorithm on GPUs, such as the number of threads and the amount of shared memory used in each thread block. Finally, experimental results on direct convolution and the Winograd algorithm show that our dataflow strategies, combined with the auto-tuning approach, achieve an average speedup of about 3.32x over cuDNN. In addition, compared with TVM, the state-of-the-art auto-tuning framework, our I/O-lower-bound-based auto-tuning method not only finds the optimal parameter configuration faster but also delivers higher performance than the best solution found by TVM.
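As a concrete illustration of the two convolution algorithms the abstract refers to, the minimal Python sketch below computes a 1-D F(2,3) Winograd convolution, which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 required by the direct method, and verifies it against direct convolution. The transform matrices are the standard F(2,3) matrices from the Winograd/Toom-Cook construction; the function names and the sketch itself are illustrative and are not taken from the paper's GPU implementation.

    import numpy as np

    # Standard Winograd F(2,3) transform matrices: compute 2 outputs of a
    # 3-tap filter from a 4-element input tile with 4 multiplications.
    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        # y = A^T [ (G g) * (B^T d) ], where * is the elementwise product
        return A_T @ ((G @ g) * (B_T @ d))

    def direct_conv(d, g):
        # Direct sliding-window convolution (correlation form, as in CNNs)
        return np.array([d[i:i + 3] @ g for i in range(2)])

    d = np.random.rand(4)   # input tile
    g = np.random.rand(3)   # filter
    assert np.allclose(winograd_f23(d, g), direct_conv(d, g))

In the 2-D case used for CNN layers, F(2x2, 3x3) applies these transforms along both dimensions and cuts the multiplications per tile from 36 to 16; the repeated reuse of the transformed input and filter tiles is the kind of data reuse that the paper's dataflow strategies and I/O lower bounds formalize.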


