Low-memory GEMM-based convolution algorithms for deep neural networks

09/08/2017
by Andrew Anderson et al.

Deep neural networks (DNNs) require very large amounts of computation, both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are many different ways to express DNN convolution operations using GEMM. Although the different approaches all perform the same number of operations, the size of their temporary data structures differs significantly. Convolving an input of dimensions C × H × W with a K × K kernel requires O(K^2CHW) additional space using the classical im2col approach. More recently, memory-efficient approaches requiring just O(KCHW) auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just O(MHW) and O(KW) additional space respectively, where M is the number of channels in the result of the convolution. These algorithms dramatically reduce the space overhead of DNN convolution, making it much more suitable for memory-limited embedded systems. Experimental evaluation shows that our low-memory algorithms are just as fast as the best patch-building approaches while requiring only a fraction of the additional memory. Our low-memory algorithms also have excellent data locality, which gives them a further edge over patch-building algorithms when multiple cores are used; as a result, they often outperform the best patch-building algorithms when run with multiple threads.
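
To make the space comparison concrete, below is a minimal NumPy sketch of the classical im2col approach, assuming stride-1, same-padded convolution with an odd kernel size K (function names and conventions are ours, not the paper's).

    import numpy as np

    def conv2d_im2col(x, w):
        """Same-padded, stride-1 convolution via im2col + one GEMM.

        x: input of shape (C, H, W); w: kernels of shape (M, C, K, K), K odd.
        The patch matrix has shape (C*K*K, H*W), i.e. O(K^2 CHW) extra space.
        """
        C, H, W = x.shape
        M, _, K, _ = w.shape
        pad = K // 2
        xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))

        # Build the im2col patch matrix: one column per output pixel.
        cols = np.empty((C * K * K, H * W), dtype=x.dtype)
        row = 0
        for c in range(C):
            for i in range(K):
                for j in range(K):
                    cols[row] = xp[c, i:i + H, j:j + W].ravel()
                    row += 1

        # One large GEMM: (M, C*K*K) x (C*K*K, H*W) -> (M, H*W).
        return (w.reshape(M, C * K * K) @ cols).reshape(M, H, W)

By contrast, a kernel-decomposed, shift-and-accumulate scheme needs only one (M, H, W) buffer of temporary space, i.e. O(MHW). The sketch below illustrates this general idea under the same assumptions; it shows how the bound can be met, not necessarily the paper's exact algorithm.

    def conv2d_accumulating(x, w):
        """Convolution as K^2 accumulated 1x1 GEMMs with O(MHW) extra space."""
        C, H, W = x.shape
        M, _, K, _ = w.shape
        pad = K // 2
        out = np.zeros((M, H, W), dtype=x.dtype)
        xf = x.reshape(C, H * W)
        for i in range(K):
            for j in range(K):
                # 1x1 GEMM for kernel offset (i, j): (M, C) x (C, H*W).
                partial = (w[:, :, i, j] @ xf).reshape(M, H, W)
                # Add the spatially shifted partial result into the output.
                dy, dx = i - pad, j - pad
                ys, ye = max(0, -dy), min(H, H - dy)
                xs, xe = max(0, -dx), min(W, W - dx)
                out[:, ys:ye, xs:xe] += partial[:, ys + dy:ye + dy, xs + dx:xe + dx]
        return out

The two routines compute the same result, which makes the trade-off easy to check on random data:

    x = np.random.randn(3, 8, 8)     # hypothetical C=3, H=W=8 input
    w = np.random.randn(4, 3, 3, 3)  # hypothetical M=4, K=3 kernels
    assert np.allclose(conv2d_im2col(x, w), conv2d_accumulating(x, w))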

Related research

Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural Networks (03/29/2018)
Modern deep neural networks (DNNs) spend a large amount of their executi...

A Novel Matrix-Encoding Method for Privacy-Preserving Neural Networks (Inference) (01/29/2022)
In this work, we present a novel matrix-encoding method that is partic...

Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming (10/03/2017)
Deep Neural Networks (DNNs) require very large amounts of computation bo...

MEC: Memory-efficient Convolution for Deep Neural Network (06/21/2017)
Convolution is a critical component in modern deep neural networks, thus...

Im2win: Memory Efficient Convolution On SIMD Architectures (06/25/2023)
Convolution is the most expensive operation among neural network operati...

NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques (10/01/2019)
Quantization has emerged to be an effective way to significantly boost t...
