Parallel Multi Channel Convolution using General Matrix Multiplication

04/06/2017
by Aravind Vasudevan, et al.

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on GEMM, but not on im2col. Our algorithm eliminates the need to replicate input data, thereby enabling us to apply the convolution kernels to the input images directly. We have implemented several variants of our algorithm on a CPU and on an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
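To make the contrast concrete, here is a minimal pure-Python sketch (no external libraries) that computes MCMK convolution three ways: directly, via im2col followed by a single GEMM, and via a GEMM-based scheme that never replicates input data. The third function sums K×K shifted 1×1 convolutions, each of which is a GEMM of an M×C kernel slice with the unexpanded C×(H·W) input. It illustrates the general idea of GEMM without im2col and is not necessarily the exact variant the paper proposes. All function names (`gemm`, `im2col`, `conv_direct`, `conv_im2col`, `conv_shift_add`) are hypothetical. The sketch assumes stride 1, no padding, and cross-correlation, which is the convention most CNN frameworks use.

```python
from typing import List

Matrix = List[List[int]]               # 2-D row-major matrix
Tensor3 = List[List[List[int]]]        # C x H x W image (or M x Ho x Wo output)
Tensor4 = List[List[List[List[int]]]]  # M x C x K x K kernel bank


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Naive GEMM: the (m x n) product of an (m x k) and a (k x n) matrix."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def conv_direct(x: Tensor3, w: Tensor4) -> Tensor3:
    """Reference MCMK convolution (stride 1, no padding, cross-correlation)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    Ho, Wo = H - K + 1, W - K + 1
    return [[[sum(w[m][c][i][j] * x[c][r + i][s + j]
                  for c in range(C) for i in range(K) for j in range(K))
              for s in range(Wo)]
             for r in range(Ho)]
            for m in range(M)]


def im2col(x: Tensor3, K: int) -> Matrix:
    """Expand x into a (C*K*K) x (Ho*Wo) matrix. Column r*Wo + s holds the receptive
    field of output pixel (r, s), so each input element is copied up to K*K times."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Ho, Wo = H - K + 1, W - K + 1
    return [[x[c][r + i][s + j] for r in range(Ho) for s in range(Wo)]
            for c in range(C) for i in range(K) for j in range(K)]


def conv_im2col(x: Tensor3, w: Tensor4) -> Tensor3:
    """im2col followed by one (M x C*K*K) @ (C*K*K x Ho*Wo) GEMM."""
    M, C, K = len(w), len(w[0]), len(w[0][0])
    Ho, Wo = len(x[0]) - K + 1, len(x[0][0]) - K + 1
    # Flatten each kernel to one row, in the same (c, i, j) order as the im2col rows.
    w_mat = [[w[m][c][i][j] for c in range(C) for i in range(K) for j in range(K)]
             for m in range(M)]
    out = gemm(w_mat, im2col(x, K))  # M x (Ho*Wo)
    # Reshape each output row back into an Ho x Wo feature map.
    return [[out[m][r * Wo:(r + 1) * Wo] for r in range(Ho)] for m in range(M)]


def conv_shift_add(x: Tensor3, w: Tensor4) -> Tensor3:
    """GEMM-based convolution without im2col: K*K 1x1 convolutions, shifted and summed."""
    M, C, K = len(w), len(w[0]), len(w[0][0])
    H, W = len(x[0]), len(x[0][0])
    Ho, Wo = H - K + 1, W - K + 1
    # C x (H*W) view of the input: a plain reshape, every element appears exactly once.
    x_mat = [[x[c][h][v] for h in range(H) for v in range(W)] for c in range(C)]
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(M)]
    for i in range(K):
        for j in range(K):
            w_ij = [[w[m][c][i][j] for c in range(C)] for m in range(M)]  # M x C slice
            partial = gemm(w_ij, x_mat)  # M x (H*W): 1x1 convolution at every pixel
            # Shift by (i, j) and accumulate; positions outside the valid output are unused.
            for m in range(M):
                for r in range(Ho):
                    for s in range(Wo):
                        out[m][r][s] += partial[m][(r + i) * W + (s + j)]
    return out


if __name__ == "__main__":
    C, H, W, M, K = 3, 5, 5, 2, 3
    x = [[[(c * H + h) * W + v for v in range(W)] for h in range(H)] for c in range(C)]
    w = [[[[(m + c + i - j) % 3 - 1 for j in range(K)] for i in range(K)]
          for c in range(C)] for m in range(M)]
    ref = conv_direct(x, w)
    assert conv_im2col(x, w) == ref and conv_shift_add(x, w) == ref
    cols = im2col(x, K)
    print("input elements: ", C * H * W)                 # 75
    print("im2col elements:", len(cols) * len(cols[0]))  # 243
```

For the demo sizes, the im2col matrix holds 243 elements against 75 in the input, while the shift-add version only reshapes the input. The price in this sketch is K×K smaller GEMMs, each computed over the full H×W grid (border values are discarded), plus an accumulation pass.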
