MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor

07/11/2023
by Zheng Wu, et al.

Convolution is at the core of artificial intelligence applications, and research on it has become a hot topic in high-performance computing. As the emerging SW26010 processor is increasingly used for artificial intelligence workloads, there is an urgent need for high-performance convolution algorithms on this processor. However, current support for convolution on SW26010 is still rudimentary. The few existing studies achieve adequate peak runtime performance but lack adaptability to diverse convolution scenes. To improve convolution algorithms on SW26010, we propose a multi-grained matrix-multiplication-mapping convolution algorithm called MG3MConv, which targets the architectural features of SW26010. MG3MConv supports diversified mapping schemes for convolution tasks, based on the concept of the thread block proposed in this paper. The architecture-oriented optimization methods are designed at four levels to fully exploit the hardware efficiency of SW26010. The experiments show that the hardware efficiency of MG3MConv can reach 84.78%, which is 1.75 times that of cuDNN on an NVIDIA K80m GPU. Moreover, MG3MConv outperforms cuDNN in most convolution scenes. We also use six representative CNNs as real-world cases, where the hardware efficiency of MG3MConv reaches up to 67.04%, which is up to 1.96 times that of cuDNN and swDNN.
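The general idea of mapping convolution onto matrix multiplication can be illustrated with a small sketch. This is not the MG3MConv algorithm itself, whose mapping schemes and thread-block design are not described in the abstract. It is a generic, pure-Python illustration that assumes stride 1, no padding, and a single image. The function names `conv2d_direct`, `matmul`, and `conv2d_as_matmuls` are hypothetical. For each filter tap, a (No x Ni) weight slice is multiplied by an (Ni x Ho*Wo) shifted input window, and the partial products are accumulated.

```python
def conv2d_direct(x, w):
    """Reference convolution (stride 1, no padding).
    x: [Ni][H][W], w: [No][Ni][K][K] -> out: [No][H-K+1][W-K+1]
    """
    ni, h, wd = len(x), len(x[0]), len(x[0][0])
    no, k = len(w), len(w[0][0])
    ho, wo = h - k + 1, wd - k + 1
    out = [[[0.0] * wo for _ in range(ho)] for _ in range(no)]
    for o in range(no):
        for r in range(ho):
            for c in range(wo):
                acc = 0.0
                for i in range(ni):
                    for kr in range(k):
                        for kc in range(k):
                            acc += w[o][i][kr][kc] * x[i][r + kr][c + kc]
                out[o][r][c] = acc
    return out


def matmul(a, b):
    """Plain (M x T) @ (T x N) product on nested lists."""
    inner, n = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(n)]
            for row in a]


def conv2d_as_matmuls(x, w):
    """Same convolution, expressed as K*K accumulated matrix products
    (illustrative assumption, not the paper's mapping scheme)."""
    ni, h, wd = len(x), len(x[0]), len(x[0][0])
    no, k = len(w), len(w[0][0])
    ho, wo = h - k + 1, wd - k + 1
    acc = [[0.0] * (ho * wo) for _ in range(no)]
    for kr in range(k):
        for kc in range(k):
            # (No x Ni) weight slice for this filter tap
            w_tap = [[w[o][i][kr][kc] for i in range(ni)] for o in range(no)]
            # (Ni x Ho*Wo) input window shifted by (kr, kc), flattened
            x_tap = [[x[i][r + kr][c + kc]
                      for r in range(ho) for c in range(wo)]
                     for i in range(ni)]
            part = matmul(w_tap, x_tap)
            for o in range(no):
                for p in range(ho * wo):
                    acc[o][p] += part[o][p]
    # Reshape each flattened output row back to [Ho][Wo]
    return [[acc[o][r * wo:(r + 1) * wo] for r in range(ho)]
            for o in range(no)]
```

Both functions should return the same output tensor for the same input. The second form is what lets a tuned matrix-multiplication kernel carry most of the arithmetic.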
