Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators

10/08/2021
by   Yangjie Zhou, et al.

Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's Tensor Cores, are built around accelerating general matrix multiplication (GEMM). However, supporting convolution on GEMM-based accelerators is not trivial. The naive method explicitly lowers the convolution to GEMM, commonly known as im2col, which introduces significant performance and memory overhead. Existing implicit im2col algorithms require unscalable hardware and are inefficient in supporting important convolution variants such as strided convolution. In this paper, we propose a memory-efficient and hardware-friendly implicit im2col algorithm used by Google's TPU, which dynamically converts a convolution into a GEMM with practically zero performance and memory overhead, fully unleashing the power of GEMM engines. Through comprehensive experimental results, we quantitatively argue that this algorithm has been adopted in commercial closed-source platforms, and we are the first to describe its high-level idea and implementation details. Finally, we show that our algorithm can also be generally applied to NVIDIA's Tensor Cores (TCs), matching and outperforming the measured performance on TCs.
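To make concrete the explicit lowering that the abstract contrasts against, here is a minimal NumPy sketch of im2col (the function name, shapes, and layout are our own illustrative choices, not details from the paper): each receptive-field patch of the input is copied into one column of a matrix, after which the convolution reduces to a single GEMM against the flattened filters.

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Explicitly lower a convolution input to a matrix (illustrative sketch).

    x: input feature map of shape (C, H, W).
    Each column of the result holds one flattened (C*kh*kw) receptive-field
    patch, so conv(x, w) becomes a GEMM: w.reshape(K, -1) @ im2col(x, ...).
    """
    C, H, W = x.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    cols = np.empty((C * kh * kw, out_h * out_w), dtype=x.dtype)
    col = 0
    for i in range(0, H - kh + 1, stride):
        for j in range(0, W - kw + 1, stride):
            # Copy one patch; note each input element is duplicated into
            # up to kh*kw columns -- this is the memory overhead of
            # explicit lowering that implicit im2col avoids.
            cols[:, col] = x[:, i:i + kh, j:j + kw].ravel()
            col += 1
    return cols

# Convolution as GEMM: K filters of shape (C, kh, kw), flattened to rows.
C, H, W, K, kh, kw = 3, 5, 5, 4, 3, 3
x = np.random.rand(C, H, W).astype(np.float32)
w = np.random.rand(K, C, kh, kw).astype(np.float32)
y = w.reshape(K, -1) @ im2col(x, kh, kw)  # shape (K, out_h * out_w)
```

The duplicated patches inflate the lowered matrix by up to a factor of kh·kw relative to the input, which is why explicit im2col trades memory for the ability to reuse a plain GEMM engine; the implicit schemes discussed in the paper generate the same patch layout on the fly instead of materializing it.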


