Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores

02/11/2022
by Junkyeong Choi, et al.

Convolution is one of the fundamental operations of deep neural networks and demands heavy matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision matrix-multiply-accumulate (MMA) instructions that increase throughput. However, achieving optimal performance is challenging because the best scheduling of MMA instructions varies with the convolution size. In particular, reduced-precision MMA requires many elements to be grouped into a single matrix operand, which seriously limits data reuse and imposes packing and layout overhead on the schedule. This work proposes an automatic scheduling method for reduced-precision MMA in convolution operations. The method devises a search space that explores thread tile and warp sizes to increase data reuse despite the large matrix operands of reduced-precision MMA. The search space also includes options for register-level packing and layout optimization to lessen the overhead of handling reduced-precision data. Finally, we propose a search algorithm that finds the best schedule by learning from distinctive candidates. The method is evaluated on convolution operations of popular neural networks and demonstrates substantial speedup on Tensor Cores over the state of the art, with shortened search time.
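For context (this snippet is not from the paper), here is a minimal CUDA sketch of the reduced-precision MMA primitive that such schedules are built from, using the standard nvcuda::wmma API. One warp cooperatively multiplies fixed 16x16x16 half-precision fragments and accumulates in float; the fixed fragment shape is the "large matrix operand" the abstract refers to, since every instruction must consume whole tiles.

```cuda
#include <mma.h>
using namespace nvcuda;

// Minimal single-warp MMA: multiply two 16x16 half-precision tiles and
// accumulate into a 16x16 float tile. Compile with -arch=sm_70 or newer.
__global__ void wmma_tile(const half *a, const half *b, float *c,
                          int lda, int ldb, int ldc) {
    // Fragments are the fixed-shape matrix operands held across the warp.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);       // C = 0
    wmma::load_matrix_sync(a_frag, a, lda);  // loads a whole 16x16 tile
    wmma::load_matrix_sync(b_frag, b, ldb);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B
    wmma::store_matrix_sync(c, c_frag, ldc, wmma::mem_row_major);
}
```

Launched with a single warp, e.g. `wmma_tile<<<1, 32>>>(dA, dB, dC, 16, 16, 16);`, this performs one tile multiply. A full convolution schedule tiles the implicit GEMM across many such operations per thread block, which is exactly the tiling and layout decision space the paper searches.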

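The search space itself can be pictured as a set of discrete scheduling knobs. The sketch below is purely hypothetical (the names, parameters, and ranges are assumptions for illustration, not the paper's actual search space); it only makes concrete how candidates combining thread-tile and warp sizes with packing and layout options might be enumerated before a learned cost model ranks them.

```cuda
#include <vector>

// Hypothetical schedule candidate; fields and ranges are illustrative
// assumptions, not the paper's actual search space.
struct Candidate {
    int warp_m, warp_n;   // output tile computed per warp, in MMA tiles
    int thread_tile;      // elements handled per thread when packing
    bool pack_registers;  // register-level packing of reduced-precision data
    bool swizzle_layout;  // alternative layout to reduce bank conflicts
};

std::vector<Candidate> enumerate_candidates() {
    std::vector<Candidate> out;
    for (int wm : {1, 2, 4})
        for (int wn : {1, 2, 4})
            for (int tt : {2, 4, 8})
                for (bool pack : {false, true})
                    for (bool swz : {false, true})
                        out.push_back({wm, wn, tt, pack, swz});
    return out;  // a search algorithm would rank and benchmark a subset
}
```

The paper's contribution is avoiding exhaustive enumeration like the above: its search algorithm learns from distinctive candidates to steer measurement toward promising schedules.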

