Accelerating Reduction and Scan Using Tensor Core Units

11/24/2018
by   Abdul Dakkak, et al.
0

Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as TensorCore Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4x4 or 16x16) to accelerate the convolutional and recurrent neural networks in deep learning workloads. In this paper we leverage NVIDIA's TCU to express both reduction and scan with matrix multiplication and show the benefits -- in terms of program simplicity, efficiency, and performance. Our algorithm exercises the NVIDIA TCUs which would otherwise be idle, achieves 89 bandwidth, and is orders of magnitude faster (up to 100x for reduction and 3x for scan) than state-of-the-art methods for small segment sizes -- common in machine learning and scientific applications. Our algorithm achieves this while decreasing the power consumption by up to 22

READ FULL TEXT

page 1

page 5

page 6

page 7

page 10

research
06/21/2023

DGEMM on Integer Matrix Multiplication Unit

Deep learning hardware achieves high throughput and low power consumptio...
research
09/29/2020

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

Sparse general matrix-matrix multiplication (spGEMM) is an essential com...
research
08/19/2019

A Computational Model for Tensor Core Units

To respond to the need of efficient training and inference of deep neura...
research
02/11/2022

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap ...
research
10/27/2020

Matrix Engines for High Performance Computing:A Paragon of Performance or Grasping at Straws?

Matrix engines or units, in different forms and affinities, are becoming...
research
06/22/2020

Similarity Search with Tensor Core Units

Tensor Core Units (TCUs) are hardware accelerators developed for deep ne...
research
08/17/2020

Unitary Learning for Deep Diffractive Neural Network

Realization of deep learning with coherent diffraction has achieved rema...

Please sign up or login with your details

Forgot password? Click here to reset