Analyzing GPU Tensor Core Potential for Fast Reductions

03/08/2019
by Roberto Carrasco, et al.

Nvidia's GPU architecture has introduced new computing elements such as tensor cores. These are special processing units dedicated to performing fast matrix-multiply-accumulate (MMA) operations and accelerating Deep Learning applications. In this work we present the idea of using tensor cores for a different purpose, namely the parallel arithmetic reduction problem. We propose a new GPU tensor-core-based algorithm and analyze its potential performance benefits in comparison to a traditional GPU-based one. The proposed method encodes the reduction of n numbers as a set of m × m MMA tensor-core operations (for Nvidia's Volta architecture, m = 16). It takes advantage of the fact that each MMA operation takes just one GPU cycle. When the cost is analyzed under a simplified GPU computing model, the new algorithm reduces a problem of n numbers in $T(n) = 5\log_{m^2}(n)$ steps, with a speedup of $S = \frac{4}{5}\log_2(m^2)$.
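
The following is a minimal Python sketch, under stated assumptions, of how a reduction can be expressed purely through MMA operations and how the cost formulas above relate. It assumes a simulated MMA `D = A x B + C` on m × m matrices. It also assumes one possible encoding of a block of m² values: multiply by an all-ones matrix twice. The first product puts the column sums of V in every row, and the second puts the total of V in every entry. This is not necessarily the authors' exact encoding. All names (`mma`, `reduce_block`, `tensor_core_reduce`, `t_tensor`, `t_classic`, `speedup`) are hypothetical. The baseline cost `t_classic = 4 log2(n)` is an assumption inferred from the stated speedup. It follows because $S = \frac{4\log_2 n}{5\log_{m^2} n} = \frac{4}{5}\log_2(m^2)$, given that $\log_2 n / \log_{m^2} n = \log_2(m^2)$.

```python
import math

def mma(a, b, c):
    """Simulated D = A x B + C on m x m matrices (stand-in for one tensor-core MMA)."""
    m = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(m)) + c[i][j] for j in range(m)]
            for i in range(m)]

def reduce_block(values, m):
    """Sum up to m*m values with two MMAs (hypothetical all-ones encoding)."""
    padded = list(values) + [0.0] * (m * m - len(values))  # zero-pad a partial block
    v = [padded[i * m:(i + 1) * m] for i in range(m)]      # row-major m x m matrix V
    ones = [[1.0] * m for _ in range(m)]
    zeros = [[0.0] * m for _ in range(m)]
    col_sums = mma(ones, v, zeros)       # entry (i, j) = sum of column j of V
    total = mma(col_sums, ones, zeros)   # entry (i, j) = sum of all entries of V
    return total[0][0]

def tensor_core_reduce(values, m=16):
    """Reduce n numbers level by level; each level shrinks the data by a factor of m*m.

    Returns (sum, number_of_levels); levels == ceil(log_{m^2}(n)) for n > 1.
    """
    data = list(values)
    levels = 0
    block = m * m
    while len(data) > 1:
        data = [reduce_block(data[i:i + block], m) for i in range(0, len(data), block)]
        levels += 1
    return (data[0] if data else 0.0), levels

def t_tensor(n, m):
    """Cost model from the abstract: 5 steps per level, log_{m^2}(n) levels."""
    return 5 * math.log(n, m * m)

def t_classic(n):
    """Assumed baseline cost of a classic GPU reduction (inferred, not from the source)."""
    return 4 * math.log2(n)

def speedup(m):
    """Speedup from the abstract: S = (4/5) * log2(m^2)."""
    return 4 / 5 * math.log2(m * m)
```

For example, `tensor_core_reduce(range(1, 101), m=2)` sums 100 numbers in 4 levels (100 → 25 → 7 → 2 → 1). For Volta's m = 16, the formula gives S = (4/5) · 8 = 6.4. The constant 5 steps per level is taken from the abstract's model; the simulation only illustrates the level structure, not the per-level step count.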
