Squeeze: Efficient Compact Fractals for Tensor Core GPUs

01/03/2022
by   Felipe A. Quezada, et al.
0

This work presents Squeeze, an efficient compact fractal processing scheme for tensor core GPUs. By combining discrete-space transformations between compact and expanded forms, one can do data-parallel computation on a fractal with neighborhood access without needing to expand the fractal in memory. The space transformations are formulated as two GPU tensor-core accelerated thread maps, λ(ω) and ν(ω), which act as compact-to-expanded and expanded-to-compact space functions, respectively. The cost of the maps is 𝒪(log_2 log_s(n)) time, with n being the side of a n × n embedding for the fractal in its expanded form, and s the linear scaling factor. The proposed approach works for any fractal that belongs to the Non-overlapping-Bounding-Boxes (NBB) class of discrete fractals, and can be extended to three dimensions as well. Experimental results using a discrete Sierpinski Triangle as a case study shows up to ∼12× of speedup and a memory reduction factor of up to ∼ 315× with respect to a GPU-based expanded-space bounding box approach. These results show that the proposed compact approach will allow the scientific community to efficiently tackle problems that up to now could not fit into GPU memory.

READ FULL TEXT
research
10/25/2021

Accelerating Compact Fractals with Tensor Core GPUs

This work presents a GPU thread mapping approach that allows doing fast ...
research
04/25/2020

Efficient GPU Thread Mapping on Embedded 2D Fractals

This work proposes a new approach for mapping GPU threads onto a family ...
research
05/23/2022

Fast GPU bounding boxes on tree-structured scenes

Computation of bounding boxes is a fundamental problem in high performan...
research
01/15/2020

GPU Tensor Cores for fast Arithmetic Reductions

This work proposes a GPU tensor core approach that encodes the arithmeti...
research
03/08/2019

Analyzing GPU Tensor Core Potential for Fast Reductions

The Nvidia GPU architecture has introduced new computing elements such a...
research
05/27/2020

Optimization of Tensor-product Operations in Nekbone on GPUs

In the CFD solver Nek5000, the computation is dominated by the evaluatio...
research
04/14/2022

cu_FastTucker: A Faster and Stabler Stochastic Optimization for Parallel Sparse Tucker Decomposition on Multi-GPUs

High-Order, High-Dimension, and Sparse Tensor (HOHDST) data originates f...

Please sign up or login with your details

Forgot password? Click here to reset