A Fast and Generic GPU-Based Parallel Reduction Implementation

10/19/2017
by Walid Jradi, et al.

Reduction operations are used extensively in many computational problems. Given a finite set of numeric elements, a reduction combines all of them into a single value using a combiner function. A parallel reduction is the same operation performed concurrently when multiple execution units are available. This work reports an investigation of the subject and describes a GPU-based parallel approach to it. By employing techniques such as Loop Unrolling, Persistent Threads, and Algebraic Expressions to avoid thread divergence, the presented approach achieved a 2.8x speedup over the work of Catanzaro while using generic, simple, and easily portable code. Experiments conducted to evaluate the approach show that the strategy performs efficiently on both AMD and NVIDIA hardware, and in both OpenCL and CUDA.
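
The abstract does not include the authors' kernel, so the following is only a minimal sequential C++ sketch of the data flow that GPU reductions of this kind commonly follow. It is not the implementation from the paper. The function name `tree_reduce` and the `workers` parameter are hypothetical. In the first phase, a fixed pool of workers each folds a strided slice of the input, loosely mirroring the Persistent Threads idea. In the second phase, the partial results are combined pairwise in a tree. The sketch assumes the combiner is associative and commutative, since the elements are not combined in their original order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential emulation of a two-phase parallel reduction (illustrative only).
// `workers` stands in for a fixed pool of GPU threads (hypothetical parameter);
// it must be a power of two so that the tree phase halves cleanly.
template <typename T, typename Combine>
T tree_reduce(const std::vector<T>& data, T identity, Combine combine,
              std::size_t workers = 8) {
    assert(workers > 0 && (workers & (workers - 1)) == 0);

    // Phase 1: each worker walks the input with a stride of `workers`,
    // folding its slice into a single partial result.
    std::vector<T> partial(workers, identity);
    for (std::size_t w = 0; w < workers; ++w) {
        for (std::size_t i = w; i < data.size(); i += workers) {
            partial[w] = combine(partial[w], data[i]);
        }
    }

    // Phase 2: pairwise tree combination. On a GPU, the iterations of the
    // inner loop would run concurrently, with a barrier between strides.
    for (std::size_t stride = workers / 2; stride > 0; stride /= 2) {
        for (std::size_t w = 0; w < stride; ++w) {
            partial[w] = combine(partial[w], partial[w + stride]);
        }
    }
    return partial[0];
}
```

Calling `tree_reduce(values, 0, [](int a, int b) { return a + b; })` sums a `std::vector<int>`. Passing a different combiner and identity, such as a maximum with the lowest representable value, reuses the same structure. This is the sense in which a reduction is generic over its combiner function.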

research
03/08/2019

Analyzing GPU Tensor Core Potential for Fast Reductions

The Nvidia GPU architecture has introduced new computing elements such a...
research
01/15/2020

GPU Tensor Cores for fast Arithmetic Reductions

This work proposes a GPU tensor core approach that encodes the arithmeti...
research
12/07/2021

Stochastic Optimized Schwarz Methods for the Gravity Equations on Graphics Processing Unit

Low order, sequential or non-massively parallel finite elements are gene...
research
11/22/2016

Reduction-Based Creative Telescoping for Fuchsian D-finite Functions

Continuing a series of articles in the past few years on creative telesc...
research
06/17/2020

A parallel hybrid implementation of the 2D acoustic wave equation

In this paper, we propose a hybrid parallel programming approach for a n...
research
05/30/2020

GPU-based parallel simulations of the Gatenby-Gawlinski model with anisotropic, heterogeneous acid diffusion

We introduce a variant of the Gatenby-Gawlinski model for acid-mediated ...
research
04/25/2020

Efficient GPU Thread Mapping on Embedded 2D Fractals

This work proposes a new approach for mapping GPU threads onto a family ...
