Efficient GPU Implementation of Affine Index Permutations on Arrays

06/13/2023
by   Mathis Bouverot-Dupuis, et al.

Optimal use of the memory system is key to fast GPU algorithms. Unfortunately, many common algorithms fail in this regard even though their memory access patterns are highly regular. In this paper, we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
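To make the class of permutations concrete, here is a minimal CPU reference sketch in Python. It is not the paper's GPU kernel. It only illustrates the standard definition of a BMMC permutation on an array of length 2^n: an index x, read as an n-bit vector, is sent to A·x XOR c, where A is an invertible n×n bit matrix over GF(2) and c is an n-bit complement vector. The function names, the row-bitmask encoding of A, and the direction of the mapping (source index x is written to destination index A·x XOR c) are assumptions made for this illustration.

```python
def parity(x: int) -> int:
    """Return the XOR of all bits of x (0 or 1)."""
    return bin(x).count("1") & 1


def bmmc_index(rows, c, x):
    """Compute A*x XOR c over GF(2).

    rows[i] is row i of A packed as an integer bitmask (bit j holds A[i][j]).
    Hypothetical helper for illustration, not taken from the paper.
    """
    y = 0
    for i, row in enumerate(rows):
        # Bit i of A*x is the GF(2) dot product of row i with x.
        y |= parity(row & x) << i
    return y ^ c


def bmmc_permute(src, rows, c):
    """Scatter src[x] to dst[A*x XOR c]. Assumes A is invertible over GF(2)."""
    n = len(rows)
    assert len(src) == 1 << n, "array length must be 2**n"
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reverse_rows = [0b100, 0b010, 0b001]
reversed_bits = bmmc_permute(list(range(8)), bit_reverse_rows, 0)

# Array reversal: A is the identity, c complements every index bit.
identity_rows = [0b001, 0b010, 0b100]
reversed_array = bmmc_permute(list(range(8)), identity_rows, 0b111)
```

In this sketch, `reversed_bits` is `[0, 4, 2, 6, 1, 5, 3, 7]` and `reversed_array` is `[7, 6, 5, 4, 3, 2, 1, 0]`. Familiar permutations such as bit reversal and array reversal are therefore special cases of the same index transformation.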

research
07/27/2018

Sparse Matrix Code Dependence Analysis Simplification at Compile Time

Analyzing array-based computations to determine data dependences is usef...
research
08/31/2022

GGArray: A Dynamically Growable GPU Array

We present a dynamically Growable GPU array (GGArray) fully implemented ...
research
05/31/2019

Reference Capabilities for Safe Parallel Array Programming

The array is a fundamental data structure that provides an efficient way...
research
05/30/2019

Inducing the Lyndon Array

In this paper we propose a variant of the induced suffix sorting algorit...
research
11/13/2019

Compile-time Parallelization of Subscripted Subscript Patterns

An increasing number of scientific applications are making use of irregu...
research
06/08/2023

Longest Common Prefix Arrays for Succinct k-Spectra

The k-spectrum of a string is the set of all distinct substrings of leng...
research
06/14/2019

hepaccelerate: Fast Analysis of Columnar Collider Data

At HEP experiments, processing terabytes of structured numerical event d...
