Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture

08/02/2017
by   Christian Knauth, et al.

The bit-reversed permutation is a well-known task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime. The template-recursive cache-oblivious method is shown to be competitive with the fastest known method; however, we demonstrate that the cache-oblivious method can more readily benefit from parallelization on multiple cores and on the GPU.
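To make two of the extant methods named above concrete, the following is a minimal C++11 sketch of naive bitwise swapping and of index reversal via a table of reversed bytes. It is an illustration under stated assumptions, not the paper's optimized implementation: the names `reverse_bits_naive`, `make_reversed_byte_table`, `reverse_bits_table`, and `bit_reversed_permute_naive` are hypothetical, the input length is assumed to be exactly 2^log_n, and indices are assumed to fit in 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Reverse the lowest `log_n` bits of `index`, one bit per iteration.
inline std::uint64_t reverse_bits_naive(std::uint64_t index, unsigned log_n) {
    std::uint64_t reversed = 0;
    for (unsigned bit = 0; bit < log_n; ++bit) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

// Build a 256-entry lookup table holding the bit reversal of every byte.
inline std::vector<std::uint8_t> make_reversed_byte_table() {
    std::vector<std::uint8_t> table(256);
    for (unsigned b = 0; b < 256; ++b) {
        table[b] = static_cast<std::uint8_t>(reverse_bits_naive(b, 8));
    }
    return table;
}

// Reverse the lowest `log_n` bits of `index` one byte at a time:
// reverse all 64 bits via the table, then shift the result back down.
inline std::uint64_t reverse_bits_table(std::uint64_t index, unsigned log_n) {
    static const std::vector<std::uint8_t> table = make_reversed_byte_table();
    assert(log_n <= 64);
    if (log_n == 0) return 0;  // avoid an undefined shift by 64
    std::uint64_t reversed = 0;
    for (int byte = 0; byte < 8; ++byte) {
        reversed = (reversed << 8) | table[index & 0xFFu];
        index >>= 8;
    }
    return reversed >> (64 - log_n);
}

// In-place bit-reversed permutation of a vector of length 2^log_n.
// Each pair (i, reversed(i)) is swapped once, when i < reversed(i).
template <typename T>
void bit_reversed_permute_naive(std::vector<T>& data, unsigned log_n) {
    assert(data.size() == (std::size_t(1) << log_n));
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::size_t j =
            static_cast<std::size_t>(reverse_bits_naive(i, log_n));
        if (i < j) std::swap(data[i], data[j]);
    }
}
```

For a vector holding 0 through 7 and `log_n = 3`, calling `bit_reversed_permute_naive` reorders the elements to 0, 4, 2, 6, 1, 5, 3, 7. Because bit reversal is its own inverse, applying the function a second time restores the original order. The table variant trades the per-bit loop for eight lookups per index, which is the idea behind the "table of reversed bytes" approach; neither sketch addresses the cache behavior that the buffered and cache-oblivious methods target.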
