A New Acceleration Paradigm for Discrete Cosine Transform and Other Fourier-Related Transforms

10/04/2021
by Zixuan Jiang, et al.

The discrete cosine transform (DCT) and other Fourier-related transforms have broad applications in scientific computing. However, off-the-shelf high-performance multi-dimensional DCT (MD DCT) libraries are not readily available on parallel computing systems. Public MD DCT implementations use a straightforward row-column method that decomposes the computation into a sequence of 1D DCTs, one pass per dimension. This approach inevitably yields suboptimal performance because of its low computational efficiency, limited parallelism, and poor locality. In this paper, we propose a new acceleration paradigm for MD DCT: a three-stage procedure that factorizes an MD DCT into an MD FFT plus highly optimized preprocessing and postprocessing steps with efficient computation and high arithmetic intensity. Our paradigm can be easily extended to other Fourier-related transforms and other parallel computing systems. Experimental results show that our 2D DCT/IDCT CUDA implementation has a stable execution time comparable to that of an FFT and is 2× faster than the previous row-column method. Several case studies demonstrate that our paradigm can deliver promising efficiency improvements. The implementations are available at https://github.com/JeremieMelo/dct_cuda/tree/reconstruct.
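The three-stage idea (preprocess, MD FFT, postprocess) can be illustrated with a minimal pure-Python sketch for an unnormalized 2D DCT-II, assuming power-of-two sizes. It uses the classic even/odd reordering from Makhoul's fast cosine transform, one 2D FFT, and a twiddle-factor postprocess, and checks the result against a row-column baseline. This is an illustration under stated assumptions, not the paper's CUDA kernels: all function names are hypothetical (not taken from the linked repository), and the actual implementation may differ in reordering, use of real-input FFTs, or normalization. The toy `fft2` here is itself computed row by row; in practice stage 2 would be a single library call such as cuFFT.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; assumes len(a) is a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(m):
    """Toy 2D FFT: 1D FFT over every row, then over every column."""
    rows = [fft(r) for r in m]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dct_1d_naive(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]


def dct2d_row_column(m):
    """Baseline row-column method: 1D DCT on every row, then on every column."""
    rows = [dct_1d_naive(r) for r in m]
    cols = [dct_1d_naive(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def reorder(x):
    """Even-indexed entries in ascending order, then odd-indexed entries reversed."""
    return list(x[0::2]) + list(x[1::2])[::-1]


def dct2d_via_fft(m):
    """Hypothetical three-stage 2D DCT-II: reorder -> 2D FFT -> twiddle postprocess."""
    n1, n2 = len(m), len(m[0])
    v = [reorder(r) for r in reorder(m)]  # stage 1: reorder rows and columns
    V = fft2(v)                           # stage 2: one 2D FFT
    out = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        w1 = cmath.exp(-1j * math.pi * k1 / (2 * n1))
        for k2 in range(n2):
            w2 = cmath.exp(-1j * math.pi * k2 / (2 * n2))
            # V[k1][-k2 mod n2] is the conjugate-symmetric partner term
            inner = w2 * V[k1][k2] + V[k1][(-k2) % n2] / w2
            out[k1][k2] = 0.5 * (w1 * inner).real  # stage 3: postprocess
    return out


if __name__ == "__main__":
    img = [[float((3 * i + 5 * j) % 7) for j in range(8)] for i in range(4)]
    ref, fast = dct2d_row_column(img), dct2d_via_fft(img)
    err = max(abs(a - b) for ra, rb in zip(ref, fast) for a, b in zip(ra, rb))
    print(f"max abs difference vs row-column: {err:.2e}")
```

The design point the sketch tries to show is that all DCT-specific work collapses into an element-wise permutation before the FFT and an element-wise twiddle multiply after it, both of which are cheap, embarrassingly parallel passes, while the heavy lifting is delegated to one multi-dimensional FFT.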
