Accelerating JPEG Decompression on GPUs

11/17/2021
by André Weißenberger, et al.

The JPEG compression format has been the standard for lossy image compression for multiple decades, offering high compression rates at a minor perceptual loss in image quality. For GPU-accelerated computer vision and deep learning tasks, such as the training of image classification models, efficient JPEG decoding is essential due to limitations in memory bandwidth. Many decoder implementations are CPU-based, so decoded image data has to be transferred to accelerators such as GPUs via interconnects such as PCI-E, which lowers throughput. JPEG decoding therefore represents a considerable bottleneck in these pipelines. A GPU-accelerated decoder could increase efficiency substantially: only compressed data needs to be transferred, and decoding is handled by the accelerator itself. To design such a GPU-based decoder, the respective algorithms must be parallelized at a fine-grained level. However, parallel decoding of an individual JPEG file is a complex task. In this paper, we present an efficient method for JPEG image decompression on GPUs, which implements an important subset of the JPEG standard. The proposed algorithm evaluates codeword locations at arbitrary positions in the bitstream, thereby enabling parallel decompression of independent chunks. Our performance evaluation shows that on an A100 (V100) GPU our implementation can outperform the state-of-the-art implementations libjpeg-turbo (CPU) and nvJPEG (GPU) by a factor of up to 51 (34) and 8.0 (5.7), respectively. Furthermore, it achieves a speedup of up to 3.4 over nvJPEG accelerated with the dedicated hardware JPEG decoder on an A100.
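The abstract does not spell out how codeword locations are found at arbitrary bitstream positions. One well-known property that makes this kind of approach feasible is that prefix (Huffman) codes tend to self-synchronize: a decoder started at a wrong bit offset often lands on a true codeword boundary after a few symbols and agrees with the correct decoding from there on. The following is a toy CPU sketch of that property only, not the authors' GPU implementation. The code table `CODE` and the helpers `encode`, `codeword_starts`, and `sync_point` are hypothetical names chosen for illustration; a real JPEG decoder would use the Huffman tables stored in the file.

```python
# Toy prefix code (assumption: stands in for a JPEG Huffman table).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}
MAX_LEN = max(len(bits) for bits in DECODE)


def encode(message):
    """Concatenate the codewords of all symbols into one bit string."""
    return "".join(CODE[sym] for sym in message)


def codeword_starts(bits, start):
    """Decode from bit offset `start` and return the offsets at which
    this decoder believes codewords begin."""
    starts = []
    pos = start
    while pos < len(bits):
        for length in range(1, MAX_LEN + 1):
            if bits[pos:pos + length] in DECODE:
                starts.append(pos)
                pos += length
                break
        else:
            break  # trailing bits do not form a complete codeword
    return starts


def sync_point(bits, start):
    """Return the first offset at which a decoder started at `start`
    hits a true codeword boundary, or None if it never does."""
    true_starts = set(codeword_starts(bits, 0))
    for pos in codeword_starts(bits, start):
        if pos in true_starts:
            return pos
    return None


if __name__ == "__main__":
    bits = encode("abacabadabcd" * 3)
    for offset in range(8):
        print(offset, sync_point(bits, offset))
```

Once a speculative decoder reaches a true boundary, its output from that point on is identical to the sequential decoding, because decoding depends only on the current bit position. This is what would let independent workers each take one chunk of the bitstream, with only the short unsynchronized prefix of each chunk needing to be reconciled with the neighbouring chunk. Synchronization is likely but not guaranteed, so `sync_point` can return `None`.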


