CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs

07/07/2023
by   Jeongmin Park, et al.
0

Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to their high compute throughput and memory bandwidth. Prior works presume that decompression is memory-bound and have dedicated most of the GPU's threads to data movement and adopted complex software techniques to hide memory latency for reading compressed data and writing uncompressed data. This paper shows that these techniques lead to poor GPU resource utilization as most threads end up waiting for the few decoding threads, exposing compute and synchronization latencies. Based on this observation, we propose CODAG, a novel and simple kernel architecture for high throughput decompression on GPUs. CODAG eliminates the use of specialized groups of threads, frees up compute resources to increase the number of parallel decompression streams, and leverages the ample compute activities and the GPU's hardware scheduler to tolerate synchronization, compute, and memory latencies. Furthermore, CODAG provides a framework for users to easily incorporate new decompression algorithms without being burdened with implementing complex optimizations to hide memory latency. We validate our proposed architecture with three different encoding techniques, RLE v1, RLE v2, and Deflate, and a wide range of large datasets from different domains. We show that CODAG provides 13.46x, 5.69x, and 1.18x speed up for RLE v1, RLE v2, and Deflate, respectively, when compared to the state-of-the-art decompressors from NVIDIA RAPIDS.

READ FULL TEXT

page 1

page 3

research
08/24/2020

Tearing Down the Memory Wall

We present a vision for the Erudite architecture that redefines the comp...
research
04/14/2023

GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs

Today's graphics processing unit (GPU) applications produce vast volumes...
research
03/13/2018

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Latent Dirichlet Allocation(LDA) is a popular topic model. Given the fac...
research
02/09/2020

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

Linear algebra operations have been widely used in big data analytics an...
research
04/17/2021

Ripple : Simplified Large-Scale Computation on Heterogeneous Architectures with Polymorphic Data Layout

GPUs are now used for a wide range of problems within HPC. However, maki...
research
09/13/2022

A Many-ported and Shared Memory Architecture for High-Performance ADAS SoCs

Increasing investment in computing technologies and the advancements in ...
research
11/04/2021

Safe and Practical GPU Acceleration in TrustZone

We present a holistic design for GPU-accelerated computation in TrustZon...

Please sign up or login with your details

Forgot password? Click here to reset