Scalable Hybrid Learning Techniques for Scientific Data Compression

12/21/2022
by Tania Banerjee, et al.

Data compression is becoming critical for storing scientific data because many scientific applications must store large amounts of data and post-process it for scientific discovery. Whereas image and video compression algorithms limit errors only on the primary data, scientists require compression techniques that also accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique, implemented as an end-to-end, scalable, GPU-based data compression pipeline, that addresses this requirement. Our hybrid technique combines machine learning with standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor that guarantees a bound on the raw-data error, and a constraint-satisfaction post-processing step that preserves the QoIs to within a minimal error (generally less than floating-point error). We demonstrate the effectiveness of the pipeline by compressing nuclear fusion simulation data generated by XGC, a large-scale fusion code that produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and achieves a compression ratio of more than 150 while requiring only a few percent of the computational resources needed to generate the data, which makes the overall approach practical for real-world use.
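The three-stage hybrid pipeline described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and it rests on several stated assumptions: a fixed random orthonormal projection (the hypothetical `make_basis`) stands in for the trained autoencoder; the error-bounded lossy compressor is reduced to uniform quantization of the residual with bin width 2·eb (the linear-quantization idea used in prediction-based error-bounded compressors, without entropy coding); and the QoIs are assumed to be linear in the data, so the constraint-satisfaction step becomes a minimum-norm least-squares correction. All function names, the velocity grid, and the sample data are hypothetical.

```python
import numpy as np


def make_basis(n, rank, seed=0):
    """Hypothetical stand-in for a trained autoencoder: an orthonormal
    n x rank basis obtained from a random matrix via QR."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    return q


def encode(x, basis):
    # Latent vector; kept at full precision in this sketch.
    return basis.T @ x


def decode(latent, basis):
    # Low-rank reconstruction from the latent vector.
    return basis @ latent


def compress(x, basis, eb):
    latent = encode(x, basis)
    residual = x - decode(latent, basis)
    # Uniform quantization with bin width 2*eb guarantees
    # |residual - 2*eb*code| <= eb for every point.
    codes = np.round(residual / (2.0 * eb)).astype(np.int64)
    return latent, codes


def decompress(latent, codes, basis, eb):
    return decode(latent, basis) + codes * (2.0 * eb)


def enforce_linear_qois(x_hat, A, b):
    """Smallest L2 change to x_hat such that A @ x == b, assuming the
    QoIs are linear in the data and A has full row rank."""
    lam = np.linalg.solve(A @ A.T, b - A @ x_hat)
    return x_hat + A.T @ lam


# Usage on hypothetical data: a Maxwellian-like 1D velocity-space slice.
n = 64
v = np.linspace(-3.0, 3.0, n)              # hypothetical velocity grid
x = np.exp(-0.5 * v**2)                    # sample "raw" data
A = np.vstack([np.ones(n), v, v**2])       # density-, momentum-, energy-like moments
b = A @ x                                  # QoIs of the original data

basis = make_basis(n, rank=8)
eb = 1e-3
latent, codes = compress(x, basis, eb)
x_hat = decompress(latent, codes, basis, eb)
x_fix = enforce_linear_qois(x_hat, A, b)

print("max pointwise error:", np.max(np.abs(x - x_hat)))
print("max QoI error after correction:", np.max(np.abs(A @ x_fix - b)))
```

Two design points are worth noting. First, the compression gain in such a scheme would come from entropy coding the integer `codes`, which are small when the learned model predicts the data well; this sketch omits that step. Second, the QoI correction can move individual values slightly beyond `eb`, so a real pipeline has to balance the raw-data bound against exact QoI preservation. The paper's constraint-satisfaction step may also handle nonlinear QoIs, which this linear projection does not.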

research
01/17/2020

FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data

With ever-increasing volumes of scientific floating-point data being pro...
research
05/25/2021

Exploring Autoencoder-Based Error-Bounded Compression for Scientific Data

Error-bounded lossy compression is becoming an indispensable technique f...
research
04/01/2020

Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations

To help understand our universe better, researchers and scientists curre...
research
01/22/2022

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

More and more HPC applications require fast and effective compression te...
research
04/23/2023

TopoSZ: Preserving Topology in Error-Bounded Lossy Compression

Existing error-bounded lossy compression techniques control the pointwis...
research
08/07/2023

A General Framework for Progressive Data Compression and Retrieval

In scientific simulations, observations, and experiments, the cost of tr...
research
03/18/2019

A Parallel Data Compression Framework for Large Scale 3D Scientific Data

Large scale simulations of complex systems ranging from climate and astr...