Black-Box Statistical Prediction of Lossy Compression Ratios for Scientific Data

05/15/2023
by Robert Underwood, et al.

Lossy compressors are increasingly adopted in scientific research, where they tackle large volumes of data from experiments or parallel numerical simulations and facilitate data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data, so users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying the lossy compressibility of scientific datasets, we provide a statistical framework to predict the compression ratios of lossy compressors. Our method is a two-step framework in which (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossiness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, smaller than that of other methods, while achieving at least an 8.8x speedup when searching for a specific compression ratio and a 7.8x speedup when determining the best compressor out of a collection.
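To make the two-step idea concrete, here is a minimal sketch under several stated assumptions, not the authors' implementation. It assumes 1-D data; it defines the quantized entropy as the Shannon entropy of the data after uniform quantization with bin width 2 × error bound (an assumed definition); it uses lag-1 correlation as a stand-in spatial-correlation predictor; and it replaces the paper's statistical models with a single-predictor least-squares fit of log(compression ratio). All function names (`quantized_entropy`, `lag1_correlation`, `fit_line`, `fit_log_ratio_model`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple


def quantized_entropy(values: Sequence[float], error_bound: float) -> float:
    """Step (i): Shannon entropy (bits/value) of data quantized into bins of
    width 2*error_bound. Assumed definition, chosen for illustration."""
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    codes = [round(v / (2.0 * error_bound)) for v in values]  # integer bin indices
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(codes).values())


def lag1_correlation(values: Sequence[float]) -> float:
    """Step (i): Pearson correlation between neighbouring samples, a simple
    stand-in for a spatial-correlation predictor on 1-D data."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    x, y = values[:-1], values[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant signal: correlation undefined, report 0
    return cov / math.sqrt(vx * vy)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        raise ValueError("predictor has zero variance")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return my - b * mx, b


def fit_log_ratio_model(
    predictor_values: Sequence[float], observed_ratios: Sequence[float]
) -> Callable[[float], float]:
    """Step (ii) placeholder: regress log(compression ratio) on one predictor
    and return a function mapping a predictor value to a predicted ratio."""
    a, b = fit_line(predictor_values, [math.log(r) for r in observed_ratios])
    return lambda p: math.exp(a + b * p)
```

In use, one would compute `quantized_entropy(field, eb)` for each training field, run the actual compressor to record its observed ratio, fit the model on those pairs, and then predict ratios for new fields without running the compressor. Fitting in log space keeps predicted ratios positive.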
