Optimizing Scientific Data Transfer on Globus with Error-bounded Lossy Compression

07/11/2023
by Yuanjian Liu, et al.

The increasing volume and velocity of science data necessitate the frequent movement of enormous data volumes as part of routine research activities. As a result, limited wide-area bandwidth often becomes a bottleneck for research progress. However, in many cases, consuming applications (e.g., for analysis, visualization, and machine learning) can achieve acceptable performance on reduced-precision data, and thus researchers may wish to compromise on data precision to reduce transfer and storage costs. Error-bounded lossy compression is a promising approach, as it can significantly reduce data volumes while keeping data distortion within user-specified error bounds. In this paper, we propose a novel data transfer framework called Ocelot that integrates error-bounded lossy compression into the Globus data transfer infrastructure. We note four key contributions: (1) Ocelot is the first integration of lossy compression in Globus to significantly improve scientific data transfer performance over wide-area networks (WANs). (2) We propose an effective machine-learning-based model for estimating the quality of error-bounded lossy compressors, which is fundamental to ensuring that transferred data are acceptable to users. (3) We develop optimized strategies to reduce the compression time overhead, counter the compute-node waiting time, and improve transfer speed for compressed files. (4) We perform evaluations using many real-world scientific applications across different domains and distributed Globus endpoints. Our experiments show that Ocelot can improve dataset transfer performance substantially, and that the quality of lossy compression (compression time, compression ratio, and data distortion) can be predicted accurately for the purpose of quality assurance.
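To make the idea of a user-specified error bound concrete, the following is a minimal sketch of absolute-error-bounded compression using uniform quantization followed by a lossless pass. It is an illustration only and is not the compressor used by Ocelot; the function names, the synthetic sine signal, and the bound of 1e-3 are all assumptions chosen for the example. Production compressors such as SZ and ZFP use far more sophisticated prediction and transform stages.

```python
import math
import struct
import zlib


def compress(values, error_bound):
    """Hypothetical helper: quantize to multiples of 2*error_bound, then deflate.

    Rounding to the nearest multiple of the step moves each value by at most
    half a step, i.e. by at most error_bound (up to floating-point rounding).
    """
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(codes)}q", *codes)  # 64-bit signed integers
    return zlib.compress(raw)


def decompress(blob, error_bound):
    """Hypothetical helper: invert compress() given the same error bound."""
    step = 2.0 * error_bound
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 8}q", raw)
    return [c * step for c in codes]


# Synthetic smooth signal standing in for a scientific field (assumption).
data = [math.sin(0.01 * i) for i in range(10_000)]
eb = 1e-3  # user-specified absolute error bound (assumption)

blob = compress(data, eb)
restored = decompress(blob, eb)

max_err = max(abs(a - b) for a, b in zip(data, restored))
ratio = (8 * len(data)) / len(blob)  # original size as 8-byte doubles

print(f"max abs error: {max_err:.3e} (bound {eb:.3e})")
print(f"compression ratio: {ratio:.1f}x")
```

The two printed quantities, maximum pointwise error and compression ratio, correspond to two of the quality metrics named in the abstract; compression time is the third.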

research
05/25/2021

Exploring Autoencoder-Based Error-Bounded Compression for Scientific Data

Error-bounded lossy compression is becoming an indispensable technique f...
research
06/22/2022

ROIBIN-SZ: Fast and Science-Preserving Compression for Serial Crystallography

Crystallography is the leading technique to study atomic structures of p...
research
06/23/2018

Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP

With ever-increasing volumes of scientific data produced by HPC applicat...
research
11/04/2021

SZ3: A Modular Framework for Composing Prediction-Based Error-Bounded Lossy Compressors

Today's scientific simulations require a significant reduction of data v...
research
07/13/2023

AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications

As supercomputers advance towards exascale capabilities, computational i...
research
10/07/2020

SDC Resilient Error-bounded Lossy Compressor

Lossy compression is one of the most important strategies to resolve the...
research
01/08/2021

SDRBench: Scientific Data Reduction Benchmark for Lossy Compressors

Efficient error-controlled lossy compressors are becoming critical to th...
