
SZ3: A Modular Framework for Composing Prediction-Based Error-Bounded Lossy Compressors

Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. In practice, however, the best-fit compression method often needs to be customized or optimized because of the diverse characteristics of different datasets and the various user requirements on compression quality and performance. In this paper, we develop a novel modular, composable compression framework (namely SZ3), which involves three significant contributions. (1) SZ3 features a modular abstraction for the prediction-based compression framework such that new compression modules can be plugged in easily. (2) SZ3 supports multialgorithm predictors and can automatically select the best-fit predictor for each data block based on the designed error estimation criterion. (3) SZ3 allows users to easily compose different compression pipelines on demand, such that both compression quality and performance can be significantly improved for their specific datasets and requirements. In addition, we evaluate several lossy compressors composed from SZ3 using real-world datasets. Specifically, we leverage SZ3 to improve the compression quality and performance for different use cases, including the GAMESS quantum chemistry dataset and the Advanced Photon Source (APS) instrument dataset. Experiments show that our customized compression pipelines lead to up to 20% improvement in compression ratios under the same data distortion compared with the state-of-the-art approaches.





1 Introduction

Data reduction is becoming increasingly important to scientific research because of the large amount of data produced by simulations running on exascale computing systems and experiments conducted on advanced instruments. For instance, recent climate research, which performs climate simulation in 1 km × 1 km resolution, generates 260 TB of floating-point data every 16 seconds [14]. When the generated data are dumped into parallel file systems or secondary storage systems to ensure long-term access, the limited storage capacity and/or I/O bandwidth will impose great challenges. While scientists aim to significantly reduce the size of their data to mitigate this problem, they are also concerned about the quality of data reduction. General data reduction approaches, including traditional wavelet-based methods [36, 33] and emerging neural-network-based methods [34, 1] widely used in the image processing community, may lead to loss of important scientific insights as they do not enforce quantifiable error bounds on reconstructed data.

Over the past decade, error-bounded lossy compression [13, 32, 22, 38, 27, 26, 2, 3, 4, 25] has been proposed and employed to reduce scientific data while controlling the distortion. Depending on how the original data are decorrelated, existing compressors can be classified into prediction-based and transform-based. These compressors all allow users to specify an error bound during compression and ensure that the error between original and decompressed data is strictly less than the bound. In this paper we focus mainly on prediction-based approaches because transform-based approaches can be formulated as prediction-based ones by using the corresponding transforms as predictors (at the cost of certain speed degradation), as suggested by prior works.


Although existing prediction-based approaches such as SZ [13, 32, 22] are general and can be applied to various scenarios, they may not lead to the best quality and performance given a specific dataset or error bound requirement. The best-fit compression method is never universal, which is true even for the same dataset because the compression efficiency would be affected by the required error bounds as well. For instance, SZ-1.4 [32] with a Lorenzo predictor shows very good compression ratios with low error bounds, but it suffers from low quality and artifacts with high error bounds, where approaches with a regression-based predictor [22] or an interpolation-based predictor [37] have been proved to be much more efficient. Likewise, data generated by the GAMESS quantum chemistry package [5] exhibits periodic scaled patterns, where a pattern-based predictor demonstrates obvious improvements in both compression speed and ratios [15]. Thus, a loosely coupled compression framework that allows for customization of the prediction-based error-bounded lossy compression model is critical to optimizing the compression quality and performance for users in practice.

In this paper, we present a modular and composable framework, SZ3, which can be used to easily create new error-bounded lossy compressors on demand. SZ3 features a modular abstraction for prediction-based compression pipelines such that modules can be developed and adopted independently. Specifically, users can customize any stage in the compression pipeline, including preprocessing, prediction, quantization, encoding, and lossless compression, via carefully designed modules. Based on these customized modules, SZ3 allows users to compose their own compressors (or compression pipelines) to adapt to diverse data characteristics and requirements, thus achieving high compression quality and performance with minimal effort. Such a composable design provides a variety of useful capabilities, including point-wise relative error bounds (logarithmic transform-based preprocessor [21]), feature-preserving compression (element-wise quantizer [24]), and speed-ratio tradeoffs (module bypass). Although designed for data in Cartesian grids, SZ3 can also work with data in unstructured grids by applying a linearization that rearranges the data into a one-dimensional array.

We summarize our contributions as follows.

  • We carefully design and develop SZ3, a flexible, efficient framework that allows easy creation and customization of prediction-based error-bounded lossy compressors. This work is critical to obtaining high data compression quality because of diverse scientific data characteristics and user requirements in practice.

  • We develop a new compressor using SZ3 for data generated from the GAMESS quantum chemistry package. By substituting the default quantizer with a specialized one and adding a lossless compression stage, the composed compressor achieves better performance than the current state of the art with minimal effort.

  • We develop an efficient compressor using SZ3 for data collected from Advanced Photon Source instruments. By incorporating an adaptive pipeline with existing modules, the composed compressor leads to the best rate-distortion under any bit rate.

  • We compare the sustainability of SZ3 with leading prediction-based compressors, and then integrate several compression pipelines to demonstrate the necessity of diverse pipelines. The performance and efficiency are carefully characterized using diverse scientific datasets across multiple domains.

The rest of the paper is organized as follows. In Section 2 we discuss related work. In Section 3 we present the design and modules of the SZ3 framework. In Section 4 and Section 5 we describe in detail how we leverage the proposed framework to create efficient compressors for GAMESS and APS data. In Section 6 we present the comparison on sustainability and the evaluation of diverse pipelines. In Section 7 we conclude with a vision of future work.

2 Related Work

With more powerful high-performance computing (HPC) systems and high-resolution instruments, the volume and generation speed of scientific data have been experiencing an unprecedented increase in recent years, causing problems in data storage, transmission, and analysis. Compared with the fast evolution of computing resources, the I/O systems are heavily underdeveloped, remaining a bottleneck in most scenarios. Data compression is regarded as a direct way to mitigate such a bottleneck, and many approaches have been presented in the literature to address this issue.

Lossless compressors [12, 39, 8, 11, 6] ensure that no information is lost during the compression. Despite their success in many fields, lossless compressors suffer from low compression ratios on floating-point scientific data due to the almost randomly distributed mantissas. Previous work [28] has shown that state-of-the-art lossless compressors can lead to a compression ratio of only 2 when directly applied to most floating-point scientific datasets, whereas scientific applications usually require a much higher reduction of their data [9].

Lossy compressors [36, 33, 34, 1, 19, 30] offer the flexibility to trade off data quality for high compression ratios, but they may result in a higher distortion than users’ expectation. The unbounded distortion may result in unexpected behaviors in post hoc data analytics and even false discoveries, leaving risks in trusting the analysis results on the decompressed data.

In comparison with traditional lossy compression, error-bounded lossy compression has been rapidly developed to fill the gap by reducing the size of scientific data while guaranteeing quantifiable error bounds. Prediction-based and transform-based models are the most popular models for designing error-bounded lossy compressors. One of the most well-known transform-based error-bounded lossy compressors is ZFP [27], which decorrelates the data using a near-orthogonal transform and encodes the transformed coefficients using embedded encoding. MGARD [2, 3, 4] is another compressor relying on the transform-based model. It leverages wavelet theories and projection for data decorrelation, followed by linear-scaling quantization, variable-length encoding, and lossless compression.

According to recent studies [29], SZ [13, 32, 22] is regarded as one of the leading prediction-based lossy compressors in the scientific computing community. SZ follows a 4-step pipeline to perform the compression, namely data prediction, quantization, Huffman encoding, and lossless compression. Significant efforts have been made to enable new features or functionalities based on this pipeline. For instance, in [21], a logarithmic transform was used in a preprocessing step to change a pointwise-relative-error-bound compression problem to an absolute-error-bound compression problem, which is then solved by the SZ compression pipeline. In [24], the authors derived the element-wise error bounds based on how critical points are extracted, and they leveraged the SZ compression pipeline along with element-wise quantization to ensure that those critical points are preserved in the decompressed data. In [15], the authors adjusted the pipeline by using a pattern-based predictor to better exploit the correlation in data and a predefined fixed Huffman tree for faster encoding. Attempts were also made to use the near-orthogonal transform in ZFP as a predictor in the pipeline [20]. All the above works, however, were developed within a tightly coupled design, so their compression pipelines cannot be adjusted on demand and thus cannot adapt to users' diverse requirements or use cases. By contrast, SZ3 offers a flexible, modular framework that can be adapted to diverse use cases very efficiently.

Although many efforts have been spent on abstracting lossy compression, most of them are focused on enabling an adaptive selection of existing compressors. For instance, SCIL [18] attempts to abstract across compressors and acts as a metacompressor that provides backends to various existing algorithms. LibPressio [35] provides a common API for different compressors to allow for easy integration of lossy compression in an extensible fashion. Instead, SZ3 separates and abstracts stages in the prediction-based compression model, allowing for easy creation of new compressors in fine granularity rather than selection of existing ones. To the best of our knowledge, this is the first attempt to build a generic framework that allows users to easily customize their own compressors based on their actual needs.

3 SZ3: A Modular Compression Framework

Fig. 1: SZ3 design overview: left part of the figure shows the abstraction and key functionalities of prediction-based compression pipeline with SZ3 modules; right part of the figure displays common instances of these modules and how five leading compressors are composed by these instances.

In this section we introduce the design and implementation of SZ3. With modularity in mind, SZ3 enables easy customization of prediction-based compression pipelines with minimal overhead.

3.1 Design overview

Figure 1 illustrates the design overview of SZ3. The compression process is abstracted into five stages (displayed as the dotted boxes), each of which serves as an individual module. Orange boxes depict the key functionalities of each module and green boxes illustrate several corresponding instances. A compressor is realized by identifying a compression pipeline that is composed of instances from each module. This figure demonstrates how five leading compressors designed for different purposes, namely FPZIP [26], SZ1.4 [32], SZ2 [22], SZ-Pastri [15], and cpSZ [24], are composed using this abstraction (see the solid lines), which shows the generality of the abstraction. For instance, the FPZIP compression pipeline bypasses the preprocessor and leverages the Lorenzo predictor for data decorrelation, followed by residual encoding to ensure error control and arithmetic encoding for size reduction. In the following text we detail the modular design in SZ3, along with example instances of the modules.

3.2 Modularity

In this section we discuss the five modules in SZ3, namely preprocessor, predictor, quantizer, encoder, and lossless compressor, with module instances that have proven to be effective for scientific datasets. Developers can write their own module instances and plug them in the compression pipeline to design prediction-based error-bounded lossy compression for their dataset. Due to space limitation, we present only the most important functions and several representative instances for each module. Detailed interfaces for each module are listed in Appendix A.

Preprocessor (see Appendix A.1):  The preprocessor is used to process the input dataset before the actual compression, either to achieve higher efficiency or to meet diverse requirements. The key function in the preprocessor, namely preprocess, takes in the original data and the compression configuration as input, transforms the data in place, and changes the compression configuration accordingly. If users want to keep the original data while the preprocessor needs to alter the data, a separate buffer is required to perform the preprocessing. Based on the actual design, the postprocess function either reverses the preprocessing procedure or is omitted.

Instances: A typical preprocessor for error-controlled lossy compressors is the logarithmic transform used to enable point-wise relative error bounds [21], where data are transformed to the logarithmic domain and compressed with an absolute error bound transformed from the point-wise relative one. Besides, SZ-Pastri [15] requires a preprocessing step to identify the proper parameters, such as block size and pattern size, for the pattern-based predictor. In Section 5, we further leverage a preprocessor to alter the layout of data for better compression ratio. This is based on our observations that some 3D datasets will have a better compression ratio when treated as a 2D or 1D dataset (as will be detailed later).
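The logarithmic-transform preprocessor can be illustrated with a minimal Python sketch. It is not the implementation from [21]: it assumes strictly positive data and uses the natural logarithm, and the function names log_preprocess and log_postprocess are hypothetical. Under these assumptions, a point-wise relative bound eps maps to the absolute bound log(1 + eps) in the logarithmic domain.

```python
import math

def log_preprocess(data, rel_bound):
    """Move data to the log domain and derive the matching absolute bound.

    Assumption of this sketch: all values are strictly positive.
    """
    if any(v <= 0 for v in data):
        raise ValueError("this sketch assumes strictly positive values")
    abs_bound = math.log(1.0 + rel_bound)
    return [math.log(v) for v in data], abs_bound

def log_postprocess(log_data):
    """Reverse the transform after decompression."""
    return [math.exp(v) for v in log_data]

data = [0.001, 1.0, 250.0]
rel_bound = 0.01
log_data, abs_bound = log_preprocess(data, rel_bound)

# Simulate the worst case of an absolute-error-bounded compressor:
# every log-domain value is off by the full absolute bound.
worst_case = [v + abs_bound for v in log_data]
recon = log_postprocess(worst_case)
rel_errors = [abs(r - d) / abs(d) for r, d in zip(recon, data)]
```

Shifting a log-domain value by at most log(1 + eps) scales the original value by a factor between 1/(1 + eps) and 1 + eps, so the relative error stays within eps regardless of the magnitude of the value.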

Predictor (see Appendix A.2):  Predictors are the key components of prediction-based compressors, which perform value prediction based on diverse patterns for data decorrelation. There are two important functions in the predictor interface, namely predict and save/load. The predict function outputs the predicted value based on the characteristics of the underlying predictor using the multidimensional iterator (to be detailed in Section 6.1). Necessary information about the predictor, for instance the coefficients of the regression predictor [22, 38], is recorded by the save function. During decompression, the load function is invoked to reconstruct the predictor.

Instances: The Lorenzo predictor [17] and its high-order variations [32], which perform multidimensional prediction for each data point based on its neighboring data points, are classic and powerful prediction methods used in lossy compressors such as SZ [32] and FPZIP [26]. In [22], a regression-based predictor is proposed that constructs a hyperplane and uses points on the hyperplane as predicted values, which significantly improves the prediction efficiency when the user-specified error bound is high. We further implement a composite predictor instance inherited from this interface, which may consist of multiple predictors using different prediction algorithms. This requires an error estimation function for each predictor, which is used to determine the best-fit predictor for a given data chunk. The statistical approach in [22] and [25] is generalized as the estimation criterion in SZ3. With the composite predictor, multialgorithm designs with more than one predictor can be implemented very easily.
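The idea of a composite predictor can be sketched in Python as follows. This is a simplified 1D illustration, not the SZ3 estimation criterion: it estimates errors on the original values of every point in a non-empty block, whereas the approach in [22] and [25] is statistical, and all function names are hypothetical.

```python
def lorenzo_error(block):
    """Summed absolute error of a 1D Lorenzo predictor (previous value)."""
    return sum(abs(block[i] - block[i - 1]) for i in range(1, len(block)))

def fit_line(block):
    """Least-squares line through (index, value) pairs of a non-empty block."""
    n = len(block)
    mean_x = (n - 1) / 2
    mean_y = sum(block) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(block))
    slope = cov / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x

def regression_error(block):
    """Summed absolute error of the fitted line over the same points."""
    slope, intercept = fit_line(block)
    return sum(abs(block[x] - (slope * x + intercept))
               for x in range(1, len(block)))

def select_predictor(block):
    """Pick the predictor with the smallest estimated error for this block."""
    errors = {"lorenzo": lorenzo_error(block),
              "regression": regression_error(block)}
    return min(errors, key=errors.get), errors

ramp = [10.0, 11.0, 12.0, 13.0]                      # perfectly linear block
jump = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 50.0]     # flat block with one jump
choice_ramp, _ = select_predictor(ramp)
choice_jump, _ = select_predictor(jump)
```

In this sketch the linear block selects the regression predictor and the block with a single jump selects the Lorenzo predictor, which mirrors the per-block selection that the composite predictor performs.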

Quantizer (see Appendix A.3): The quantizer is used to approximate prediction errors generated by the predictors with a smaller countable set to reduce their entropy while respecting the error bound. As the only module that introduces errors in the compression pipeline, the quantizer determines how the final errors in the decompressed data are controlled. The quantize function is the most important function in a quantizer, where the prediction error is quantized based on the original data value and its predicted value from the predictor. During decompression, the decompressed data value is computed by the recover function, which reverses the steps in the quantize function. The quantizer module is also responsible for encoding/decoding the unpredictable data, i.e., data that fall outside the countable set. This is realized in the save/load functions.

Instances: Linear-scaling quantizer [32] is a widely used quantizer to enable absolute error control in lossy compression. In particular, this quantizer constructs a set of equal-sized consecutive bins each with twice the error bound in length. Then, the prediction error will be translated into the index of the bin containing it. Prediction errors that fall out of range are regarded as unpredictable and will be encoded and stored separately. Besides, log-scale quantizer [10] is used to adjust the size of bins for a more centralized error distribution and element-wise quantizer [24] is used to provide fine-granularity error control for each data point.
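A linear-scaling quantizer of this kind can be sketched in a few lines of Python. The function names, the default radius, and the use of None to mark unpredictable data are assumptions of this sketch rather than the SZ3 interface.

```python
def quantize(value, predicted, error_bound, radius=32768):
    """Map the prediction error to a bin index with bins of width 2 * error_bound.

    Returns (index, reconstructed value). An index of None marks the value
    as unpredictable, in which case it has to be stored separately.
    """
    index = round((value - predicted) / (2 * error_bound))
    if abs(index) >= radius:
        return None, value
    recovered = predicted + index * 2 * error_bound
    # Guard against floating-point round-off pushing the error past the bound.
    if abs(recovered - value) > error_bound:
        return None, value
    return index, recovered

def recover(index, predicted, error_bound):
    """Decompression side: rebuild the value from the bin index."""
    return predicted + index * 2 * error_bound

index, recovered = quantize(1.37, 1.0, 0.1)
out_of_range = quantize(100.0, 0.0, 0.1, radius=8)
```

Because the bin center is at most one error bound away from any value inside the bin, the recovered value respects the absolute error bound whenever an index is returned.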

Encoder (see Appendix A.4): Encoder is a lossless scheme to reduce the storage of integer indices (or symbols) generated by quantizers. The encoder module involves two essential functions—encode and save/load. The encode function transforms the quantized integers from the quantizer to compressed binary formats; similar to other modules, the encoder module has a decode function which performs the reverse process during decompression. This module also has save/load functions for storing/recovering metadata such as the Huffman tree.

Instances: The Huffman encoder [16] is a classic variable-length encoding algorithm that uses fewer bits to represent more common symbols. This encoder first constructs a Huffman tree based on the frequency of the input data using a greedy algorithm, generates a codebook according to the tree, and then compresses the data using the codebook. The fixed Huffman encoder used in SZ-Pastri [15] is a variation of the Huffman encoder that uses a predefined Huffman tree instead of constructing one on the fly, eliminating the cost of both constructing and storing the tree. The arithmetic encoder is another type of encoder widely used in data compression, which represents the current information as a range and encodes the entire data into a single number.
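The greedy Huffman construction can be sketched with the Python standard library. This is an illustrative toy that emits a string of '0'/'1' characters rather than packed bits, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def build_codebook(symbols):
    """Greedy Huffman construction: repeatedly merge the two rarest subtrees."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(f, n, {s: ""}) for n, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, codebook):
    return "".join(codebook[s] for s in symbols)

def decode(bits, codebook):
    inverse = {code: sym for sym, code in codebook.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:      # prefix-free codes make this unambiguous
            out.append(inverse[current])
            current = ""
    return out

# Quantization indices are typically concentrated around one value.
symbols = [0] * 8 + [1] * 3 + [-1] * 3 + [2]
codebook = build_codebook(symbols)
bits = encode(symbols, codebook)
```

A fixed Huffman encoder in the sense described above would skip build_codebook and reuse a predefined codebook, so neither the construction time nor the stored tree is needed.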

Lossless Compressor (see Appendix A.5): Lossless compressors are used to further shrink the size of the compressed binary formats produced by the encoders, because entropy-based encoders may overlook repeated patterns in the data and thus lead to suboptimal compression ratios. The lossless compressor module in SZ3 acts mainly as a proxy of state-of-the-art lossless compression libraries. This module invokes external libraries to compress the output from the encoder module with compress and decompress interfaces.

Instances: We provide portable interfaces in SZ3 to integrate with state-of-the-art lossless compressors including ZSTD [39], GZIP [12], and BLOSC [6]. Because the lossless compressor is a standalone module attached to the previous stages, it is fairly easy to include and integrate new lossless compression routines as well.
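The proxy role of this module can be sketched in Python with the standard zlib module, which implements the DEFLATE algorithm also used by GZIP. The class name ZlibLossless is hypothetical and merely stands in for the ZSTD, GZIP, and BLOSC backends mentioned above.

```python
import zlib

class ZlibLossless:
    """Hypothetical proxy exposing compress/decompress over an external library."""

    def __init__(self, level=6):
        self.level = level

    def compress(self, payload: bytes) -> bytes:
        return zlib.compress(payload, self.level)

    def decompress(self, blob: bytes) -> bytes:
        return zlib.decompress(blob)

# A byte stream with a repeated pattern, standing in for encoder output.
encoder_output = bytes([0b10110010, 0b01001101, 0b11100001]) * 400
stage = ZlibLossless()
blob = stage.compress(encoder_output)
restored = stage.decompress(blob)
```

Swapping in another backend only requires another class with the same two methods, which is the sense in which the module is standalone.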

3.3 Compression pipeline composition

In SZ3, a compression pipeline can be composed by identifying the instances of modules and putting them together. Algorithm 1 shows how a general-purpose error-controlled lossy compressor is composed using the selected preprocessor, predictor, quantizer, encoder, and lossless compressor. In addition, SZ3 employs compile time polymorphism (see Section 6.1) such that users can switch the instances without bothering to modify the compression functions. This makes SZ3 highly adaptive to diverse use cases, with significantly reduced efforts on compressor development.

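Input: input data d of size n, compression configuration c
Output: compressed data b

1:  d ← preprocessor.preprocess(d, c)  /*perform preprocessing*/
2:  for i = 1 to n do
3:      p ← predictor.predict(d, i)  /*perform prediction*/
4:      q[i] ← quantizer.quantize(d[i], p)  /*perform quantization*/
5:  end for
6:  predictor.save()  /*save predictor*/
7:  quantizer.save()  /*save quantizer*/
8:  e ← encoder.encode(q)  /*perform encoding*/
9:  encoder.save()  /*save encoder*/
10:  b ← lossless.compress(e)  /*perform lossless compression*/
11:  return b
Algorithm 1 A General Compressor in SZ3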
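The composition in Algorithm 1 can be illustrated with a small Python sketch. SZ3 itself relies on compile-time polymorphism; here duck typing plays that role, and every class name, the byte layout, and the 1D Lorenzo predictor on reconstructed values are assumptions made for illustration only.

```python
import struct
import zlib

class IdentityPreprocessor:
    def preprocess(self, data, conf):
        return data

class LorenzoPredictor1D:
    def predict(self, recon, i):
        return recon[i - 1] if i > 0 else 0.0   # previous reconstructed value
    def save(self):
        return b""                               # nothing to store

class LinearQuantizer:
    def __init__(self, error_bound, radius=32768):
        self.eb, self.radius, self.unpred = error_bound, radius, []
    def quantize(self, value, pred):
        index = round((value - pred) / (2 * self.eb))
        recovered = pred + index * 2 * self.eb
        if abs(index) >= self.radius or abs(recovered - value) > self.eb:
            self.unpred.append(value)            # stored exactly
            return 0, value                      # index 0 marks unpredictable
        return index + self.radius, recovered
    def save(self):
        head = struct.pack("<I", len(self.unpred))
        return head + b"".join(struct.pack("<d", v) for v in self.unpred)

class RawEncoder:
    def encode(self, indices):
        return b"".join(struct.pack("<I", q) for q in indices)
    def save(self):
        return b""

class ZlibLossless:
    def compress(self, payload):
        return zlib.compress(payload)

def compress(data, conf, preprocessor, predictor, quantizer, encoder, lossless):
    data = preprocessor.preprocess(list(data), conf)       # preprocessing
    recon, indices = [], []
    for i, value in enumerate(data):
        pred = predictor.predict(recon, i)                  # prediction
        index, recovered = quantizer.quantize(value, pred)  # quantization
        indices.append(index)
        recon.append(recovered)
    payload = (predictor.save() + quantizer.save()
               + encoder.encode(indices) + encoder.save())
    return lossless.compress(payload), recon                # lossless stage

data = [0.1 * i for i in range(100)]
blob, recon = compress(data, {}, IdentityPreprocessor(), LorenzoPredictor1D(),
                       LinearQuantizer(0.01), RawEncoder(), ZlibLossless())
```

Replacing any argument of compress with another object that offers the same methods yields a different pipeline without touching the compression function itself.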

4 Developing an Efficient Compressor for GAMESS Data using SZ3

In this section, we present how we create a new compressor using SZ3 that improves the compression ratios for the data generated from the real-world scientific simulation GAMESS [5]. In the following text, we first introduce the GAMESS data and its current compressor, SZ-Pastri [15], and then present our characterization of the quantization integers and the new customization method. Finally, we evaluate the compression ratios and speed based on three representative data fields in GAMESS.

4.1 GAMESS data and SZ-Pastri Compressor

Quantum chemistry researchers often need to obtain a wavefunction by solving the Schrödinger differential equation, which involves all the chemical system's information. The wavefunction needs to be constructed from two-electron repulsion integrals (ERI), which require too much memory to hold at runtime during the simulation. A straightforward solution is to regenerate the ERI dataset whenever needed during the simulation, although this would significantly delay the simulation because generating the ERI data is fairly expensive. In our prior work, we developed an efficient error-bounded lossy compressor called SZ-Pastri [15], which can compress the ERI data in memory and decompress it at the beginning of each iteration of the simulation. Such a method effectively avoids the ERI recalculation cost and thus improves the overall performance. SZ-Pastri takes advantage of the periodic patterns that exist in the GAMESS dataset, because the ERI values are calculated in order and depend on the shape and distance of electron clouds. Specifically, SZ-Pastri identifies a periodic pattern and uses it along with a scaling coefficient for each block to enable accurate data prediction. This leads to a substantial performance gain compared to existing general compressors [22, 27].
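The idea of predicting each block as a scaled copy of a shared pattern can be sketched in Python as follows. This is not the SZ-Pastri algorithm from [15]: the least-squares choice of the scaling coefficient and the function names are assumptions of this sketch, and the pattern is taken as given rather than identified from the data.

```python
def block_scale(block, pattern):
    """Least-squares scale s minimizing sum((block[i] - s * pattern[i])**2)."""
    pattern = pattern[:len(block)]               # handle a short final block
    denom = sum(p * p for p in pattern)
    return sum(b * p for b, p in zip(block, pattern)) / denom if denom else 0.0

def pattern_predict(data, pattern):
    """Predict every block as the pattern multiplied by a per-block scale."""
    size = len(pattern)
    predictions, scales = [], []
    for start in range(0, len(data), size):
        block = data[start:start + size]
        scale = block_scale(block, pattern)
        scales.append(scale)
        predictions.extend(scale * p for p in pattern[:len(block)])
    return predictions, scales

pattern = [1.0, -2.0, 0.5]
data = [2.0, -4.0, 1.0,      # pattern scaled by 2
        -0.5, 1.0, -0.25]    # pattern scaled by -0.5
predictions, scales = pattern_predict(data, pattern)
```

Only the pattern and one coefficient per block need to be stored in addition to the quantized prediction errors, which is why both are quantized alongside the data.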

4.2 Data characterization and pipeline customization

We first characterize the quantization integers for SZ-Pastri, which are the most impactful factors for the final compression ratios. To enable correct decompression, SZ-Pastri needs to quantize and store the information for both the periodic patterns and block-wise scales. Thus, the quantization integers in SZ-Pastri consist of three components, which are computed from data, patterns, and scales, respectively. As displayed in Figure 3(a), the distribution of quantization integers for the pattern-based predictor is centered in , which indicates very high prediction accuracy and thus better compression ratios. However, a significant percentage (20% for data) of the quantization integers fall out of the quantization range ( in this setting). These data, usually described as unpredictable, require additional mechanisms for storage in order to be correctly recovered during decompression. In SZ-Pastri, they are directly truncated and stored based on the user-specified error, which fails to exploit the correlation in the data to achieve high compression, although relatively fast compression speed is provided.

Fig. 2: Compression pipelines for GAMESS data. Blue boxes indicate optimized/added modules in SZ3-Pastri over SZ-Pastri.

Fig. 3: Distribution of quantization integers in SZ3-Pastri: (a) data, (b) pattern, (c) scale.

Based on these observations, we improve the compression efficiency of SZ-Pastri by leveraging a specialized quantizer to deal with the unpredictable data. Inspired by the embedded encoding approaches widely used in transform-based compressors [27, 23], we store data in the order of bitplanes instead of applying the truncation directly. A bitplane represents the set of bits corresponding to a given bit position in the binary representations of the data. Because small data values have meaningful bits only in the less significant bitplanes, the more significant bitplanes yield good compression ratios because they consist of consecutive 0s. Similar to [27], we first align the exponents of the prediction differences on unpredictable data to that of the error bound to convert the floating-point data into integers. These integers are then recorded in the order of bitplanes, namely, from the most significant bitplane to the least significant bitplane. Compared with direct truncation, this encoding method does not change the encoded size at this stage; however, its more compressible format promises better compression ratios when lossless compression is adopted. Since this quantizer takes special care of unpredictable data storage, we name it the Unpred-aware Quantizer throughout the paper. To take advantage of this method, we also add a lossless stage to the composed compression pipeline, as displayed in Figure 2. This new compressor is called SZ3-Pastri, as it optimizes SZ-Pastri using the SZ3 framework.
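The bitplane ordering can be illustrated with a short Python sketch. It assumes the unpredictable values have already been converted to non-negative integers (exponent alignment and sign handling are omitted), and the function names are hypothetical.

```python
def to_bitplanes(values, num_bits):
    """Emit bits plane by plane, from the most to the least significant plane."""
    bits = []
    for plane in range(num_bits - 1, -1, -1):
        bits.extend((v >> plane) & 1 for v in values)
    return bits

def from_bitplanes(bits, count, num_bits):
    """Rebuild the integers from the plane-ordered bit sequence."""
    values = [0] * count
    for p, plane in enumerate(range(num_bits - 1, -1, -1)):
        for i in range(count):
            values[i] |= bits[p * count + i] << plane
    return values

values = [3, 1, 0, 2, 5]          # small magnitudes stored in 8-bit containers
bits = to_bitplanes(values, 8)
restored = from_bitplanes(bits, len(values), 8)
leading_zero_bits = bits[:5 * len(values)]   # planes 7 down to 3
```

The number of bits is the same as with direct truncation (8 bits per value here), but the five most significant planes form one long run of zeros that a subsequent lossless stage can exploit.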

4.3 Evaluation results

We evaluate our method and compare it with SZ-Pastri and its variation (SZ-Pastri equipped with lossless compression) using three representative fields in GAMESS. Unless otherwise noted, all the experiments in this paper are conducted on the Bebop supercomputer [7] at Argonne National Laboratory. Bebop has 664 Broadwell nodes, each of which is equipped with two Intel Xeon E5-2695v4 processors containing 36 physical cores in total and 128 GB of DDR4 memory.

The rate-distortion graphs of the evaluation are displayed in Figure 4. These graphs show the relationship between bit rate and peak signal-to-noise ratio (PSNR). The bit rate equals $b / CR$, where $b$ is the number of bits in the original data representation (e.g., 32 for single-precision and 64 for double-precision floating-point data) and $CR$ is the compression ratio. PSNR decreases with the logarithm of the mean squared error between the decompressed and original data. A lower bit rate and a higher PSNR indicate better compression quality. According to this figure, SZ3-Pastri leads to the best rate-distortion at almost all bit rates, improving the compression ratios over both SZ-Pastri and its lossless variation. We also show the exact compression ratio and speed of the three approaches under the desired absolute error tolerance (1E-10 according to the domain scientists) in Table I. Compared with the original SZ-Pastri, SZ3-Pastri significantly improves the compression ratios under this requirement. However, it incurs a degradation in speed, which is caused by the embedded encoding of unpredictable data (i.e., the Unpred-aware Quantizer, which improves the compression ratio) and the final lossless compression.
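Both metrics can be computed with a few lines of Python. The sketch assumes the range-based PSNR definition that is common in the lossy compression literature (value range of the original data over the root mean squared error, in decibels); the function names are hypothetical.

```python
import math

def bit_rate(bits_per_value, compression_ratio):
    """Average number of bits per value after compression."""
    return bits_per_value / compression_ratio

def psnr(original, decompressed):
    """PSNR in dB, using the value range of the original data (assumed definition)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decompressed)) / len(original)
    if mse == 0:
        return math.inf
    value_range = max(original) - min(original)
    return 20 * math.log10(value_range) - 10 * math.log10(mse)

rate = bit_rate(64, 8.0)                       # double precision, ratio of 8
quality = psnr([0.0, 10.0], [0.1, 9.9])        # every value is off by 0.1
```

With a value range of 10 and an error of 0.1 at every point, the PSNR in this example is about 40 dB; halving the error would raise it by about 6 dB.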

Dataset    Compressor             Ratio    Compression Speed
           SZ-Pastri              8.46     662.01 MB/s
           SZ-Pastri-with-zstd    9.27     377.17 MB/s
           SZ3-Pastri             10.76    244.43 MB/s

           SZ-Pastri              8.40     643.58 MB/s
           SZ-Pastri-with-zstd    9.23     370.88 MB/s
           SZ3-Pastri             10.06    221.03 MB/s

           SZ-Pastri              9.14     613.12 MB/s
           SZ-Pastri-with-zstd    9.96     364.51 MB/s
           SZ3-Pastri             10.71    226.80 MB/s
TABLE I: Results on GAMESS data when the absolute error bound is 1E-10



Fig. 4: Rate-distortion on GAMESS data.

5 Composing an Efficient Compressor for APS Data using SZ3

We then leverage our SZ3 framework to create an adaptive compression pipeline for the X-ray ptychographic data acquired at the Advanced Photon Source (APS). Similar to the previous section, we first introduce APS data, followed by the data characterization and compression pipeline customization along with the evaluation.

5.1 APS data

X-ray ptychography is a major high-resolution imaging technique that takes advantage of the coherence provided by the synchrotron source. However, this computational method of microscopic imaging requires much larger data volumes and computational resources compared with conventional microscopic techniques. An increase of about 3 orders of magnitude in the coherent flux provided by the coming APS upgrade will aggravate the burden of data transfer and storage. Therefore, a new compression strategy with high compression ratios is highly sought after in ptychography. In order to represent most sample scenarios, two ptychographic datasets were acquired from a computer chip pillar (isolated sample) and a subregion of an entire flat chip (extended sample), respectively. In both cases, a Dectris Eiger detector (514 × 1030 pixels) was used to acquire diffraction patterns as the X-ray beam scanned across the sample, and the 2D diffraction images were saved along the time dimension to form a 3D array (19500 × 514 × 1030 for the chip pillar and 16800 × 514 × 1030 for the flat chip). In the data analysis, domain experts usually crop only the central region of the diffraction pattern that contains X-ray signals (there are many zeros outside this region). To fairly assess our compression strategy without giving an overestimated compression ratio, we cropped only the central 256 × 256 pixels.

5.2 Data characterization and pipeline customization

We design an adaptive compression pipeline for APS data based on the following analysis. First, the multidimensional Lorenzo predictor introduces higher noise because more decompressed data values are used for prediction [22], even though it is usually superior to the one-dimensional one because it exploits the multidimensional correlation. Second, although APS data has three dimensions (e.g., for the chip pillar sample), it is actually a stack of 2D images along the time dimension with relatively low spatial correlation. When the spatial correlation is not strong, the benefit of using the multidimensional Lorenzo predictor may not make up for the cost of the higher noise. In addition, considering that the correlation in time is usually higher than that in space, it might be more effective to compress the data along the time dimension, namely, treating the data as 1D time series. On the other hand, the multidimensional regression-based predictor should be included because it leverages the multidimensional correlation without being affected by the decompression noise [22], which yields good performance when the error bound is relatively high. This requires switching predictors based on the error bound: using a traditional multialgorithm predictor that involves regression for high error bounds and a customized 1D Lorenzo predictor with a transposition preprocessor that reorganizes the data along the time dimension for low error bounds. In our implementation, we switch to the latter along with quantization bin width when the user-specified absolute error bound is less than since this setting generates lossless compression. Under such circumstances, the noise introduced by using decompressed data is reduced to when the unpred-aware Quantizer is leveraged, thanks to the restricted quantization bin and the principle of embedded encoding. We further employ a fixed Huffman encoder for fast encoding with comparable compression ratios.
The corresponding compression pipeline for APS data is depicted in Figure 5.

Fig. 5: Adaptive compression pipeline for APS data.
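The transposition preprocessor and the error-bound switch described above can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a row-major (time, row, column) stack, and the names transpose_time_last, ApsPipeline, and select_pipeline are hypothetical rather than part of SZ3. The cutoff is passed in as a parameter rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorder a row-major (time, row, col) stack so that
// time becomes the fastest-varying dimension, i.e. layout (row, col, time).
// After this step each pixel's time series is contiguous in memory.
template <class T>
std::vector<T> transpose_time_last(const std::vector<T>& in,
                                   std::size_t nt, std::size_t ny, std::size_t nx) {
    assert(in.size() == nt * ny * nx);
    std::vector<T> out(in.size());
    for (std::size_t t = 0; t < nt; ++t)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x)
                out[(y * nx + x) * nt + t] = in[(t * ny + y) * nx + x];
    return out;
}

// Hypothetical selector mirroring the switch described in the text:
// low error bounds use the transposed 1D Lorenzo path, high error bounds
// use the multialgorithm predictor on the 3D data.
enum class ApsPipeline { Multialgorithm3D, Transposed1DLorenzo };

inline ApsPipeline select_pipeline(double abs_error_bound, double threshold) {
    return abs_error_bound < threshold ? ApsPipeline::Transposed1DLorenzo
                                       : ApsPipeline::Multialgorithm3D;
}
```

With the transposed layout, a 1D Lorenzo predictor that looks at the previous element is effectively predicting each pixel from the same pixel in the previous frame.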

5.3 Evaluation results

We evaluate the customized APS compressor and compare it with three baselines: the generic SZ-2.1 compressor for 1D, 3D, and transposed 1D data. As illustrated in Figure 6, a 3D compressor leads to higher PSNR at low bit rates (high compression ratios), but it suffers when the bit rate increases to a certain level, where there is a sharp increase in the compression quality for 1D compressors. This is caused by the fact that the noise introduced by decompressed data is mitigated with such an error setting in this dataset. SZ-2.1 is not aware of this information and incorrectly estimates the Lorenzo prediction noise, leading to the selection of the regression predictor even when the Lorenzo predictor is better. SZ3-APS adaptively chooses the compression pipeline based on the error bound, which leads to performance comparable to that of SZ-2.1 for 3D data when the error bound is high. Furthermore, the adopted Unpred-aware Quantizer exhibits higher compression ratios at low error bounds, since it provides near-lossless decompressed data that improves the prediction efficiency of the Lorenzo predictor. In absolute terms, when the decompression data is near lossless (i.e., error bound less than ), the compression ratio gain of the proposed compression pipeline is on chip pillar and on flat chip compared with the second best one. Note that SZ3-APS turns out to be lossless in this case, which leads to an infinite PSNR in the figure.

Fig. 6: Rate-distortion on APS data. Panels: (a) chip_pillar, (b) flat_chip.
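To make the remark about infinite PSNR concrete, the following sketch computes PSNR from the original and reconstructed values. It assumes the definition commonly used for scientific data, in which the peak is the value range of the original data; the function name psnr is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// PSNR in dB, using the value range of the original data as the peak
// (an assumed convention). Returns +infinity when the reconstruction is
// exact, which is what a lossless pipeline produces.
inline double psnr(const std::vector<double>& orig, const std::vector<double>& recon) {
    assert(!orig.empty() && orig.size() == recon.size());
    const auto mm = std::minmax_element(orig.begin(), orig.end());
    const double range = *mm.second - *mm.first;
    double sq_err = 0.0;
    for (std::size_t i = 0; i < orig.size(); ++i) {
        const double d = orig[i] - recon[i];
        sq_err += d * d;
    }
    const double mse = sq_err / static_cast<double>(orig.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(range) - 10.0 * std::log10(mse);
}
```

For example, original values {0, 10} reconstructed as {1, 9} have a range of 10 and a mean squared error of 1, which gives 20 dB.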

6 Sustainability, Quality, and Performance Investigation of SZ3

In this section we first discuss the sustainability of SZ3, and then leverage SZ3 to characterize the quality and performance of diverse compression pipelines.

6.1 Sustainability

We design SZ3 with modularity in mind to allow for a composable framework with high sustainability. Specifically, we compare the design of SZ3 with that of SZ2 [31], one of the leading error-controlled lossy compressors with a prediction-based pipeline, to demonstrate its superiority.

6.1.1 The codebase of SZ2

SZ2 has a large codebase including more than 120 functions with little code reuse, as shown in Table II. For example, SZ2 has separate functions to handle the compression or decompression on a dataset with a specific data type, although the logic to compress and decompress different data types is similar. As a result, SZ2 needs to maintain separate code for each data type.

Data Type: FP32, FP64; INT8, INT16, INT32, INT64; UINT8, UINT16, UINT32, UINT64
Data Dimension: 1D, 2D, 3D, 4D
Functionality: Compression, Decompression, Parameter Optimization

TABLE II: SZ2 contains more than 120 functions to support different data types, data dimensions, and data-processing methods

The lack of software architecture design makes it difficult and time-consuming to modify and extend the functionality of SZ2. With more than 120 functions to update, some of them are likely to be missed when adding new features to SZ2. Furthermore, the complexity of SZ2 makes it hard to fully validate the correctness of newly added features, because it is time-consuming to write test code that achieves high code coverage for so many functions.

6.1.2 The codebase of SZ3

We propose three technologies in SZ3 to improve the code sustainability dramatically, namely compile-time polymorphism, datatype abstraction, and multidimensional iterator.

Compile Time Polymorphism: SZ3 constructs the composed compression pipelines at compile time, because compile-time polymorphism provides an efficient way to switch between different implementations of modules without runtime performance degradation. In the implementation, the module types are passed as template parameters of the compressor (see Appendix A.6). A static assert is evaluated when the compressor is instantiated to ensure that only classes that inherit from the specific module interfaces can be used as template parameters.
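A minimal sketch of this pattern is shown below. The reduced interfaces PredictorBase and QuantizerBase, the Compressor class, and the two toy modules are hypothetical stand-ins for illustration, not SZ3's actual classes; the check uses std::is_base_of inside a static_assert.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, heavily reduced module interfaces.
template <class T>
struct PredictorBase {
    virtual ~PredictorBase() = default;
    virtual T predict(const T* prev) const = 0;
};

template <class T>
struct QuantizerBase {
    virtual ~QuantizerBase() = default;
    virtual int quantize(T data, T pred) const = 0;
};

// Modules are template parameters, so the pipeline is fixed at compile time.
// The static_asserts reject any type that does not derive from the interface.
template <class T, class Predictor, class Quantizer>
class Compressor {
    static_assert(std::is_base_of<PredictorBase<T>, Predictor>::value,
                  "Predictor must derive from PredictorBase<T>");
    static_assert(std::is_base_of<QuantizerBase<T>, Quantizer>::value,
                  "Quantizer must derive from QuantizerBase<T>");

public:
    Compressor(Predictor p, Quantizer q) : predictor_(p), quantizer_(q) {}

    // Predict from the previous value, then quantize the difference.
    int encode_one(const T* prev, T value) const {
        assert(prev != nullptr);
        return quantizer_.quantize(value, predictor_.predict(prev));
    }

private:
    Predictor predictor_;
    Quantizer quantizer_;
};

// Two toy modules used only for illustration.
struct PreviousValuePredictor final : PredictorBase<double> {
    double predict(const double* prev) const override { return *prev; }
};

struct RoundingQuantizer final : QuantizerBase<double> {
    int quantize(double data, double pred) const override {
        // Round the prediction error to the nearest integer.
        return static_cast<int>(data - pred + (data >= pred ? 0.5 : -0.5));
    }
};
```

Because the concrete module types are known at compile time and marked final, the compiler can resolve the calls statically. Instantiating Compressor with a type that does not derive from the interface fails to compile with the message given in the static_assert.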

Datatype Abstraction: We adopt datatype abstraction to simplify the codebase of SZ3 significantly. Most module interfaces, implementations, and compressor pipelines in SZ3 are designed with datatypes as template parameters for efficient code reuse. By comparison, SZ2 has separate implementations for each datatype, which results in a large codebase with little code reuse.

Multidimensional Iterator: A multidimensional iterator is designed in SZ3 to support data access patterns of different dimensions. This is totally different from SZ2, where independent implementations are required for each dimensionality. The multidimensional iterator in SZ3 provides a simple API to access the current and nearby data points and move to another position. The boundary situations are handled inside iterators. The iterator design eliminates the need to write separate code based on the data dimensions. The pseudocode of prediction and quantization using the multidimensional iterator is presented in Appendix A.7.

With the multidimensional iterator, the complex nested loops that iterate through the data and the boundary condition checks are hidden from users. The multidimensional iterator also supports arbitrary movement. For example, to move a 3D iterator to its upper left neighbor, developers can simply use iterator.move(-1, -1, -1) instead of calculating the offset for three dimensions.
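The idea behind such an iterator can be sketched with a small N-dimensional cursor. The class NdCursor below is a hypothetical illustration, not SZ3's iterator: it assumes row-major storage, and it assumes that neighbors outside the array read as zero, which is one possible boundary policy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical minimal N-dimensional cursor over a row-major array.
template <class T, std::size_t N>
class NdCursor {
public:
    NdCursor(const T* data, const std::array<std::size_t, N>& dims)
        : data_(data), dims_(dims) {
        std::size_t stride = 1;
        for (std::size_t d = N; d-- > 0;) {  // last dimension varies fastest
            strides_[d] = stride;
            stride *= dims_[d];
        }
        pos_.fill(0);
    }

    // Shift the cursor by a signed step in every dimension.
    template <class... Steps>
    void move(Steps... steps) {
        static_assert(sizeof...(Steps) == N, "one step per dimension");
        const std::ptrdiff_t s[] = {static_cast<std::ptrdiff_t>(steps)...};
        for (std::size_t d = 0; d < N; ++d) {
            const std::ptrdiff_t p = static_cast<std::ptrdiff_t>(pos_[d]) + s[d];
            assert(p >= 0 && p < static_cast<std::ptrdiff_t>(dims_[d]));
            pos_[d] = static_cast<std::size_t>(p);
        }
    }

    // Value at a relative offset; positions outside the array read as zero.
    template <class... Steps>
    T neighbor(Steps... steps) const {
        static_assert(sizeof...(Steps) == N, "one step per dimension");
        const std::ptrdiff_t s[] = {static_cast<std::ptrdiff_t>(steps)...};
        std::size_t idx = 0;
        for (std::size_t d = 0; d < N; ++d) {
            const std::ptrdiff_t p = static_cast<std::ptrdiff_t>(pos_[d]) + s[d];
            if (p < 0 || p >= static_cast<std::ptrdiff_t>(dims_[d])) return T{};
            idx += static_cast<std::size_t>(p) * strides_[d];
        }
        return data_[idx];
    }

    // Value at the current position.
    T operator*() const {
        std::size_t idx = 0;
        for (std::size_t d = 0; d < N; ++d) idx += pos_[d] * strides_[d];
        return data_[idx];
    }

private:
    const T* data_;
    std::array<std::size_t, N> dims_, strides_, pos_;
};
```

A predictor written against this kind of interface only calls neighbor(...) with relative offsets, so the same code works for any N, and the stride arithmetic and boundary checks stay in one place.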

6.2 Pipeline integration and evaluation

We integrate three compression pipelines using SZ3 and reveal their suitable cases in terms of quality and performance. Details of the three pipelines are described as follows.

Fig. 7: Compression quality evaluation (lower bit rate and higher PSNR indicate better quality). Panels: (a) RTM, (b) NYX, (c) Miranda, (d) Scale-LETKF, (e) QMCPack, (f) Hurricane. Result for SZ2.1 is omitted since it is very similar to that of SZ3-LR.

Fig. 8: Compression/decompression throughput (MB/s) when relative error bound (error bound normalized to value range) is 1E-3. Panels: (a) Compression, (b) Decompression.

Compression Pipeline SZ3-LR:  SZ3-LR is the implementation of the classic compressor SZ2 [22] using SZ3’s modular mechanism, which relies on a multialgorithm predictor to better exploit data correlation. This predictor consists of a Lorenzo predictor and a regression-based predictor, and for each block it uses whichever of the two is better according to a blockwise error estimation. As depicted in Figure 1, it uses a linear-scaling quantizer, a Huffman encoder, and the zstd lossless compressor in the other stages.
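The interplay between a Lorenzo predictor and a linear-scaling quantizer can be illustrated with a 1D sketch. It is a simplified illustration under stated assumptions, not SZ3's implementation: prediction uses the previous reconstructed value, quantization bins have width 2 × eb, code 0 marks a value stored verbatim, and the names Quantized, compress_1d, and decompress_1d are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Quantized {
    std::vector<int> codes;             // 0 marks an unpredictable value
    std::vector<double> unpredictable;  // values stored verbatim
};

// 1D Lorenzo prediction (previous reconstructed value) followed by
// linear-scaling quantization with bins of width 2 * eb.
inline Quantized compress_1d(const std::vector<double>& data, double eb, int radius) {
    assert(eb > 0.0 && radius > 0);
    Quantized q;
    double prev = 0.0;  // reconstructed predecessor; 0 before the first value
    for (double x : data) {
        const double idx = std::round((x - prev) / (2.0 * eb));
        if (std::fabs(idx) < radius) {
            const double recon = prev + idx * 2.0 * eb;
            if (std::fabs(recon - x) <= eb) {  // guard against rounding effects
                q.codes.push_back(static_cast<int>(idx) + radius);
                prev = recon;  // predict from what the decompressor will see
                continue;
            }
        }
        q.codes.push_back(0);
        q.unpredictable.push_back(x);
        prev = x;
    }
    return q;
}

// Mirror of compress_1d: rebuild each value from its code and predecessor.
inline std::vector<double> decompress_1d(const Quantized& q, double eb, int radius) {
    std::vector<double> out;
    double prev = 0.0;
    std::size_t u = 0;
    for (int code : q.codes) {
        prev = (code == 0) ? q.unpredictable[u++]
                           : prev + (code - radius) * 2.0 * eb;
        out.push_back(prev);
    }
    return out;
}
```

Predicting from reconstructed rather than original values keeps the compressor and decompressor in sync, which is also the source of the decompression noise discussed for the Lorenzo predictor.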

Compression Pipeline SZ3-Truncation:  SZ3-Truncation is a very fast compression pipeline designed for cases where speed is more important than compression ratio. Given the target number of bytes as an input parameter, it keeps that many most significant bytes of each floating-point value and discards the remaining bytes. To achieve high compression speed, it bypasses the other stages, which in turn leads to low compression ratios in general cases.
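Byte truncation can be sketched for FP32 data as follows. The functions truncate_fp32 and restore_fp32 are hypothetical names used for illustration, and the sketch assumes IEEE 754 single precision; it extracts bytes from the integer bit pattern so that the result does not depend on host endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Keep the `keep` most significant bytes of every FP32 value.
inline std::vector<std::uint8_t> truncate_fp32(const std::vector<float>& data, int keep) {
    assert(keep >= 1 && keep <= 4);
    std::vector<std::uint8_t> out;
    out.reserve(data.size() * static_cast<std::size_t>(keep));
    for (float v : data) {
        std::uint32_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // read the bit pattern safely
        for (int b = 0; b < keep; ++b)        // most significant byte first
            out.push_back(static_cast<std::uint8_t>(bits >> (24 - 8 * b)));
    }
    return out;
}

// Rebuild FP32 values; the discarded low-order bytes are filled with zeros.
inline std::vector<float> restore_fp32(const std::vector<std::uint8_t>& bytes, int keep) {
    assert(keep >= 1 && keep <= 4);
    const std::size_t k = static_cast<std::size_t>(keep);
    assert(bytes.size() % k == 0);
    std::vector<float> out;
    for (std::size_t i = 0; i < bytes.size(); i += k) {
        std::uint32_t bits = 0;
        for (std::size_t b = 0; b < k; ++b)
            bits |= static_cast<std::uint32_t>(bytes[i + b]) << (24 - 8 * b);
        float v;
        std::memcpy(&v, &bits, sizeof v);
        out.push_back(v);
    }
    return out;
}
```

Keeping 2 of 4 bytes gives a fixed compression ratio of 2 and retains the sign, the exponent, and the top 7 mantissa bits, so the error is relative to the magnitude of each value rather than bounded by an absolute threshold.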

Compression Pipeline SZ3-Interp:  SZ3-Interp has interpolation-based predictors [37] in its pipeline. Both linear interpolation and cubic spline interpolation are included, and they are better than Lorenzo and regression predictors in many cases for the following reasons. On the one hand, interpolation-based predictors are not affected by the error accumulation effect that is common in the Lorenzo predictor, because the predicted value is based on previous data points in the Lorenzo predictor while it is based on coefficients in interpolation-based predictors. On the other hand, unlike linear regression, which has an overhead to store coefficients, SZ3-Interp has constant coefficients and therefore does not have storage overhead. Similar to SZ3-LR, it uses a linear-scaling quantizer for respecting error bounds, as well as a Huffman encoder and the zstd lossless compressor for high compression ratios.
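The constant-coefficient property can be seen in a short sketch of midpoint interpolation. The weights below are the standard ones for linear and four-point cubic interpolation at the midpoint of equally spaced samples; they are an assumption for illustration and are not quoted from this section, and the function names interp_linear and interp_cubic are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Predict d[i] from its two neighbors at distance s (linear interpolation).
inline double interp_linear(const std::vector<double>& d, std::size_t i, std::size_t s) {
    assert(i >= s && i + s < d.size());
    return 0.5 * (d[i - s] + d[i + s]);
}

// Predict d[i] from four neighbors at distances s and 3s (cubic interpolation
// at the midpoint). The weights -1/16, 9/16, 9/16, -1/16 are fixed constants,
// so no per-block coefficients need to be stored.
inline double interp_cubic(const std::vector<double>& d, std::size_t i, std::size_t s) {
    assert(i >= 3 * s && i + 3 * s < d.size());
    return (-d[i - 3 * s] + 9.0 * d[i - s] + 9.0 * d[i + s] - d[i + 3 * s]) / 16.0;
}
```

The cubic form reproduces any cubic polynomial exactly: for samples of i³ at i = 0, 2, 4, 6 it predicts 27 at i = 3, whereas the linear form predicts 36. In a compressor the neighbors would be reconstructed values, as in the other predictors.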

We use datasets from five scientific domains: cosmology, climate, quantum structure, seismic wave, and turbulence. The detailed information is shown in Table III.

Application Domain #Fields Dimensions Total Size
HACC Cosmology 6 6.3GB
ATM Climate 77 1.9GB
Hurricane Climate 13 1.2GB
NYX Cosmology 6 3GB
SCALE-LETKF Climate 6 3.2GB
QMCPack Quantum Structure 1 0.6GB
RTM Seismic Wave 3600 635GB
Miranda Turbulence 7 1GB
TABLE III: Dataset Information

We demonstrate the compression quality of the three pipelines using the rate-distortion graphs in Figure 7. Note that the rate distortion of SZ2.1 is identical to that of SZ3-LR; thus we do not show SZ2.1 in this figure. We observe from Figure 7 that SZ3-Truncation has the lowest compression quality, which is consistent with its simple byte-truncation design. SZ3-Interp is better than SZ3-LR on most of the datasets, especially in cases with a high compression ratio, where the bit rate is lower than 3. For example, on the Miranda dataset, under the same PSNR of 90, the compression ratio of SZ3-Interp is 47, which is 56% higher than the compression ratio of SZ3-LR, which is 30. On the other hand, SZ3-LR is still the best choice on the SCALE-LETKF and Hurricane datasets when high compression accuracy is needed.

The performance evaluation is shown in Figure 8. We include SZ2.1 as the baseline. SZ3-LR-s is a performance-oriented version of SZ3-LR that shares the same logic but has a different implementation of the predictor module. The predictor module in SZ3-LR uses a multidimensional iterator for better code simplicity, whereas in SZ3-LR-s the predictor contains several codecs, each of which handles data of a specific dimensionality. Note that SZ3-LR-s still has a modular design and can be customized with a different quantizer, encoder, and lossless compressor. We can see from Figure 8 that SZ3-LR-s has comparable performance with SZ2.1 on all datasets. SZ3-Truncation has the best performance among all compressors including SZ2.1. Its 1 GB/s compression throughput is 4X higher than that of the second-best compressor. SZ3-Interp is not as fast as the others, but its throughput is still higher than 100 MB/s in all cases.

The quality and performance evaluations reveal the suitable cases for the three built-in pipelines. Specifically, SZ3-Truncation, as a high-speed compressor, is the best choice when there are strict requirements on the compression time, as with some in situ applications. SZ3-Interp would be the first preference in cases where a high compression ratio is wanted under relaxed time constraints, such as scientific applications that run for a long time and generate large amounts of data. SZ3-LR has balanced quality and speed; users could choose it as the default compressor in general situations where both a high compression ratio and a short compression time are needed.

7 Conclusion and Future Work

In this paper, we propose a modular, composable compression framework (SZ3), which allows users to customize on-demand error-bounded lossy compressors in an adaptive and extensible fashion with minimal effort. Using SZ3, we develop efficient error-bounded lossy compressors for two real-world application datasets based on the data characteristics and user requirements, which improve the compression ratios by up to 20% when compared with other state-of-the-art compressors with the same data distortion. We also compare the sustainability of SZ3 with existing compressors, and leverage it to integrate and evaluate different compression pipelines. In the future, we will integrate more instances into the framework for diverse use cases and provide support for various hardware including GPUs and FPGAs.

Appendix A

We present some representative interfaces and functions in this appendix. Note that T is the template parameter for the data type, N is the template parameter for the dimensionality, and X is the template parameter for the quantized data type.

A.1 Snippet of Preprocess Interface

template<class T, uint N>
class PreprocessInterface {
public:
  // Transform the data in place before compression and undo it afterwards.
  virtual void preprocess(T *data, SZ::Config<T, N> &conf) = 0;
  virtual void postprocess(T *data, SZ::Config<T, N> &conf) = 0;
};

A.2 Snippet of Predictor Interface

template<class T, uint N>
class PredictorInterface {
public:
  virtual T predict(const iterator &iter) = 0;
  virtual T estimate_error(const iterator &iter) = 0;
  // Serialize and restore the predictor state.
  virtual uint save(uchar *&c) = 0;
  virtual void load(uchar *&c) = 0;
};

A.3 Snippet of Quantizer Interface

template<class T, class X, uint N>
class QuantizerInterface {
public:
  virtual X quantize(T data, T pred) = 0;
  virtual T recover(T pred, X quant_value) = 0;
  // Serialize and restore the quantizer state.
  virtual uint save(uchar *&c) = 0;
  virtual void load(uchar *&c) = 0;
};

A.4 Snippet of Encoder Interface

template<class T>
class EncoderInterface {
public:
  virtual size_t encode(vector<T> &bins, uchar *&bytes) = 0;
  virtual vector<T> decode(uchar *&bytes, size_t length) = 0;
  // Serialize and restore the encoder state.
  virtual uint save(uchar *&c) = 0;
  virtual void load(uchar *&c) = 0;
};

A.5 Snippet of Lossless Interface

class LosslessInterface {
public:
  virtual uchar *compress(uchar *data, size_t inSize, size_t &outSize) = 0;
  virtual uchar *decompress(uchar *data, size_t &outSize) = 0;
};

A.6 Snippet of Compressor Class

template<class T, size_t N, class Preprocessor, class Predictor, class Quantizer, class Encoder, class Lossless>
class SZ_Compressor { /* ... */ };

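A.7 Snippet of Prediction and Quantization

vector<int> predict_quantize(T *data) {
    vector<int> quantization_results;
    multidimensional_iter blocks(data);
    for (auto block = blocks.begin(); block != blocks.end(); ++block) {
        for (auto element = block->begin(); element != block->end(); ++element) {
            // Predict the current value, then quantize the prediction error.
            T pred = predictor.predict(element);
            int quan = quantizer.quantize(*element, pred);
            quantization_results.push_back(quan);
        }
    }
    return quantization_results;
}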


This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. The material was supported by the U.S. Department of Energy (DOE), Office of Science and DOE Advanced Scientific Computing Research (ASCR) office, under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grant OAC-2003709, OAC-2003624/2042084, SHF-1910197, and OAC-2104023. This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility, operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.


  • [1] E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. V. Gool (2019) Generative adversarial networks for extreme learned image compression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 221–231. Cited by: §1, §2.
  • [2] M. Ainsworth, S. Klasky, and B. Whitney (2017) Compression using lossless decimation: analysis and application. SIAM Journal on Scientific Computing 39 (4), pp. B732–B757. Cited by: §1, §2.
  • [3] M. Ainsworth, O. Tugluk, B. Whitney, and S. Klasky (2018) Multilevel techniques for compression and reduction of scientific data—the univariate case. Computing and Visualization in Science 19 (5-6), pp. 65–76. Cited by: §1, §2.
  • [4] M. Ainsworth, O. Tugluk, B. Whitney, and S. Klasky (2019) Multilevel techniques for compression and reduction of scientific data-quantitative control of accuracy in derived quantities. SIAM Journal on Scientific Computing 41 (4), pp. A2146–A2171. Cited by: §1, §2.
  • [5] Y. Alexeev, M. P Mazanetz, O. Ichihara, and D. G Fedorov (2012) GAMESS as a free quantum-mechanical platform for drug research. Current topics in medicinal chemistry 12 (18), pp. 2013–2033. Cited by: §1, §4.
  • [6] F. Alted (2017) Blosc, an extremely fast, multi-threaded, meta-compressor library. Cited by: §2, §3.2.
  • [7] Bebop_supercomputer (2019) Note: Available at Online Cited by: §4.3.
  • [8] M. Burtscher and P. Ratanaworabhan (2009-01) FPC: a high-speed compressor for double-precision floating-point data. IEEE Transactions on Computers 58 (1), pp. 18–31. Cited by: §2.
  • [9] F. Cappello, S. Di, S. Li, X. Liang, A. M. Gok, D. Tao, C. H. Yoon, X. Wu, Y. Alexeev, and F. T. Chong (2019-11) Use cases of lossy compression for floating-point data in scientific data sets. 33 (6), pp. 1201–1220. External Links: ISSN 1094-3420, 1741-2846, Document Cited by: §2.
  • [10] Z. Chen, S. W. Son, W. Hendrix, A. Agrawal, W. Liao, and A. Choudhary (2014) NUMARCK: machine learning algorithm for resiliency and checkpointing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New York, NY, USA, pp. 733–744. Cited by: §3.2.
  • [11] S. Claggett, S. Azimi, and M. Burtscher (2018-03) SPDP: an automatically synthesized lossless compression algorithm for floating-point data. In 2018 Data Compression Conference, Vol. , New York, NY, USA, pp. 335–344. External Links: Document, ISSN 2375-0359 Cited by: §2.
  • [12] L. P. Deutsch (1996) GZIP file format specification version 4.3. Cited by: §2, §3.2.
  • [13] S. Di and F. Cappello (2016) Fast error-bounded lossy HPC data compression with SZ. In 2016 IEEE International Parallel and Distributed Processing Symposium, New York, NY, USA, pp. 730–739. Cited by: §1, §1, §2.
  • [14] I. T. Foster et al. (2017) Computing just what you need: online data analysis and reduction at extreme scales. In European Conference on Parallel Processing, Cham, pp. 3–19. Cited by: §1.
  • [15] A. M. Gok, S. Di, A. Yuri, D. Tao, V. Mironov, X. Liang, and F. Cappello (2018) PaSTRI: a novel data compression algorithm for two-electron integrals in quantum chemistry. In IEEE International Conference on Cluster Computing (CLUSTER), New York, NY, USA, pp. 1–11. Cited by: §1, §2, §3.1, §3.2, §3.2, §4.1, §4.
  • [16] D. A. Huffman (1952) A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40 (9), pp. 1098–1101. Cited by: §3.2.
  • [17] L. Ibarria, P. Lindstrom, J. Rossignac, and A. Szymczak (2003) Out-of-core compression and decompression of large n-dimensional scalar fields. In Computer Graphics Forum, Vol. 22, pp. 343–348. Cited by: §3.2.
  • [18] J. Kunkel, A. Novikova, E. Betke, and A. Schaare (2017) Toward decoupling the selection of compression algorithms from quality constraints. In High Performance Computing, J. M. Kunkel, R. Yokota, M. Taufer, and J. Shalf (Eds.), Vol. 10524, pp. 3–14 (en). External Links: Document, ISBN 978-3-319-67629-6 978-3-319-67630-2 Cited by: §2.
  • [19] S. Li, S. Jaroszynski, S. Pearse, L. Orf, and J. Clyne (2019) VAPOR: a visualization package tailored to analyze simulation data in earth system science. Cited by: §2.
  • [20] X. Liang, S. Di, S. Li, D. Tao, B. Nicolae, Z. Chen, and F. Cappello (2019) Significantly improving lossy compression quality based on an optimized hybrid prediction model. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–26. Cited by: §1, §2.
  • [21] X. Liang, S. Di, D. Tao, Z. Chen, and F. Cappello (2018) An efficient transformation scheme for lossy data compression with point-wise relative error bound. In IEEE International Conference on Cluster Computing (CLUSTER), New York, NY, USA, pp. 179–189. Cited by: §1, §2, §3.2.
  • [22] X. Liang, S. Di, D. Tao, S. Li, S. Li, H. Guo, Z. Chen, and F. Cappello (2018) Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In 2018 IEEE International Conference on Big Data, New York, NY, USA, pp. . Cited by: §1, §1, §2, §3.1, §3.2, §3.2, §4.1, §5.2, §6.2.
  • [23] X. Liang, Q. Gong, J. Chen, B. Whitney, L. Wan, Q. Liu, D. Pugmire, R. Archibald, N. Podhorszki, and S. Klasky (2021) Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13. Cited by: §4.2.
  • [24] X. Liang, H. Guo, S. Di, F. Cappello, M. Raj, C. Liu, K. Ono, Z. Chen, and T. Peterka (2020) Toward feature-preserving 2D and 3D vector field compression.. In PacificVis, pp. 81–90. Cited by: §1, §2, §3.1, §3.2.
  • [25] X. Liang, B. Whitney, J. Chen, L. Wan, Q. Liu, D. Tao, J. Kress, D. R. Pugmire, M. Wolf, N. Podhorszki, et al. (2021) MGARD+: optimizing multilevel methods for error-bounded scientific data reduction. IEEE Transactions on Computers. Cited by: §1, §3.2.
  • [26] P. Lindstrom and M. Isenburg (2006) Fast and efficient compression of floating-point data. IEEE Transactions on Visualization and Computer Graphics 12 (5), pp. 1245–1250. Cited by: §1, §3.1, §3.2.
  • [27] P. Lindstrom (2014) Fixed-rate compressed floating-point arrays. IEEE Transactions on Visualization and Computer Graphics 20 (12), pp. 2674–2683. Cited by: §1, §2, §4.1, §4.2.
  • [28] P. Lindstrom (2017) Error distributions of lossy floating-point compressors. Joint Statistical Meetings 1 (1), pp. 2574–2589. Cited by: §2.
  • [29] T. Lu, Q. Liu, X. He, H. Luo, E. Suchyta, J. Choi, N. Podhorszki, S. Klasky, M. Wolf, T. Liu, et al. (2018) Understanding and modeling lossy compression schemes on HPC scientific data. In 2018 IEEE International Parallel and Distributed Processing Symposium, pp. 348–357. Cited by: §2.
  • [30] H. Ma, D. Liu, N. Yan, H. Li, and F. Wu (2020) End-to-end optimized versatile image compression with wavelet-like transform. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §2.
  • [31] (2021) Repository of SZ2 compressor. Note: Cited by: §6.1.
  • [32] D. Tao, S. Di, Z. Chen, and F. Cappello (2017) Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization. In 2017 IEEE International Parallel and Distributed Processing Symposium, New York, NY, USA, pp. 1129–1139. Cited by: §1, §1, §2, §3.1, §3.2, §3.2.
  • [33] D. Taubman and M. Marcellin (2013) JPEG2000 image compression fundamentals, standards and practice. Springer Publishing Company, Incorporated, New York, NY, USA. External Links: ISBN 1461352452, 9781461352457 Cited by: §1, §2.
  • [34] L. Theis, W. Shi, A. Cunningham, and F. Huszár (2017) Lossy image compression with compressive autoencoders. arXiv preprint arXiv:1703.00395. Cited by: §1, §2.
  • [35] R. Underwood, S. Di, J. C. Calhoun, and F. Cappello (2020) FRaZ: a generic high-fidelity fixed-ratio lossy compression framework for scientific floating-point data. Note: Cited by: §2.
  • [36] G. K. Wallace (1992) The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38 (1), pp. xviii–xxxiv. Cited by: §1, §2.
  • [37] K. Zhao, S. Di, M. Dmitriev, T. D. Tonellot, Z. Chen, and F. Cappello (2021) Optimizing error-bounded lossy compression for scientific data by dynamic spline interpolation. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1643–1654. Cited by: §1, §6.2.
  • [38] K. Zhao, S. Di, X. Liang, S. Li, D. Tao, Z. Chen, and F. Cappello (2020) Significantly improving lossy compression for HPC datasets with second-order prediction and parameter optimization. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’20, New York, NY, USA, pp. 89–100. External Links: ISBN 9781450370523, Document Cited by: §1, §3.2.
  • [39] zstd (2019) Note: Cited by: §2, §3.2.