1 Introduction
Data reduction is becoming increasingly important to scientific research because of the large amounts of data produced by simulations running on exascale computing systems and experiments conducted on advanced instruments. For instance, recent climate research, which performs climate simulations at 1 km × 1 km resolution, generates 260 TB of floating-point data every 16 seconds [14]. When the generated data are dumped into parallel file systems or secondary storage systems to ensure long-term access, the limited storage capacity and/or I/O bandwidth imposes great challenges. While scientists aim to significantly reduce the size of their data to mitigate this problem, they are also concerned about the quality of data reduction. General data reduction approaches, including traditional wavelet-based methods [36, 33]
and emerging neural-network-based methods
[34, 1] widely used in the image processing community, may lead to loss of important scientific insights because they do not enforce quantifiable error bounds on the reconstructed data.

Over the past decade, error-bounded lossy compression [13, 32, 22, 38, 27, 26, 2, 3, 4, 25]
has been proposed and employed to reduce scientific data while controlling the distortion. Depending on how the original data are decorrelated, existing compressors can be classified into prediction-based and transform-based. These compressors all allow users to specify an error bound during compression and ensure that the error between the original and decompressed data is strictly within the bound. In this paper we focus mainly on prediction-based approaches because transform-based approaches can be reformulated as prediction-based ones by using the corresponding transforms as predictors (at the cost of some speed degradation), as suggested by prior works
[20].

Although existing prediction-based approaches such as SZ [13, 32, 22] are general and can be applied to various scenarios, they may not lead to the best quality and performance for a specific dataset or error bound requirement. The best-fit compression method is never universal; this holds even for a single dataset, because the compression efficiency is also affected by the required error bounds. For instance, SZ1.4 [32] with a Lorenzo predictor shows very good compression ratios with low error bounds, but it suffers from low quality and artifacts with high error bounds, where approaches with a regression-based predictor [22]
or an interpolation-based predictor
[37] have proved to be much more efficient. Likewise, data generated by the GAMESS quantum chemistry package [5] exhibit periodic scaled patterns, where a pattern-based predictor demonstrates obvious improvements in both compression speed and ratios [15]. Thus, a loosely coupled compression framework that allows for customization of the prediction-based error-bounded lossy compression model is critical to optimizing compression quality and performance for users in practice.

In this paper, we present a modular and composable framework—SZ3—which can be used to easily create new error-bounded lossy compressors on demand. SZ3 features a modular abstraction for prediction-based compression pipelines such that modules can be developed and adopted independently. Specifically, users can customize any stage in the compression pipeline, including preprocessing, prediction, quantization, encoding, and lossless compression, via carefully designed modules. Based on these customized modules, SZ3 allows users to compose their own compressors (or compression pipelines) to adapt to diverse data characteristics and requirements, thus achieving high compression quality and performance with minimal effort. Such a composable design provides a variety of useful capabilities, including pointwise relative error bounds (logarithmic-transform-based preprocessor [21]), feature-preserving compression (element-wise quantizer [24]), and speed-ratio tradeoffs (module bypass). Although designed for data on Cartesian grids, SZ3 can also work with data on unstructured grids by applying a linearization that rearranges the data into a one-dimensional array.
We summarize our contributions as follows.

We carefully design and develop SZ3, a flexible, efficient framework that allows easy creation and customization of prediction-based error-bounded lossy compressors. This work is critical to obtaining high data compression quality because of the diverse scientific data characteristics and user requirements in practice.

We develop a new compressor using SZ3 for data generated by the GAMESS quantum chemistry package. By substituting the default quantizer with a specialized one and adding a lossless compression stage, the composed compressor achieves better performance than the current state of the art with minimal effort.

We develop an efficient compressor using SZ3 for data collected from Advanced Photon Source instruments. By incorporating an adaptive pipeline built from existing modules, the composed compressor leads to the best rate-distortion at any bit rate.

We compare the sustainability of SZ3 with that of leading prediction-based compressors, and then integrate several compression pipelines to demonstrate the necessity of diverse pipelines. The performance and efficiency are carefully characterized using diverse scientific datasets across multiple domains.
The rest of the paper is organized as follows. In Section 2 we discuss related work. In Section 3 we present the design and modules of the SZ3 framework. In Section 4 and Section 5 we describe in detail how we leverage the proposed framework to create efficient compressors for GAMESS and APS data. In Section 6 we present the sustainability comparison and the evaluation of diverse pipelines. In Section 7 we conclude with a vision of future work.
2 Related Work
With more powerful high-performance computing (HPC) systems and high-resolution instruments, the volume and generation speed of scientific data have increased at an unprecedented rate in recent years, causing problems in data storage, transmission, and analysis. Compared with the fast evolution of computing resources, I/O systems are heavily underdeveloped and remain a bottleneck in most scenarios. Data compression is regarded as a direct way to mitigate this bottleneck, and many approaches have been presented in the literature to address this issue.
Lossless compressors [12, 39, 8, 11, 6] ensure that no information is lost during compression. Despite their success in many fields, lossless compressors suffer from low compression ratios on floating-point scientific data because of the almost randomly distributed mantissas. Previous work [28] has shown that state-of-the-art lossless compressors lead to a compression ratio of only about 2 when directly applied to most floating-point scientific datasets, whereas scientific applications usually require far higher reduction ratios on their data [9].
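As a quick, self-contained illustration of this point (a sketch we add here, not an experiment from the paper or from [28]), the snippet below applies Python's built-in zlib (standing in for a general lossless compressor) to the raw bytes of a smooth floating-point field. The near-random mantissa bits keep the ratio low:

```python
import random
import struct
import zlib

def lossless_ratio(values):
    """Compression ratio of zlib (level 9) on the raw bytes of doubles."""
    raw = b"".join(struct.pack("<d", v) for v in values)
    return len(raw) / len(zlib.compress(raw, 9))

# Even a smooth field has near-random mantissa bits, so a general
# lossless compressor gains little on the raw floating-point bytes.
random.seed(0)
smooth = [1.0 + 1e-3 * i + 1e-6 * random.random() for i in range(4096)]
assert lossless_ratio(smooth) < 3.0   # far below what lossy compressors reach
```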
Lossy compressors [36, 33, 34, 1, 19, 30] offer the flexibility to trade data quality for high compression ratios, but they may result in higher distortion than users expect. The unbounded distortion may cause unexpected behavior in post hoc data analytics and even false discoveries, making it risky to trust analysis results derived from the decompressed data.
In comparison with traditional lossy compression, error-bounded lossy compression has been rapidly developed to fill this gap by reducing the size of scientific data while guaranteeing quantifiable error bounds. Prediction-based and transform-based models are the most popular models for designing error-bounded lossy compressors. One of the most well-known transform-based error-bounded lossy compressors is ZFP [27], which decorrelates the data using a near-orthogonal transform and encodes the transformed coefficients using embedded encoding. MGARD [2, 3, 4] is another compressor relying on the transform-based model. It leverages wavelet theories and projection for data decorrelation, followed by linear-scaling quantization, variable-length encoding, and lossless compression.
According to recent studies [29], SZ [13, 32, 22] is regarded as one of the leading prediction-based lossy compressors in the scientific computing community. SZ follows a four-step pipeline to perform the compression, namely data prediction, quantization, Huffman encoding, and lossless compression. Significant efforts have been made to enable new features or functionalities based on this pipeline. For instance, in [21], a logarithmic transform was used in a preprocessing step to turn a pointwise-relative-error-bound compression problem into an absolute-error-bound compression problem, which is then solved by the SZ compression pipeline. In [24], the authors derived element-wise error bounds based on how critical points are extracted, and they leveraged the SZ compression pipeline along with element-wise quantization to ensure that those critical points are preserved in the decompressed data. In [15], the authors adjusted the pipeline by using a pattern-based predictor to better exploit the correlation in the data and a predefined fixed Huffman tree for faster encoding. Attempts have also been made to use the near-orthogonal transform in ZFP as a predictor in the pipeline [20]. All the above works, however, follow a tightly coupled design, so their compression pipelines cannot be adjusted on demand and thus cannot adapt to users' diverse requirements or use cases. By contrast, the SZ3 framework offers a flexible, modular design that can be adapted to diverse use cases efficiently.
Although many efforts have been devoted to abstracting lossy compression, most of them focus on enabling adaptive selection among existing compressors. For instance, SCIL [18] attempts to abstract across compressors and acts as a meta-compressor that provides backends to various existing algorithms. LibPressio [35] provides a common API for different compressors to allow easy integration of lossy compression in an extensible fashion. Instead, SZ3 separates and abstracts the stages in the prediction-based compression model, allowing easy creation of new compressors at fine granularity rather than selection of existing ones. To the best of our knowledge, this is the first attempt to build a generic framework that allows users to easily customize their own compressors based on their actual needs.
3 SZ3: A Modular Compression Framework
In this section we introduce the design and implementation of SZ3. With modularity in mind, SZ3 enables easy customization of prediction-based compression pipelines with minimal overhead.
3.1 Design overview
Figure 1 illustrates the design overview of SZ3. The compression process is abstracted into five stages (displayed as dotted boxes), each of which serves as an individual module. Orange boxes depict the key functionalities of each module, and green boxes illustrate several corresponding instances. A compressor is realized by identifying a compression pipeline composed of instances from each module. The figure demonstrates how five leading compressors designed for different purposes, namely FPZIP [26], SZ1.4 [32], SZ2 [22], SZ-Pastri [15], and cpSZ [24], are composed using this abstraction (see the solid lines), which shows its generality. For instance, the FPZIP compression pipeline bypasses the preprocessor and leverages the Lorenzo predictor for data decorrelation, followed by residual encoding to ensure error control and arithmetic encoding for size reduction. In the following text we detail the modular design of SZ3, along with example instances of the modules.
3.2 Modularity
In this section we discuss the five modules in SZ3, namely the preprocessor, predictor, quantizer, encoder, and lossless compressor, along with module instances that have proven effective for scientific datasets. Developers can write their own module instances and plug them into the compression pipeline to design prediction-based error-bounded lossy compression for their datasets. Because of space limitations, we present only the most important functions and several representative instances for each module. Detailed interfaces for each module are listed in Appendix A.
Preprocessor (see Appendix A.1): The preprocessor processes the input dataset before the actual compression, either to achieve high efficiency or to meet diverse requirements. The key function in the preprocessor, namely preprocess, takes the original data and the compression configuration as input, transforms the data in place, and changes the compression configuration accordingly. If users want to keep the original data while the preprocessor needs to alter the data, a separate buffer is required to perform the preprocessing. Depending on the actual design, the postprocess function either reverses the preprocessing procedure or is omitted.
Instances: A typical preprocessor for error-controlled lossy compressors is the logarithmic transform used to enable pointwise relative error bounds [21], where data are transformed to the logarithmic domain and compressed with an absolute error bound converted from the pointwise relative one. In addition, SZ-Pastri [15] requires a preprocessing step to identify proper parameters, such as the block size and pattern size, for the pattern-based predictor. In Section 5, we further leverage a preprocessor to alter the layout of the data for a better compression ratio, based on our observation that some 3D datasets compress better when treated as 2D or 1D datasets (as detailed later).
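To make the logarithmic preprocessor concrete, here is a simplified Python sketch of the idea from [21] (SZ3 itself is written in C++; the function names are illustrative, and zero or negative values, which a real implementation must handle separately, are ignored). Bounding the absolute error in log space by log(1 + ε) guarantees a pointwise relative error of at most ε after exponentiation:

```python
import math

def to_log_domain(data, rel_eb):
    """Map positive data to log space; return the transformed data and the
    absolute error bound that guarantees the pointwise relative bound."""
    abs_eb = math.log(1.0 + rel_eb)   # |log x' - log x| <= abs_eb  =>  |x'-x|/x <= rel_eb
    return [math.log(v) for v in data], abs_eb

def from_log_domain(data):
    return [math.exp(v) for v in data]

# Simulate a worst-case absolute-error compressor in log space.
data = [3.14, 0.001, 250.0]
logs, abs_eb = to_log_domain(data, rel_eb=0.01)
decompressed = from_log_domain([v + abs_eb for v in logs])   # push each value to the bound
for x, y in zip(data, decompressed):
    assert abs(y - x) / abs(x) <= 0.01 + 1e-12
```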
Predictor (see Appendix A.2): Predictors are the key components of prediction-based compressors; they perform value prediction based on diverse patterns for data decorrelation. There are two important functions in the predictor interface, namely predict and save/load. The predict function outputs the predicted value based on the characteristics of the underlying predictor using the multidimensional iterator (detailed in Section 6.1). Necessary information about the predictor, for instance the coefficients of the regression predictor [22, 38], is recorded in the save function. During decompression, the load function is invoked to reconstruct the predictor.
Instances: The Lorenzo predictor [17] and its high-order variations [32], which perform multidimensional prediction for each data point based on its neighboring data points, are classic and powerful prediction methods used in lossy compressors such as SZ [32] and FPZIP [26]. In [22], a regression-based predictor is proposed to construct a hyperplane and use points on the hyperplane as predicted values, which significantly improves the prediction efficiency when the user-specified error bound is high. We further implement a composite predictor instance inherited from this interface, which may consist of multiple predictors using different prediction algorithms. This requires an error estimation function for each predictor, which is used to determine the best-fit predictor for a given data chunk. The statistical approach in [22] and [25] is generalized as the estimation criterion in SZ3. With the composite predictor, multi-algorithm designs with more than one predictor can be implemented very easily.

Quantizer (see Appendix A.3): The quantizer approximates the prediction errors generated by the predictors with a smaller countable set to reduce their entropy while respecting the error bound. As the only module that introduces errors into the compression pipeline, the quantizer determines how the final errors in the decompressed data are controlled. The quantize function is the most important function in a quantizer; it quantizes the prediction error based on the original data value and the predicted value from the predictor. During decompression, the decompressed data value is computed by the recover function, which reverses the steps of the quantize function. The quantizer module is also responsible for encoding/decoding the unpredictable data, i.e., data falling outside the countable set. This is realized in the save/load function.
Instances: The linear-scaling quantizer [32] is a widely used quantizer that enables absolute error control in lossy compression. In particular, this quantizer constructs a set of equal-sized consecutive bins, each twice the error bound in length. The prediction error is then translated into the index of the bin containing it. Prediction errors that fall out of range are regarded as unpredictable and are encoded and stored separately. In addition, the log-scale quantizer [10] adjusts the size of the bins for a more centralized error distribution, and the element-wise quantizer [24] provides fine-granularity error control for each data point.
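The interplay between prediction and quantization can be sketched as follows, using a 1D Lorenzo predictor and a linear-scaling quantizer (a simplified Python illustration with hypothetical function names; SZ3 is implemented in C++, and unpredictable-data handling is omitted). Note that the compressor predicts from already-decompressed neighbors so that compression and decompression stay in lockstep:

```python
def compress_1d(data, eb):
    """1D Lorenzo prediction + linear-scaling quantization.
    Each bin is 2*eb wide, so |decompressed - original| <= eb."""
    quants, prev = [], 0.0                 # prev is the *decompressed* neighbor
    for x in data:
        q = round((x - prev) / (2 * eb))   # bin index of the prediction error
        quants.append(q)
        prev = prev + 2 * eb * q           # mirror the decompression exactly
    return quants

def decompress_1d(quants, eb):
    out, prev = [], 0.0
    for q in quants:
        prev = prev + 2 * eb * q
        out.append(prev)
    return out

data = [1.02, 1.05, 1.11, 1.10, 0.98]
eb = 0.01
rec = decompress_1d(compress_1d(data, eb), eb)
assert all(abs(a - b) <= eb + 1e-12 for a, b in zip(data, rec))
```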
Encoder (see Appendix A.4): The encoder is a lossless scheme that reduces the storage of the integer indices (or symbols) generated by the quantizer. The encoder module involves two essential functions—encode and save/load. The encode function transforms the quantized integers from the quantizer into a compressed binary format; similar to other modules, the encoder module has a decode function that performs the reverse process during decompression. This module also has save/load functions for storing/recovering metadata such as the Huffman tree.
Instances: The Huffman encoder [16] is a classic variable-length encoding algorithm that uses fewer bits to represent more common symbols. This encoder first constructs a Huffman tree based on the frequency of the input data using a greedy algorithm, generates a codebook according to the tree, and then compresses the data using the codebook. The fixed Huffman encoder used in SZ-Pastri [15] is a variation of the Huffman encoder that uses a predefined Huffman tree instead of constructing one on the fly, eliminating the cost of both construction and storage of the tree. The arithmetic encoder is another type of encoder widely used in data compression; it represents the information as a range and encodes the entire data into a single number.
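A minimal Huffman codebook construction can be sketched in Python as follows (illustrative only; SZ3's encoder is a C++ module that also serializes the tree via save/load). Quantization indices cluster heavily around zero, so the frequent symbols receive short codes:

```python
import heapq
from collections import Counter

def huffman_codebook(symbols):
    """Build a prefix-free codebook; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial code}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # merge the two rarest subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

quants = [0] * 90 + [1] * 8 + [-1, 2]        # indices cluster at 0
book = huffman_codebook(quants)
assert len(book[0]) <= len(book[1]) <= len(book[2])
# Prefix-free: no code is a prefix of another.
codes = sorted(book.values())
assert all(not b.startswith(a) for a, b in zip(codes, codes[1:]))
```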
Lossless Compressor (see Appendix A.5): Lossless compressors further shrink the size of the compressed binary format produced by the encoder, because entropy-based encoders may overlook repeated patterns in the data and thus lead to suboptimal compression ratios. The lossless compressor module in SZ3 acts mainly as a proxy for state-of-the-art lossless compression libraries. It invokes external libraries to compress the output of the encoder module through compress and decompress interfaces.
Instances: We provide portable interfaces in SZ3 to integrate with state-of-the-art lossless compressors including ZSTD [39], GZIP [12], and BLOSC [6]. Because the lossless compressor is a standalone module attached to the previous stages, it is fairly easy to include and integrate new lossless compression routines as well.
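Such a proxy module can be sketched as below, using Python's built-in zlib in place of ZSTD/GZIP/BLOSC (the class name is hypothetical; SZ3's actual module is a C++ interface):

```python
import zlib

class ZlibLossless:
    """Stand-in for the lossless-compressor module (SZ3 wraps ZSTD/GZIP/BLOSC;
    zlib is used here only because it ships with Python)."""
    def compress(self, payload: bytes) -> bytes:
        return zlib.compress(payload, 9)
    def decompress(self, payload: bytes) -> bytes:
        return zlib.decompress(payload)

stage = ZlibLossless()
encoded = bytes([0, 0, 0, 1] * 256)   # encoder output with repeated patterns
packed = stage.compress(encoded)
assert stage.decompress(packed) == encoded
assert len(packed) < len(encoded)     # repeated patterns shrink well
```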
3.3 Compression pipeline composition
In SZ3, a compression pipeline can be composed by identifying the instances of the modules and putting them together. Algorithm 1 shows how a general-purpose error-controlled lossy compressor is composed from the selected preprocessor, predictor, quantizer, encoder, and lossless compressor. In addition, SZ3 employs compile-time polymorphism (see Section 6.1) so that users can switch instances without modifying the compression functions. This makes SZ3 highly adaptive to diverse use cases, with significantly reduced effort in compressor development.
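The composition idea can be illustrated with the following Python sketch, in which a predictor instance and a quantizer instance are plugged into generic compression/decompression loops (class and function names are hypothetical; SZ3 performs this composition with C++ templates at compile time rather than at run time, and a full pipeline would also include the encoder and lossless stages):

```python
class LorenzoPredictor:
    """1D Lorenzo predictor: predict each value from its decompressed neighbor."""
    def __init__(self): self.prev = 0.0
    def predict(self): return self.prev
    def update(self, decompressed): self.prev = decompressed

class LinearQuantizer:
    """Linear-scaling quantizer with bins of width 2*eb."""
    def __init__(self, eb): self.eb = eb
    def quantize(self, value, predicted):
        q = round((value - predicted) / (2 * self.eb))
        return q, predicted + 2 * self.eb * q   # (bin index, decompressed value)
    def recover(self, q, predicted):
        return predicted + 2 * self.eb * q

def compose_compress(data, predictor, quantizer):
    quants = []
    for x in data:
        q, dec = quantizer.quantize(x, predictor.predict())
        quants.append(q)
        predictor.update(dec)   # keep compressor and decompressor in sync
    return quants

def compose_decompress(quants, predictor, quantizer):
    out = []
    for q in quants:
        dec = quantizer.recover(q, predictor.predict())
        predictor.update(dec)
        out.append(dec)
    return out

data, eb = [1.0, 1.1, 1.3, 1.25], 0.01
quants = compose_compress(data, LorenzoPredictor(), LinearQuantizer(eb))
rec = compose_decompress(quants, LorenzoPredictor(), LinearQuantizer(eb))
assert all(abs(a - b) <= eb + 1e-12 for a, b in zip(data, rec))
```

Swapping in a different predictor or quantizer instance requires no change to the two loops, which is the essence of the modular design.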
4 Developing an Efficient Compressor for GAMESS Data using SZ3
In this section, we present how we create a new compressor using SZ3 that improves the compression ratios for data generated by the real-world scientific simulation GAMESS [5]. In the following text, we first introduce the GAMESS data and its current compressor, SZ-Pastri [15]; we then present our characterization of the quantization integers and the new customization method. Finally, we evaluate the compression ratios and speed based on three representative data fields in GAMESS.
4.1 GAMESS data and the SZ-Pastri compressor
Quantum chemistry researchers often need to obtain a wavefunction by solving the Schrödinger differential equation, which encodes all of the chemical system's information. The wavefunction is constructed from two-electron repulsion integrals (ERIs), which require too much memory capacity to hold at runtime during the simulation. A straightforward solution is to regenerate the ERI dataset whenever needed during the simulation, although this would significantly delay the simulation because of the fairly expensive cost of generating the ERI data. In our prior work, we developed an efficient error-bounded lossy compressor called SZ-Pastri [15], which can compress the ERI data in memory and decompress it at the beginning of each iteration of the simulation. This method effectively avoids the ERI recalculation cost and thus improves the overall performance. SZ-Pastri takes advantage of the periodic patterns that exist in the GAMESS dataset, because the ERI values are calculated in order and depend on the shape of and distance between electron clouds. Specifically, SZ-Pastri identifies a periodic pattern and uses it along with a scaling coefficient for each block to enable accurate data prediction. This leads to substantial performance gains compared with existing general compressors [22, 27].
4.2 Data characterization and pipeline customization
We first characterize the quantization integers of SZ-Pastri, which are the most impactful factor for the final compression ratios. To enable correct decompression, SZ-Pastri needs to quantize and store the information for both the periodic patterns and the block-wise scales. Thus, the quantization integers in SZ-Pastri consist of three components, computed from the data, patterns, and scales, respectively. As displayed in Figure 3(a), the distribution of quantization integers for the pattern-based predictor is centered at zero, which indicates very high prediction accuracy and thus better compression ratios. However, a significant percentage (20% for data) of the quantization integers fall outside the quantization range in this setting. These data, usually described as unpredictable, require additional mechanisms for storage in order to be correctly recovered during decompression. In SZ-Pastri, they are directly truncated and stored based on the user-specified error, which fails to exploit the correlation in the data to achieve high compression, although it provides relatively fast compression speed.
Based on these observations, we improve the compression efficiency of SZ-Pastri by leveraging a specialized quantizer to deal with the unpredictable data. Inspired by the embedded encoding approaches widely used in transform-based compressors [27, 23], we store the data in bitplane order instead of applying truncation directly. A bitplane represents the set of bits at a given bit position in the binary representations of the data. Because small data values have meaningful bits only in the less significant bitplanes, the more significant bitplanes yield good compression ratios because of consecutive zeros. Similar to [27], we first align the exponents of the prediction differences of the unpredictable data to that of the error bound to convert the floating-point data into integers. These integers are then recorded in bitplane order, namely, from the most significant bitplane to the least significant bitplane. Compared with direct truncation, this encoding method does not change the encoded size at this stage; however, the resulting format is more compressible and thus promises better compression ratios once lossless compression is applied. Since this quantizer takes special care of unpredictable data storage, we name it the Unpred-aware Quantizer throughout the paper. To take advantage of this method, we also add a lossless stage to the composed compression pipeline, as displayed in Figure 2. We call this new compressor SZ3-Pastri, as it optimizes SZ-Pastri using the SZ3 framework.
4.3 Evaluation results
We evaluate our method and compare it with SZ-Pastri and its variation (SZ-Pastri equipped with lossless compression) using three representative fields in GAMESS. Unless otherwise noted, all the experiments in this paper are conducted on the Bebop supercomputer [7] at Argonne National Laboratory. Bebop has 664 Broadwell nodes, each equipped with two Intel Xeon E5-2695 v4 processors (36 physical cores in total) and 128 GB of DDR4 memory.
The rate-distortion graphs of the evaluation are displayed in Figure 4. These graphs show the correlation between bit rate and peak signal-to-noise ratio (PSNR). The bit rate equals n/CR, where n is the number of bits in the original data representation (e.g., 32 for single-precision and 64 for double-precision floating-point data) and CR is the compression ratio. PSNR is inversely related to the mean squared error between the decompressed and original data on a logarithmic scale. A lower bit rate and a higher PSNR indicate better compression quality. According to this figure, SZ3-Pastri leads to the best rate-distortion at almost all bit rates, with consistent compression-ratio improvements over SZ-Pastri and its lossless variation. We also show the exact compression ratio and speed of the three approaches under the desired absolute error tolerance (1E-10 according to the domain scientists) in Table I. Compared with the original SZ-Pastri, SZ3-Pastri significantly improves the compression ratios under this requirement. However, it has a degradation in performance, caused by the embedded encoding of unpredictable data (i.e., the Unpred-aware Quantizer, which improves the compression ratio) and the final lossless compression.

Dataset   Compressor           Ratio   Compression Speed
Field 1   SZ-Pastri            8.46    662.01 MB/s
Field 1   SZ-Pastri-with-zstd  9.27    377.17 MB/s
Field 1   SZ3-Pastri           10.76   244.43 MB/s
Field 2   SZ-Pastri            8.40    643.58 MB/s
Field 2   SZ-Pastri-with-zstd  9.23    370.88 MB/s
Field 2   SZ3-Pastri           10.06   221.03 MB/s
Field 3   SZ-Pastri            9.14    613.12 MB/s
Field 3   SZ-Pastri-with-zstd  9.96    364.51 MB/s
Field 3   SZ3-Pastri           10.71   226.80 MB/s
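The two metrics can be computed as follows (a small Python helper we add for illustration; the PSNR here uses the data value range as the peak, a common convention for scientific data):

```python
import math

def bit_rate(nbits_per_value, compression_ratio):
    """Average bits stored per value after compression: n / CR."""
    return nbits_per_value / compression_ratio

def psnr(original, decompressed):
    """Peak signal-to-noise ratio in dB, from the mean squared error."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decompressed)) / len(original)
    value_range = max(original) - min(original)
    return 10 * math.log10(value_range ** 2 / mse)

# Double-precision data compressed 8.46x stores about 7.57 bits per value.
assert abs(bit_rate(64, 8.46) - 64 / 8.46) < 1e-12
orig = [0.0, 1.0, 2.0, 3.0]
dec = [0.1, 1.0, 2.0, 3.0]
assert abs(psnr(orig, dec) - 10 * math.log10(9 / 0.0025)) < 1e-9
```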
5 Composing an Efficient Compressor for APS Data using SZ3
We then leverage our SZ3 framework to create an adaptive compression pipeline for X-ray ptychographic data acquired at the Advanced Photon Source (APS). As in the previous section, we first introduce the APS data, followed by the data characterization and compression pipeline customization, along with the evaluation.
5.1 APS data
X-ray ptychography is a mainstream high-resolution imaging technique that takes advantage of the coherence provided by the synchrotron source. However, this computational method of microscopic imaging requires much larger data volumes and computational resources than conventional microscopic techniques. A revolutionary increase of about 3 orders of magnitude in the coherent flux provided by the coming APS upgrade will aggravate the burden of data transfer and storage. Therefore, a new compression strategy with high compression ratios is highly desired in ptychography. To represent most sample scenarios, two ptychographic datasets were acquired from a computer chip pillar (isolated sample) and a subregion of an entire flat chip (extended sample), respectively. In both cases, a Dectris Eiger detector (514 × 1030 pixels) was used to acquire diffraction patterns as the X-ray beam scanned across the sample, and the 2D diffraction images were saved along the time dimension to form a 3D matrix array (19500 × 514 × 1030 for the chip pillar and 16800 × 514 × 1030 for the flat chip). In the data analysis, domain experts usually crop only the central region of the diffraction pattern that contains X-ray signals (there are many zeros outside this region). To fairly assess our compression strategy without giving an overestimated compression ratio, we cropped only the central 256 × 256 pixels.
5.2 Data characterization and pipeline customization
We design an adaptive compression pipeline for the APS data based on the following analysis. First, although the multidimensional Lorenzo predictor is usually superior to the one-dimensional one because it exploits multidimensional correlation, it introduces more noise because more decompressed data values are used for prediction [22]. Second, although the APS data has three dimensions, it is actually a stack of 2D images along the time dimension with relatively low spatial correlation. When the spatial correlation is not strong, the benefit of using the multidimensional Lorenzo predictor may not make up for the cost of the higher noise. In addition, considering the usually high correlation in time compared with that in the spatial region, it may be more effective to compress the data along the time dimension, namely, treating the data as 1D time series. On the other hand, the multidimensional regression-based predictor should be included because it leverages the multidimensional correlation without being affected by the decompression noise [22], which yields good performance when the error bound is relatively high. This requires switching predictors based on the error bound: a traditional multi-algorithm predictor that involves regression for high error bounds, and a customized 1D Lorenzo predictor with a transposition preprocessor that reorganizes the data along the time dimension for low error bounds. In our implementation, we switch to the latter, along with an adjusted quantization bin width, when the user-specified absolute error bound is small enough that this setting generates lossless compression. Under such circumstances, the noise introduced by using decompressed data is reduced to zero when the Unpred-aware Quantizer is leveraged, thanks to the restricted quantization bins and the principle of embedded encoding. We further employ a fixed Huffman encoder for fast encoding with comparable compression ratios.
The corresponding compression pipeline for APS data is depicted in Figure 5.
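The transposition preprocessor used in the low-error-bound branch can be sketched as follows (a simplified Python illustration with a hypothetical function name): it rearranges a frame-major image stack so that each pixel's time series becomes contiguous, letting a 1D Lorenzo predictor exploit the temporal correlation:

```python
def transpose_to_time_major(stack, nt, ny, nx):
    """Preprocessor sketch: an (nt, ny, nx) image stack stored frame by frame
    is rearranged so that each pixel's time series becomes contiguous."""
    out = [0] * (nt * ny * nx)
    for t in range(nt):
        for p in range(ny * nx):          # p indexes a pixel within a frame
            out[p * nt + t] = stack[t * ny * nx + p]
    return out

# Two 2x2 frames; pixel (0,0) has values 1 then 5 over time.
stack = [1, 2, 3, 4,   5, 6, 7, 8]
assert transpose_to_time_major(stack, nt=2, ny=2, nx=2) == [1, 5, 2, 6, 3, 7, 4, 8]
```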
5.3 Evaluation results
We evaluate the customized APS compressor and compare it with three baselines: the generic SZ2.1 compressor for 1D, 3D, and transposed 1D data. As illustrated in Figure 6, the 3D compressor leads to higher PSNR at low bit rates (high compression ratios), but it suffers when the bit rate increases to a certain level, where there is a sharp increase in compression quality for the 1D compressors. This is caused by the fact that the noise introduced by decompressed data is mitigated under such an error setting on this dataset. SZ2.1 is not aware of this information and incorrectly estimates the Lorenzo prediction noise, leading to the selection of the regression predictor even when the Lorenzo predictor is better. SZ3-APS adaptively chooses the compression pipeline based on the error bound, which leads to performance comparable to that of SZ2.1 for 3D data when the error bound is high. Furthermore, the adopted Unpred-aware Quantizer exhibits higher compression ratios at low error bounds, since it provides near-lossless decompressed data that improves the prediction efficiency of the Lorenzo predictor. In absolute terms, when the decompressed data is near lossless, the proposed compression pipeline achieves clear compression ratio gains on both the chip pillar and the flat chip compared with the second-best approach. Note that SZ3-APS turns out to be lossless in this case, which leads to infinite PSNR in the figure.
6 Sustainability, Quality, and Performance Investigation of SZ3
In this section we first discuss the sustainability of SZ3, and then leverage SZ3 to characterize the quality and performance of diverse compression pipelines.
6.1 Sustainability
We design SZ3 with modularity in mind to provide a composable framework with high sustainability. Specifically, we compare the design of SZ3 with that of SZ2 [31], one of the leading error-controlled lossy compressors with a prediction-based pipeline, to demonstrate its superiority.
6.1.1 The Codebase of SZ2
SZ2 has a large codebase with more than 120 functions and little code reuse, as shown in Table II. For example, SZ2 has separate functions to handle compression and decompression for each specific data type, although the logic for compressing and decompressing different data types is similar. As a result, SZ2 must maintain separate code for each data type.
The lack of software architecture design makes modifying and extending the functionality of SZ2 difficult and time-consuming. With more than 120 functions to update, some are likely to be missed when adding new features. Furthermore, the complexity of SZ2 makes it challenging to fully validate the correctness of newly added features, because writing test code that achieves high code coverage for so many functions is itself time-consuming.
6.1.2 The Codebase of SZ3
We propose three techniques in SZ3 that dramatically improve code sustainability: compile-time polymorphism, data-type abstraction, and a multi-dimensional iterator.
Compile-Time Polymorphism: SZ3 constructs the composed compression pipelines at compile time, because compile-time polymorphism provides an efficient way to switch between different module implementations without runtime performance degradation. In the implementation, module instances are passed as template parameters of the compressor (see Appendix A.6). A static assertion is executed during construction of the compressor to ensure that only classes that inherit from the corresponding module interfaces can be used to instantiate the template parameters.
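As a minimal sketch of this idea (the interface and class names below are illustrative, not SZ3's actual API), a compressor can take its predictor module as a template parameter and reject non-conforming types with a static assertion:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, simplified module interface (not SZ3's real one).
template <class T>
struct PredictorInterface {};

// A 1D Lorenzo-style predictor: predict each value by its previous neighbor.
template <class T>
struct LorenzoPredictor : PredictorInterface<T> {
    T predict(T prev) const { return prev; }
};

// The compressor receives its module as a template parameter, so the pipeline
// is fixed at compile time and module calls can be inlined (no virtual
// dispatch). The static assertion rejects types that do not implement the
// required interface.
template <class T, class Predictor>
class Compressor {
    static_assert(std::is_base_of<PredictorInterface<T>, Predictor>::value,
                  "Predictor must inherit from PredictorInterface<T>");
public:
    explicit Compressor(Predictor p) : pred(p) {}

    // Produce prediction residuals, the values a quantizer stage would consume.
    std::vector<T> residuals(const std::vector<T>& data) const {
        std::vector<T> r(data.size());
        T prev = T(0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            r[i] = data[i] - pred.predict(prev);
            prev = data[i];
        }
        return r;
    }

private:
    Predictor pred;
};
```

Instantiating `Compressor<float, int>` would fail to compile with the assertion message, which is exactly the guard the text describes.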
Data-Type Abstraction: We adopt data-type abstraction to significantly simplify the codebase of SZ3. Most module interfaces, implementations, and compressor pipelines in SZ3 take the data type as a template parameter for efficient code reuse. By comparison, SZ2 has a separate implementation for each data type, which results in a large codebase with little code reuse.
Multi-Dimensional Iterator: A multi-dimensional iterator is designed in SZ3 to support data access patterns across different dimensionalities. This is fundamentally different from SZ2, where an independent implementation is required for each dimensionality. The multi-dimensional iterator in SZ3 provides a simple API to access the current and nearby data points and to move to another position. Boundary situations are handled inside the iterator, eliminating the need to write separate code for each data dimensionality. The pseudocode of prediction and quantization using the multi-dimensional iterator is presented in Appendix A.7.
With the multi-dimensional iterator, the complex nested loops for traversing the data and the boundary-condition checks are hidden from users. The iterator also supports arbitrary movement. For example, to move a 3D iterator to its upper-left neighbor, developers can simply call iterator.move(-1, -1, -1) instead of calculating the offsets for the three dimensions.
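A minimal 3D sketch of such an iterator (a simplified illustration in the spirit described above; the real SZ3 iterator is templated on dimensionality and considerably richer):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

class Iterator3D {
public:
    Iterator3D(const std::vector<double>& d,
               std::size_t nx, std::size_t ny, std::size_t nz)
        : data(d), dims{nx, ny, nz}, pos{0, 0, 0} {}

    // Relative movement: offsets may be negative, e.g. move(-1, -1, -1)
    // steps to the "upper left" neighbor without manual offset arithmetic.
    void move(long dx, long dy, long dz) {
        pos[0] += dx; pos[1] += dy; pos[2] += dz;
    }

    // Boundary checks are handled inside the iterator: out-of-range
    // neighbors read as 0, so callers need no special-case code.
    double value() const {
        for (int i = 0; i < 3; ++i)
            if (pos[i] < 0 || pos[i] >= static_cast<long>(dims[i]))
                return 0.0;
        return data[(pos[0] * dims[1] + pos[1]) * dims[2] + pos[2]];
    }

private:
    const std::vector<double>& data;
    std::array<std::size_t, 3> dims;
    std::array<long, 3> pos;
};
```

A predictor written against this API never needs to know the data's dimensionality or handle edges explicitly, which is the code-reuse benefit the text describes.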
6.2 Pipeline integration and evaluation
We integrate three compression pipelines using SZ3 and reveal their suitable use cases in terms of quality and performance. The three pipelines are described as follows.
Compression Pipeline SZ3-LR: SZ3-LR is a reimplementation of the classic compressor SZ2 [22] using SZ3's modular mechanism. It relies on a multi-algorithm predictor for better data decorrelation: the predictor combines a Lorenzo predictor and a regression-based predictor and selects the better of the two based on blockwise error estimation. As depicted in Figure 1, it uses a linear-scaling quantizer, a Huffman encoder, and the zstd lossless compressor in the other stages.
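As a toy illustration of blockwise predictor selection (a deliberately simplified 1D stand-in, not SZ2/SZ3's actual estimation logic), one can estimate each candidate's absolute prediction error on a block and keep the winner:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class Choice { Lorenzo, Regression };

// Illustrative 1D stand-ins: "Lorenzo" predicts by the previous value;
// "regression" is a simple linear fit through the block endpoints.
Choice select_predictor(const std::vector<double>& block) {
    double lorenzo_err = 0.0, regression_err = 0.0;
    double slope = (block.back() - block.front()) / (block.size() - 1);
    for (std::size_t i = 1; i < block.size(); ++i) {
        lorenzo_err    += std::fabs(block[i] - block[i - 1]);
        regression_err += std::fabs(block[i] - (block.front() + slope * i));
    }
    // Prefer Lorenzo on ties: it needs no coefficients to be stored.
    return lorenzo_err <= regression_err ? Choice::Lorenzo : Choice::Regression;
}
```

The real predictor makes this decision per block using the required error bound, but the structure — estimate, compare, select — is the same.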
Compression Pipeline SZ3-Truncation: SZ3-Truncation is a very fast compression pipeline designed for cases where speed matters more than compression ratio. Given a target byte count as an input parameter, it keeps the most-significant bytes of each floating-point value and discards the rest. To achieve high compression speed, it bypasses the other stages, which in turn leads to low compression ratios in general.
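A minimal sketch of byte truncation, assuming IEEE-754 32-bit floats (function names are ours, not SZ3's): keep the `kept` most-significant bytes of each value's bit pattern and zero the rest on reconstruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Keep the top `kept` bytes (1..4) of each float's bit pattern.
std::vector<uint8_t> truncate_floats(const std::vector<float>& data, int kept) {
    std::vector<uint8_t> out;
    for (float v : data) {
        uint32_t bits;
        std::memcpy(&bits, &v, sizeof bits);
        for (int b = 3; b >= 4 - kept; --b)          // most significant first
            out.push_back(static_cast<uint8_t>(bits >> (8 * b)));
    }
    return out;
}

// Reconstruct: restore the kept bytes, zero-fill the discarded low bytes.
std::vector<float> restore_floats(const std::vector<uint8_t>& bytes, int kept) {
    std::vector<float> out;
    for (std::size_t i = 0; i < bytes.size(); i += kept) {
        uint32_t bits = 0;
        for (int b = 0; b < kept; ++b)
            bits |= static_cast<uint32_t>(bytes[i + b]) << (8 * (3 - b));
        float v;
        std::memcpy(&v, &bits, sizeof v);
        out.push_back(v);
    }
    return out;
}
```

Since the sign, exponent, and top mantissa bits live in the most-significant bytes, values with short binary representations survive exactly, while others lose only low-order mantissa bits.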
Compression Pipeline SZ3-Interp: SZ3-Interp uses interpolation-based predictors [37] in its pipeline. Both linear interpolation and cubic spline interpolation are included, and they outperform the Lorenzo and regression predictors in many cases for two reasons. On the one hand, interpolation-based predictors are not affected by the error-accumulation effect common to the Lorenzo predictor: the Lorenzo predictor bases its predictions on previously decompressed data points, while interpolation-based predictors base theirs on fixed interpolation coefficients. On the other hand, unlike linear regression, which incurs overhead to store per-block coefficients, SZ3-Interp uses constant coefficients and therefore has no storage overhead. Similar to SZ3-LR, it uses a linear-scaling quantizer to respect error bounds, as well as a Huffman encoder and the zstd lossless compressor for high compression ratios. We use datasets from five scientific domains: cosmology, climate, quantum structure, seismic wave, and turbulence. Detailed information is shown in Table III.
Application   Domain             #Fields   Dimensions   Total Size
HACC          Cosmology          6                      6.3 GB
ATM           Climate            77                     1.9 GB
Hurricane     Climate            13                     1.2 GB
NYX           Cosmology          6                      3 GB
SCALE-LETKF   Climate            6                      3.2 GB
QMCPack       Quantum Structure  1                      0.6 GB
RTM           Seismic Wave       3600                   635 GB
Miranda       Turbulence         7                      1 GB
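The level-by-level interpolation prediction used by SZ3-Interp can be sketched in 1D as follows (a simplified illustration under our own naming, not SZ3's actual implementation): at each level, points on a coarse stride are kept and the midpoints are predicted by linear interpolation, so no per-block coefficients need to be stored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Residuals of midpoints predicted by linear interpolation between the two
// neighbors that sit half a stride away on either side.
std::vector<double> interp_residuals(const std::vector<double>& data,
                                     std::size_t stride) {
    std::vector<double> res;
    for (std::size_t i = stride / 2; i + stride / 2 < data.size(); i += stride) {
        double pred = 0.5 * (data[i - stride / 2] + data[i + stride / 2]);
        res.push_back(data[i] - pred);  // small residuals quantize cheaply
    }
    return res;
}
```

For smooth data the residuals are near zero, which is why these predictors do well at high compression ratios; a cubic spline variant would use four neighbors instead of two.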
We demonstrate the compression quality of the three pipelines using the rate-distortion graphs in Figure 7. Note that the rate distortion of SZ2.1 is identical to that of SZ3-LR; thus we do not show SZ2.1 in this figure. We observe from Figure 7 that SZ3-Truncation has the lowest compression quality, which is consistent with its simple byte-truncation design. SZ3-Interp is better than SZ3-LR on most of the datasets, especially in high-compression-ratio cases with bit rates lower than 3. For example, on the Miranda dataset, at the same PSNR of 90, the compression ratio of SZ3-Interp is 47, which is 56% higher than SZ3-LR's compression ratio of 30. On the other hand, SZ3-LR is still the best choice on the SCALE-LETKF and Hurricane datasets when high compression accuracy is needed.
The performance evaluation is shown in Figure 8. We include SZ2.1 as the baseline. SZ3-LR(s) is a performance-oriented version of SZ3-LR that shares the same logic but differs in the implementation of the predictor module. The predictor module in SZ3-LR uses the multi-dimensional iterator for better code simplicity, whereas in SZ3-LR(s) the predictor contains several codecs, each handling data of a specific dimensionality. Note that SZ3-LR(s) still has a modular design and can be customized with a different quantizer, encoder, and lossless compressor. We can see from Figure 8 that SZ3-LR(s) has performance comparable to SZ2.1 on all datasets. SZ3-Truncation has the best performance among all compressors, including SZ2.1: its 1 GB/s compression throughput is 4X higher than that of the second-best compressor. SZ3-Interp is not as fast as the others, but its throughput is still higher than 100 MB/s in all cases.
The quality and performance evaluations reveal the suitable cases for the three built-in pipelines. Specifically, SZ3-Truncation, as a high-speed compressor, is the best choice when there are strict requirements on compression time, as with some in situ applications. SZ3-Interp is preferable when a high compression ratio is wanted under relaxed time constraints, such as for scientific applications that run for a long time and generate large amounts of data. SZ3-LR balances quality and speed; users can choose it as the default compressor in general situations where both a high compression ratio and short compression time are needed.
7 Conclusion and Future Work
In this paper, we propose a modular, composable compression framework, SZ3, which allows users to customize on-demand error-bounded lossy compressors in an adaptive and extensible fashion with minimal effort. Using SZ3, we develop efficient error-bounded lossy compressors for two real-world application datasets based on their data characteristics and user requirements, improving compression ratios by 20% compared with other state-of-the-art compressors at the same data distortion. We also compare the sustainability of SZ3 with that of existing compressors, and we leverage SZ3 to integrate and evaluate different compression pipelines. In the future, we will integrate more module instances into the framework for diverse use cases and provide support for various hardware, including GPUs and FPGAs.
Appendix A
We present some representative interfaces and functions in this appendix. Note that T is the template parameter for the data type, N for the dimensionality, and X for the quantized data type.
A.1 Snippet of Preprocess Interface
A.2 Snippet of Predictor Interface
A.3 Snippet of Quantizer Interface
A.4 Snippet of Encoder Interface
A.5 Snippet of Lossless Interface
A.6 Snippet of Compressor Class
A.7 Snippet of Prediction and Quantization
Acknowledgments
This research was supported by the Exascale Computing Project (ECP), Project Number 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation's exascale computing imperative. The material was supported by the U.S. Department of Energy (DOE), Office of Science and DOE Advanced Scientific Computing Research (ASCR) office, under contract DE-AC02-06CH11357, and supported by the National Science Foundation under Grants OAC-2003709, OAC-2003624/2042084, SHF-1910197, and OAC-2104023. This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility, operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.
References

[1] (2019) Generative adversarial networks for extreme learned image compression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 221–231.
[2] (2017) Compression using lossless decimation: analysis and application. SIAM Journal on Scientific Computing 39 (4), pp. B732–B757.
[3] (2018) Multilevel techniques for compression and reduction of scientific data—the univariate case. Computing and Visualization in Science 19 (5-6), pp. 65–76.
[4] (2019) Multilevel techniques for compression and reduction of scientific data—quantitative control of accuracy in derived quantities. SIAM Journal on Scientific Computing 41 (4), pp. A2146–A2171.
[5] (2012) GAMESS as a free quantum-mechanical platform for drug research. Current Topics in Medicinal Chemistry 12 (18), pp. 2013–2033.
[6] (2017) Blosc, an extremely fast, multi-threaded, meta-compressor library.
[7] (2019) Bebop. Available at https://www.lcrc.anl.gov/systems/resources/bebop. Online.
[8] (2009) FPC: a high-speed compressor for double-precision floating-point data. IEEE Transactions on Computers 58 (1), pp. 18–31.
[9] (2019) Use cases of lossy compression for floating-point data in scientific data sets. The International Journal of High Performance Computing Applications 33 (6), pp. 1201–1220.
[10] (2014) NUMARCK: machine learning algorithm for resiliency and checkpointing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New York, NY, USA, pp. 733–744.
[11] (2018) SPDP: an automatically synthesized lossless compression algorithm for floating-point data. In 2018 Data Compression Conference, pp. 335–344.
[12] (1996) GZIP file format specification version 4.3.
[13] (2016) Fast error-bounded lossy HPC data compression with SZ. In 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 730–739.
[14] (2017) Computing just what you need: online data analysis and reduction at extreme scales. In European Conference on Parallel Processing, pp. 3–19.
[15] (2018) PaSTRI: a novel data compression algorithm for two-electron integrals in quantum chemistry. In IEEE International Conference on Cluster Computing (CLUSTER), pp. 1–11.
[16] (1952) A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40 (9), pp. 1098–1101.
[17] (2003) Out-of-core compression and decompression of large n-dimensional scalar fields. In Computer Graphics Forum, Vol. 22, pp. 343–348.
[18] (2017) Toward decoupling the selection of compression algorithms from quality constraints. In High Performance Computing, Vol. 10524, pp. 3–14.
[19] (2019) VAPOR: a visualization package tailored to analyze simulation data in earth system science.
[20] (2019) Significantly improving lossy compression quality based on an optimized hybrid prediction model. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–26.
[21] (2018) An efficient transformation scheme for lossy data compression with point-wise relative error bound. In IEEE International Conference on Cluster Computing (CLUSTER), pp. 179–189.
[22] (2018) Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In 2018 IEEE International Conference on Big Data.
[23] (2021) Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–13.
[24] (2020) Toward feature-preserving 2D and 3D vector field compression. In PacificVis, pp. 81–90.
[25] (2021) MGARD+: optimizing multilevel methods for error-bounded scientific data reduction. IEEE Transactions on Computers.
[26] (2006) Fast and efficient compression of floating-point data. IEEE Transactions on Visualization and Computer Graphics 12 (5), pp. 1245–1250.
[27] (2014) Fixed-rate compressed floating-point arrays. IEEE Transactions on Visualization and Computer Graphics 20 (12), pp. 2674–2683.
[28] (2017) Error distributions of lossy floating-point compressors. Joint Statistical Meetings 1 (1), pp. 2574–2589.
[29] (2018) Understanding and modeling lossy compression schemes on HPC scientific data. In 2018 IEEE International Parallel and Distributed Processing Symposium, pp. 348–357.
[30] (2020) End-to-end optimized versatile image compression with wavelet-like transform. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] (2021) Repository of the SZ2 compressor. https://github.com/szcompressor/SZ. Online.
[32] (2017) Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization. In 2017 IEEE International Parallel and Distributed Processing Symposium, pp. 1129–1139.
[33] (2013) JPEG2000 Image Compression Fundamentals, Standards and Practice. Springer Publishing Company, New York, NY, USA.
[34] (2017) Lossy image compression with compressive autoencoders. arXiv preprint arXiv:1703.00395.
[35] (2020) FRaZ: a generic high-fidelity fixed-ratio lossy compression framework for scientific floating-point data. https://arxiv.org/abs/2001.06139. Online.
[36] (1992) The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38 (1), pp. xviii–xxxiv.
[37] (2021) Optimizing error-bounded lossy compression for scientific data by dynamic spline interpolation. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1643–1654.
[38] (2020) Significantly improving lossy compression for HPC datasets with second-order prediction and parameter optimization. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '20), New York, NY, USA, pp. 89–100.
[39] (2019) Zstandard. https://github.com/facebook/zstd/releases. Online.