I Introduction
Today’s scientific research applications produce volumes of data too large to be stored, transferred, and analyzed efficiently because of limited storage space and potential bottlenecks in I/O systems. Cosmological simulations [14, 28], for example, may generate more than 20 PB of data when simulating 1 trillion particles over hundreds of snapshots per run. Climate simulations, such as the Community Earth System Model (CESM) [16], may produce hundreds of terabytes of data [11] for each run.
Effective data compression methods have been studied extensively. Because scientific datasets are composed mostly of floating-point values, however, lossless compressors [24, 42, 13] cannot compress them effectively: the mantissa bits have high entropy. Error-bounded lossy compressors have therefore been widely studied, because they not only significantly reduce the data size but also limit data distortion according to a user-specified error bound. Most existing lossy compressors treat the error bound as preeminent and endeavor to improve the compression ratio and performance as much as possible subject to that bound.
However, many scientific application users have requirements for the compression ratio. These requirements are determined by multiple factors, such as the capacity of the assigned storage space, I/O bandwidth, or desired I/O performance. Hence, these users desire to perform fixed-ratio lossy compression, that is, compressing data based on a required compression ratio instead of only strictly respecting the user's error bound. In this case, the lossy compressor needs to adjust the error bound to respect the user-specified target compression ratio while minimizing the data distortion. The user can also provide additional constraints on the data distortion (such as a maximum error bound) to guarantee the validity of results computed from the reconstructed data. While fixed-ratio compression can be obtained by simply truncating the mantissa of the floating-point numbers, this approach may not respect the user's diverse error constraints. With such additional constraints, the lossy compressor should make the compression ratio approach the expected level as closely as possible while strictly respecting the data-distortion constraints.
In this paper, we propose a generic, efficient fixed-ratio lossy compression framework, FRaZ, which accurately determines the error settings for various error-controlled lossy compressors, given a target compression ratio and a specific scientific floating-point dataset. Our design involves two critical optimization strategies. First, we develop a global-optimum search method by leveraging Davis King's global-minimum-finding algorithm [18] to determine the most appropriate error setting for the given compression ratio and dataset. Second, our parallel algorithm optimizes the parameter-search performance by splitting the search range into distinct regions, parallelizing across files, and (in the offline case) across timesteps.
Constructing a generic, high-fidelity framework for fixed-ratio lossy compression poses many research challenges. First, as we elaborate in Section V, the relationship between error bounds and compression ratios is not always monotonic, because of the dictionary-encoder phases in some compressors such as SZ. Second, since we aim to create a generic framework into which more compressors can be added in the future, we cannot exploit properties of the specific compressors to optimize performance, as has been done in prior work [21, 37]; instead we must treat each compression algorithm as a black box. This means that our algorithm cannot take advantage of properties induced by block size or of behavior expected for a particular data distribution. Third, since we treat the compressors as black boxes, we must carefully study how to modify existing algorithms to minimize the calls to the underlying compressors and to orchestrate the search in parallel, in order to produce a tool that is useful to users while working around the limitations of current and potential future compressors.
We evaluate our framework with the latest versions of state-of-the-art lossy compressors (including SZ [8, 36, 22], ZFP [25], and MGARD [1]), using well-known real-world scientific floating-point datasets from the public Scientific Data Reduction Benchmark (SDRBench) [34]. We perform the parallel performance evaluation on Argonne's Bebop supercomputer [5] with up to 416 cores. Experiments show that our framework can determine the error setting accurately, within the user's tolerance of the target compression ratio, with very limited time overhead in real-world cases.
The remainder of the paper is organized as follows. In Section II, we introduce the background of this research regarding various state-of-the-art lossy compressors, and we present several examples of users' requirements for specific compression ratios. In Section III, we compare our work with related work on fixed-rate compression, image processing, and signal processing. In Section IV, we present a formal problem formulation to clarify our research objective. In Section V, we describe our design and our performance optimization strategies. In Section VI, we present the evaluation results. In Section VII, we present our conclusions and end with a vision of future work.
II Background
In this section, we describe the research background, including the existing state-of-the-art error-controlled lossy compressors and fixed-ratio use cases.
II-A Error-Bounded Lossy Compression
II-A1 SZ
SZ has been widely evaluated in the scientific floating-point data compression community [4, 20, 40, 26], showing that it is one of the best compressors in its class.
SZ is designed based on a block-wise, prediction-based compression model. It splits each dataset into many consecutive non-overlapped blocks (such as 6×6×6 for a 3D dataset) and performs compression in each block. It includes four key steps:

Step 1: data prediction. SZ adopts a hybrid data prediction method (either a 1-layer Lorenzo predictor [15] or a linear regression method) to predict each data point from its neighboring values in the multidimensional space.

Step 2: linear-scaling quantization. Each floating-point data value is converted to an integer code according to the formula q_i = round((d_i − p_i) / (2ε)), where d_i is the original value, p_i its predicted value from Step 1, and ε the user-specified error bound (i.e., linear-scaling quantization).

Step 3: entropy encoding. A Huffman encoding algorithm customized for integer codes is then applied to the quantization codes generated by Step 2.

Step 4: lossless encoding. The Huffman-encoded bytes are further compressed by a dictionary encoder (such as Zstd).
SZ allows one to set an absolute error bound to control the data distortion in the compression.
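For illustration, the prediction and linear-scaling quantization steps can be sketched as follows. This is a minimal 1-D sketch with names of our own; the real SZ uses multidimensional Lorenzo/regression predictors and operates block by block.

```python
def sz_quantize_1d(values, error_bound):
    """Sketch of SZ-style prediction + linear-scaling quantization.

    Each value is predicted (here, trivially, by the previous
    *reconstructed* value) and the prediction error is mapped to an
    integer code of width 2 * error_bound, so every reconstructed
    value stays within error_bound of the original.
    """
    codes, recon = [], []
    pred = 0.0
    for v in values:
        q = round((v - pred) / (2 * error_bound))  # integer quantization code
        r = pred + 2 * error_bound * q             # reconstructed value
        codes.append(q)
        recon.append(r)
        pred = r  # the decompressor sees only reconstructed values
    return codes, recon
```

The integer codes are what Step 3 feeds to the Huffman encoder; long runs of similar codes are what make the entropy and dictionary stages effective.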
II-A2 ZFP
ZFP [25] is another outstanding error-controlled lossy compressor that is also broadly assessed and used in the scientific floating-point data compression community. ZFP transforms floating-point data to fixed-point values block by block (the block size is 4×4×4 for 3D datasets) and adopts an embedded coding to encode the generated coefficients.
ZFP also provides an absolute error bound to control the data distortion. Although ZFP additionally provides a fixed-rate compression mode, allowing users to compress data based on a given compression ratio, its compression quality is significantly worse than that of the absolute-error-bound mode. We demonstrate this in Section VI-B4. Thus, efficiently fixing the compression ratio based on the absolute-error-bound mode is critical for ZFP.
II-A3 MGARD
MultiGrid Adaptive Reduction of Data (MGARD) [1] is an error-controlled lossy compressor supporting multilevel lossy reduction of scientific floating-point data. It is designed based on the theory of multigrid methods [39, 31]. An important feature of MGARD is that it provides guaranteed, computable bounds on the loss incurred by the data reduction. MGARD provides different types of norms, such as the infinity norm and the L2 norm, to control the data distortion. The infinity norm is equivalent to an absolute error bound, and the L2-norm mode can be used to control the mean squared error (MSE) during the lossy compression.
Although SZ, ZFP, and MGARD provide advanced features to control the distortion of lossy compression, none of them provides high-fidelity fixed-ratio compression.
The accuracy of error-bounded lossy compression (EBLC) is dictated at compression time by the selection of an error bound and an error-bounding type (e.g., absolute, relative, or number of bits), chosen to minimize the impact on quantities of interest in scientific simulations. For use cases that perform data analytics on lossy-compressed data, trial and error is often used to identify acceptable compression tolerances [3]. The trial and error is often done offline to ensure that the selected error bound is robust across multiple timesteps and does not diminish the quality of the analysis. If the lossy-compressed data are used to advance the simulation, trial and error is also possible [33], and recent works have explored the relation of compression error to the numerical errors present in the simulation and provide strategies for error-tolerance selection [26, 6, 9, 38].
II-B Fixed-Ratio Use Cases
In this subsection, we describe several fixed-ratio use cases to demonstrate the practical demands placed on fixed-ratio compression by real-world application users.
The first use case is significant reduction of storage footprint. On the ORNL Summit system, for example, the capacity of the storage space is limited to 50 TB per project by default. Many scientific floating-point simulations (such as the CESM climate simulation and the HACC cosmological simulation) may produce hundreds of terabytes of data in each run (or even over 1 PB), such that the compression ratio has to be 10:1 or higher to avoid an execution crash due to no space being left in storage. Even if a larger storage allocation is awarded or purchased at considerable financial cost, projects generating extreme volumes often need to reduce their storage footprint to make room for their next executions. Fixed-rate compression provides the ability to store multiple simulations in a fixed amount of storage but suffers from large inaccuracies in the data when high compression ratios are required (see Figure 10).
The second practical use case explores best-fit lossy compression solutions based on the user's post-analysis requirements (such as visual quality or a specific analysis property) at fixed compressed sizes. None of the existing error-controlled lossy compressors provides a fixed-ratio compression mode, however, so users have to seek the best-fit choice by conducting inefficient trial and error with different error settings for each compressor to reach a target compression ratio. Furthermore, there is no universal model that accurately predicts the compression ratio from the compressor configuration for a variety of input data.
The third practical use case involves matching I/O bandwidth constraints and accelerating I/O performance. Advanced light-source instruments, such as the Advanced Photon Source and the Linac Coherent Light Source (LCLS-II), may generate image data at an extremely high acquisition rate, such that the raw data cannot be stored efficiently for post-analysis because of limited I/O bandwidth. Specifically, LCLS-II produces instrument data at up to 250 GB/s, while the corresponding storage bandwidth is only 25 GB/s. Thus the designers of LCLS-II expect to reduce the data size by a compression ratio of 10 or higher [7]. SPring-8 [35] researchers also indicate that their data could be generated at 2 TB/s, which they expect to reduce to 200 GB/s after data compression.
We note that users often require random-access decompression across timesteps; that is, they prefer to decompress the data individually at each timestep, because decompressing the whole dataset across all timesteps either requires a significant amount of time or is impossible because of memory allocation limits.
III Related Work
In the compression community, a similar type of compression is called “fixed-rate compression,” where rate refers to the bit rate: the number of bits used to represent one symbol (or data point) on average after compression. The lower the bit rate, the higher the compression ratio; hence, fixing the bit rate fixes the compression ratio. In the remainder of this section, we compare our work with prior work in these areas.
In addition to the fixed-accuracy modes (i.e., the accuracy and precision error-bounding modes), ZFP offers a fixed-rate mode [25]. The fixed-rate mode of ZFP offers precise control over the number of bits per symbol in the input data. It operates by transforming the data into a highly compressible domain and truncating each symbol to reach the appropriate rate. However, the fixed-rate mode of ZFP is not error bounded, and it suffers from significantly lower compression quality than does the fixed-accuracy mode. Figure 1(b) compares the compression quality of ZFP in fixed-accuracy mode and fixed-rate mode. One can clearly see that the latter exhibits much worse rate distortion than the former (up to 30 dB difference at the same bit rate in most cases). Rate distortion is an important indicator of compression quality; its detailed definition can be found in Section VI-B4. Figures 1(c) and 1(d) clearly show that the fixed-rate mode results in much lower visual quality (i.e., more data loss) than the fixed-accuracy mode at the same compression ratio, 50:1 in this example. In absolute terms, the fixed-accuracy mode leads to a much higher peak signal-to-noise ratio (PSNR) and lower autocorrelation of compression errors (ACF(error)), which means better compression quality than the fixed-rate mode. On ZFP's website and in its user guide, the developer of ZFP also points out that the fixed-rate mode is not recommended unless one needs “to bound the compressed size or need random access to blocks”
[25]. In contrast, not only does our framework fix the compression ratio, but it also achieves higher compression quality for different compressors (such as SZ, ZFP, and MGARD) based on their error-bounding modes. Additionally, it can provide random access at the same granularity as ZFP's fixed-rate mode when supported by the underlying compressor. Since our framework uses a control loop to bound the compression ratio, it may suffer somewhat lower bandwidth than ZFP's fixed-rate mode. The tradeoff for this lower bandwidth is compressed data of far higher quality at the same compression ratio, which we demonstrate in detail in the evaluation section.
The literature also includes studies investigating fixed-ratio compressors for images. One such work [17] is JPEG-LS, a fixed-ratio compressor for images. It combines a prediction system for data values with two runs of Golomb-Rice encoding to encode RGB values. The first run of the Golomb-Rice encoder is used to estimate the quantization level used in the second run. Golomb-Rice assumes integer inputs, whereas our work is applicable to all numeric inputs.
Some work has also been done on fixed-ratio compression in digital signal processing [2].
In this domain, adaptive sampling techniques maintain a budget for how many points to transmit. When a point is determined to provide new information (by a predictor, interpolation scheme, or some other method), it is transmitted and part of the budget is expended, as long as budget remains. If the budget is exhausted, no points are transmitted until it is refilled. Over time, the budget is replenished to keep the rate constant. In contrast, our work does not rely on a control loop to maintain an error budget, so it can look at the data holistically to decide where to place the loss in the signal, allowing more accurate reconstructions.
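For illustration, such a budgeted transmission scheme might look like the following. This is entirely schematic, not taken from [2]; the function and parameter names are ours.

```python
def adaptive_sample(stream, is_new_info, budget_max, refill_rate):
    """Budget-limited transmission sketch.

    A point is sent only when it carries new information and at least
    one unit of budget remains; the budget refills over time so the
    average transmission rate stays roughly constant.
    """
    sent = []
    budget = budget_max
    for i, x in enumerate(stream):
        budget = min(budget_max, budget + refill_rate)  # periodic refill
        if is_new_info(x) and budget >= 1:
            sent.append((i, x))  # transmit this point
            budget -= 1
    return sent
```

Note that such a scheme decides point by point, whereas a holistic view of the data can place the loss where it matters least.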
IV Problem Formulation
In this section, we formulate the research problem by clarifying the inputs, constraints, and target of our research.
Before describing the problem formulation, however, we introduce some notation. Given a specific field at one timestep of an application, we denote the original dataset by X = {x1, x2, ..., xN}, where xi is the original value of data point i and N is the number of elements. We denote the corresponding decompressed dataset by X' = {x'1, x'2, ..., x'N}, where x'i is the reconstructed value after decompression. We denote the original data size and compressed data size by So and Sc, respectively. The compression ratio (denoted by CR) can then be written as CR = So/Sc. Moreover, we denote the target compression ratio specified by the user as CRt and the real compression ratio after compression as CRr.
The fixedratio lossy compression problem is formulated as follows, based on whether it is subject to an errorcontrol constraint or not.

Non-constrained fixed-ratio compression: The objective of non-constrained fixed-ratio lossy compression is to confine the real compression ratio CRr to within a user-specified tolerance (denoted by δ) of the target compression ratio CRt, as shown below.
(1) |CRr − CRt| ≤ δ
Error-control-based fixed-ratio compression: The objective of error-control-based fixed-ratio compression is to tune the compression ratio to be within the acceptable range [CRt − δ, CRt + δ] while respecting the user-specified error bound (denoted by ε), as shown below.
(2) f(X, X') ≤ ε, where f is a function of error control. For instance, f(X, X') = max_i |xi − x'i| for the absolute error bound, and f(X, X') = (1/N) Σ_i (xi − x'i)² for the mean-squared-error bound.
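Using this notation, the feasibility checks of the two formulations can be sketched directly. This is a minimal sketch; the function names are ours.

```python
def max_abs_error(orig, recon):
    # f(X, X') for the absolute error bound
    return max(abs(x - xr) for x, xr in zip(orig, recon))

def mse(orig, recon):
    # f(X, X') for the mean-squared-error bound
    return sum((x - xr) ** 2 for x, xr in zip(orig, recon)) / len(orig)

def ratio_ok(cr_real, cr_target, tol):
    # Non-constrained formulation (1): |CR_r - CR_t| <= delta
    return abs(cr_real - cr_target) <= tol

def feasible(cr_real, cr_target, tol, orig, recon, bound, metric=max_abs_error):
    # Error-control-based formulation (2): ratio within tolerance
    # AND the chosen error-control function within its bound
    return ratio_ok(cr_real, cr_target, tol) and metric(orig, recon) <= bound
```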
We summarize the key notation in Table I.
V Design and Optimization
In this section, we present the design of our fixedratio lossy compression framework and optimization strategies.
V-A Design Overview
Figure 2 shows the design overview, with highlighted boxes indicating our major contributions in this paper and the relationships among the different modules in the framework. As shown in the figure, our FRaZ framework is composed of five modules, of which the optimizing autotuner and the parallel orchestrator are the core. They are, respectively, in charge of (1) searching for the optimal error setting for the target compression ratio in few iterations and (2) parallelizing the overall tuning job across the search spaces of different fields and timesteps. We developed an easy-to-use library (called Libpressio [23]) as a middle layer that abstracts the discrepancies among the APIs of the different compressors.
We list our major contributions as follows:

Formulated fixed-ratio compression as an optimization problem in a way that converges quickly without resorting to multiobjective optimization

Evaluated several different optimization algorithms to find one that works on all of our test cases, and then modified it to improve the performance of FRaZ

Implemented and ran a parallel search to improve the throughput of the technique
V-B Autotuning Optimization
In this subsection, we describe our autotuning solution in detail, including three critical parts: (1) exploration of initial optimization methods, (2) construction of a loss function, and (3) improvements to the optimization algorithm, covering how to deal with infeasible target compression ratios and how to determine the exact error-bound setting.
V-B1 Exploration of Initial Optimization Methods
In this subsection, we describe how we chose the optimization method to use as a starting point for later refinement.
Before detailing FRaZ's optimizing autotuning method, we first analyze why a straightforward binary search is not suitable for our case. On the one hand, application datasets may exhibit a non-monotonic relationship between error bound and compression ratio. We present a typical example in Figure 3, which uses SZ to compress the QCLOUDf field of the hurricane simulation dataset. We can clearly see that the compression ratio may decrease significantly with larger error bounds in some cases. We also observe spiky changes in the compression ratio with increasing error bounds on other datasets (not presented here because of space limits). The reason is that SZ uses decompressed data to perform the prediction during compression, which may cause unstable prediction accuracy. Moreover, SZ's fourth stage (the dictionary encoder) may find varying numbers of repeated byte sequences in the output of the third stage, because a tiny change to the error bound may largely alter the Huffman tree constructed in that third phase. By comparison, our autotuning search algorithm is a general-purpose optimizer that takes this irregular relationship between compression ratio and error bound into account. On the other hand, even on datasets where monotonicity holds, binary search may still be slower than FRaZ's optimizing autotuner. For example, when searching for the target compression ratio 8:1 at one timestep of the Hurricane CLOUD field, our method requires only 6 iterations to converge to an acceptable solution, whereas binary search needs 39 iterations. The reason is that binary search may spend substantial time searching small error bounds that cannot yield an acceptable solution, because it climbs from the minimum possible error bound toward the user-specified upper limit.
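The failure mode of binary search on a non-monotonic ratio curve can be illustrated with a toy step function. The curve below is entirely synthetic (real curves appear in Figure 3), and the helper names are ours.

```python
def toy_cr(eb):
    """Synthetic non-monotonic compression-ratio curve."""
    if eb < 0.3:
        return 10.0
    if eb < 0.6:
        return 40.0
    return 25.0  # the only region that can hit a target of 25

def binary_search_eb(target, lo=0.0, hi=1.0, iters=30):
    """Binary search assuming the ratio grows monotonically with the bound."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if toy_cr(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def grid_search_eb(target, steps=100):
    """Crude global search: scan the range, keep the closest ratio."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda eb: abs(toy_cr(eb) - target))
```

For a target of 25, binary search converges to the 10-to-40 jump near 0.3 and never visits the region above 0.6, while even a crude global scan finds it; a general-purpose global optimizer avoids the same trap with far fewer evaluations.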
When developing FRaZ's optimizing autotuner, we considered a number of different tuning techniques. Since we are developing a generic method, we cannot construct a general derivative function relating the change in error bound to the change in compression ratio; we therefore had to decide between methods that use numerical derivatives and derivative-free optimization. Methods using numerical derivatives approximate the slope of the objective function by sampling nearby points; gradient descent and Newton-like methods such as [10] and ADAM [19] fall into this category. However, to evaluate the compression ratio at an error bound, we must run the compressor (since we treat compressors as black boxes), which may take a substantial amount of time compared with the optimization step itself. In this sense, numerical-derivative-based methods are too slow.
We therefore turned to derivative-free optimization. We considered methods such as BOBYQA [30], but they do not handle a large number of local optima. This ability is essential for a robust tuning framework for lossy compression, because many of the functions relating error bounds to compression ratios look like the plot on the left of Figure 4: a step-like function with perhaps a slight upward slope on each step. In practice, we found that the method we ultimately selected escapes the local optima in such functions easily.
We also took into account a variety of implementations of these algorithms, such as those in [32, 12, 18]. We decided among these libraries using three criteria: (1) correctness of the result, (2) time to solution, and (3) modifiability and readability of the code. Ultimately, we started from the black-box optimization function find_min_global in the commonly used Dlib library [18] and made our modifications there. This global-minimum-finding algorithm, designed by Davis King, combines the works of [27] and [29]. It takes as inputs a deterministic function that maps a vector to a scalar, a vector of lower bounds, and a vector of upper bounds. At a high level, the algorithm works as follows. It begins with a randomly chosen point between the upper and lower bounds. It then alternates between a point chosen by the model in [27], which approximates the function by a series of piecewise linear functions and chooses the global minimum of that approximation, and the model in [29], which performs a quadratic refinement of the lowest valley in the model. According to [18], this method performs well on functions with a large number of local optima, which our experience confirmed.

V-B2 Construction of Loss Function
Now that we have an optimizer framework, we need to construct a loss function. First, we created a closure for each compressor that transforms its interface (which takes a dataset and parameters) into a function accepting only the error bound. To create the closure, we developed Libpressio [23], a generic interface for lossy compressors that abstracts their differences so that we could write one implementation of the framework for SZ, ZFP, and MGARD.
To convert this into a loss function, we chose the distance between the measured compression ratio CR(e) and the target compression ratio CRt. The function relating an error bound to a compression ratio is an arbitrary function that may or may not have a global or local optimum. Therefore, we transformed it by applying a clamped square function g(x) = min(x², τ), where τ equals 80% of the maximum representable double in IEEE 754 floating-point notation. This maps the range of the input function from (−∞, +∞) to [0, τ]. The benefits are twofold. First, the function now has a lowest possible global minimum that we can optimize for. Second, the function now has a highest possible value, which avoids a bug in Dlib's find_min_global function that causes a segmentation fault. We also considered the clamped absolute-value function min(|x|, τ) but found that the quadratic version converged faster. This leaves us with the final optimization function loss(e) = min((CR(e) − CRt)², τ).
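A minimal sketch of this construction follows. The compressor callback is a stand-in for a Libpressio-wrapped compressor, and the names are ours.

```python
import sys

TAU = 0.8 * sys.float_info.max  # clamp: 80% of the largest IEEE 754 double

def make_loss(run_compressor, data, cr_target):
    """Build the scalar loss minimized by the autotuner.

    `run_compressor(data, eb)` is assumed to compress `data` at error
    bound `eb` and return the achieved compression ratio.
    """
    def loss(eb):
        cr = run_compressor(data, eb)
        return min((cr - cr_target) ** 2, TAU)  # clamped square
    return loss
```

The clamp guarantees a finite worst-case value, sidestepping the segmentation fault mentioned above.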
V-B3 Development of Worker Task Algorithm
Our next insight was that an exact match of the compression ratio is not always feasible, desired, or required. It may not be feasible because, for some compressors (for example, ZFP's accuracy mode), the function mapping the error bound to the compression ratio is a step function, so not all compression ratios are achievable. And it may not be desired or required because the user might accept a range of compression ratios and prefer finding a match quickly rather than waiting for a more precise one.
Looking again at Figure 4, we see a typical relationship between an error bound and the compression ratio. If the user asks for a compression ratio of 15, no error bound satisfies that request with this compressor. In this case, FRaZ returns the closest point it observes to the target; for Figure 4 it would report an error bound that results in a compression ratio near 17.5. Depending on the user's global error tolerance, this value near 17.5 may or may not be within the acceptable region, that is, may or may not be a feasible solution.
Another case where the solution may be infeasible is when the error bound required to meet the objective is above the user's specified upper error bound. In this case, FRaZ reports the error bound that produced the compression ratio closest to the target, and the user can run FRaZ again with the default upper bound, which equals the maximum error bound allowed by the compressor. If FRaZ identifies a solution in this second run, the user can decide whether to relax the perhaps overly strict error tolerance to meet the objective, or conclude that the fidelity of the results is more important and that the bound cannot be relaxed. Alternatively, the user can try a different compressor backend that implements the same error bound.
In fact, determining the exact error bound that produces a specified compression ratio may not be desired or required, because a large number of iterations may be needed to converge to it, and the user may prefer to trade a small amount of accuracy for a faster answer. Therefore, we implemented a version of Dlib's find_min_global with a global cutoff parameter γ. Specifically, we allow the algorithm to terminate once the optimization function returns a value in the range [0, γ]. This has a substantial impact on performance in the typical case.
We combine these insights into our worker task algorithm, as shown in Algorithm 1.
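In outline, the worker task behaves like the following sketch. Here a plain random search stands in for the modified Dlib optimizer, and `cutoff` is the early-termination parameter described above; the names are ours.

```python
import random

def worker_task(loss, lo, hi, cutoff, max_iters, seed=0):
    """Minimize `loss` over [lo, hi]; stop early once a loss value
    falls at or below `cutoff`, or after `max_iters` evaluations."""
    rng = random.Random(seed)
    best_eb, best_val = None, float("inf")
    for _ in range(max_iters):
        eb = rng.uniform(lo, hi)
        val = loss(eb)
        if val < best_val:
            best_eb, best_val = eb, val
        if best_val <= cutoff:
            break  # good enough: trade exactness for speed
    return best_eb, best_val
```

The early exit is what makes the cutoff effective: most invocations terminate long before the iteration cap.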
V-C Parallelism Scheme
After optimizing the serial performance via the design of the optimization algorithm, we developed a parallel optimization method. Although Dlib provides a built-in multithreaded optimization mode, some compressors (such as SZ and MGARD) do not support being run with different settings in a multithreaded context because of their use of global variables. Since we are developing a generic framework, we must therefore treat each compression as a non-multithreaded task.
We use multiple processes based on MPI to parallelize the search by error-bound range. Figure 5 provides an overview of our method. Rather than searching serially over the entire lower-to-upper-bound range, we divide the range into overlapping regions. We then give each region to a separate MPI process and use Algorithm 2 to process them. As processes complete, we test whether we have satisfied our objective subject to our global threshold (lines 7–9). If so, we terminate all tasks that have not yet begun to execute (lines 10–14). If a particular task finishes and we have not satisfied our objective, we do nothing. If all the tasks finish and we still have not met our objective, we conclude that the requested compression ratio is infeasible (lines 18–25).
So why do we overlap the error-bound regions? Overlapping the regions avoids an extremely long worst-case search time in the optimization algorithm. Since we terminate early once a solution is found, FRaZ's runtime depends on the region containing the target. Without a small overlap, if the target error bound coincides with a region border, the MPI rank assigned to that region may iterate much longer because it lacks stationary points for the quadratic-refinement step.
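The region split with overlap can be sketched as follows. The `overlap_frac` parameter is our illustrative choice, not a value from the paper.

```python
def split_with_overlap(lo, hi, nregions, overlap_frac=0.1):
    """Divide [lo, hi] into nregions equal spans, each widened by a
    fraction of the span so that neighboring regions overlap."""
    span = (hi - lo) / nregions
    pad = span * overlap_frac
    regions = []
    for i in range(nregions):
        a = max(lo, lo + i * span - pad)        # widen left, clamp to lo
        b = min(hi, lo + (i + 1) * span + pad)  # widen right, clamp to hi
        regions.append((a, b))
    return regions
```

Each region is then handed to one MPI rank, so a target sitting on a nominal border lies strictly inside at least one rank's region.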
To limit the effects of waiting on wrong guesses, we constrain the number of iterations to a maximum value. We considered limiting by time instead, but we were unable to find a heuristic that worked well across multiple datasets, fields, and timesteps. This is because the compression time is a function of the dataset size, the entropy of the data contained within, and properties of each compressor.
Limiting the amount of wasted computational resources is desirable. Since we divide by error-bound range, only a small number of the searches (typically one) are expected to return successfully if the requested ratio is feasible. Additionally, there seems to be a floor on how many iterations are required to converge for a particular mode of a compressor. Hence, there is limited benefit in splitting into more than a few ranges, and the cores may be used more efficiently for other fields. Preliminary experiments found that 12 tasks per field and timestep offered an ideal tradeoff between efficiency and runtime, so we set that as the default; the user can choose to use more tasks.
One can also perform additional optimization across multiple timesteps. Often, subsequent iterations of a large simulation do not differ substantially and have similar compression properties. Therefore, we ran the first timestep as before but then assumed that the error bound found for the previous timestep was correct for the next full dataset. If our assumption proved correct, we continued on and skipped retraining. Otherwise, we reran the training and adopted the newly trained solution for the next step. We then repeated this process over the remaining datasets. In practice, we retrained only a small percentage of the time. On the hurricane dataset, for example, we retrained only 4 times on the CLOUD field.
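This reuse strategy can be sketched as follows. The `tune` and `check` callbacks are our stand-ins for the full search and for a single verification run at the previous bound.

```python
def tune_timesteps(timesteps, tune, check):
    """Reuse the previous timestep's error bound; retrain only when it
    no longer satisfies the target.

    `tune(data)` runs the full search and returns an error bound;
    `check(data, eb)` returns True if `eb` still meets the target.
    """
    eb = None
    bounds, retrains = [], 0
    for data in timesteps:
        if eb is None or not check(data, eb):
            eb = tune(data)  # full search only when reuse fails
            retrains += 1
        bounds.append(eb)
    return bounds, retrains
```

When consecutive timesteps compress similarly, the expensive search runs only a handful of times over the whole simulation.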
We also take advantage of the embarrassingly parallel nature of searching each field independently, as shown in Algorithm 3. The results show some additional speedup.
VI Performance and Quality Evaluation
In this section, we first describe our experimental setup, including hardware, software, and datasets. We then describe our evaluation metrics and results using five real-world scientific floating-point datasets on Argonne's Bebop supercomputer [5].

VI-A Experimental Setup
VI-A1 Hardware and Software Used for Evaluation
We have packaged our software as a Singularity container for reproducibility.
VI-A2 Datasets Used for Experiments
In our experiments, we evaluated our fixed-ratio lossy compression framework with all three state-of-the-art compressors described in Section II, using five real-world scientific simulation datasets downloaded from the scientific data reduction benchmark (SDRBench) [34]. The raw data are all stored as single-precision (32-bit) floating-point values. We describe the five application datasets in Table III.
We chose these datasets for several reasons. First, they provide results over multiple timesteps, which matches users’ practical post-analysis over a simulation period. Second, the datasets consist of floating-point data, which traditional lossless compressors often handle poorly. Third, the datasets are commensurate with the use cases of fixed-ratio compression described in Section II.
In some cases, we were not able to use all the datasets with all compressors; we ran all the experiments for all dataset and compressor combinations where possible. MGARD supports only 2D and 3D data, so it is not tested on the HACC and Exaalt datasets. We adopt six representative fields for the CESM application (CLDHGH, CLDLOW, CLOUD, FLDSC, FREQSH, PHIS) because the other fields exhibit results similar to one of these. We generally observed similar results for each dataset and compressor.
VI-B Experimental Results
Over the course of our experiments, we evaluated four properties of FRaZ using the datasets from SDRBench [34]:
1) How close do we get to the target compression ratio when it is feasible?
2) How long does it take to find the target compression ratio or determine that it is infeasible?
3) How does the runtime of the algorithm scale as the number of cores increases?
4) How does FRaZ compare with existing fixed-rate methods in terms of rate distortion and visual quality?
VI-B1 How close do we get to the target compression ratio?
How close we get to the target compression ratio depends heavily on whether the requested compression ratio is feasible for the underlying compressor used. Figure 6 (a) and Figure 6 (b) show a bad case and a good case, respectively.
In Figure 6 (a), we see an example where the target compression ratio is infeasible for most timesteps of the CLOUD field. The early timesteps compress within the acceptable range, but by timestep ten the target is no longer feasible. The reason is that as the timesteps progress, the properties of the dataset change, affecting the ability of the compressor to compress it at this level. As a result, we oscillate between a compression ratio that is larger than the target and one that is smaller. A larger tolerance, however, would have allowed even this case to converge for all timesteps.
In Figure 6 (b), we see an example where the algorithm converges on over 90% of the timesteps. In this case, we quickly converge to the acceptable range and are often able to reuse the previous timestep’s error bound for subsequent timesteps. In this particular case, we have to retrain only four times over the course of the simulation, at timesteps 0, 8, 15, and 29. Thus, the algorithm can quickly process many timesteps.
VI-B2 How long does it take to reach the target compression ratio?
When evaluating the algorithm, we wanted to consider how long it takes to find the target compression ratio. This again depends greatly on whether the target is feasible. Therefore, we considered a large number of possible targets for different datasets. The results of this search are shown in Figure 7. We can see that some compression ratios require far longer total times. Figures 6 (a) and (b) show zoomed-in views of an infeasible and a feasible target, respectively. The difference in runtime is explained by the difference in the number of timesteps that converge: in the case shown in Figure 6 (a), relatively few timesteps converged because the objective was infeasible with the specified compressor, whereas in the case shown in Figure 6 (b), almost all timesteps converged because the objective was feasible. This resulted in about a 10x difference in performance between the two cases.
Why do low target compression ratios have long runtimes? Many of the lossy compressors have an effective lower bound on the compression ratio; in Figure 7, it is about 7.5. For targets below this effective lower bound, FRaZ can never meet its objective and spends the remainder of the time searching until it hits its timeout.
How does this change across datasets? In general, the more feasible compression ratios near the target, the better FRaZ performed. Each dataset had one compressor that was able to compress and decompress the data more accurately than the others.
How does this change between compressors? Generally SZ took less time than ZFP or MGARD, even though ZFP may take less time for each individual compression. This is because ZFP typically had fewer viable compression ratios than SZ, owing to limitations of ZFP’s transform-based approach. As a result, more of FRaZ’s timesteps hit the maximum number of iterations, lengthening the total runtime. The difference in runtime between SZ and ZFP for a representative dataset can be seen in Figure 8.
VI-B3 How does the algorithm scale?
To evaluate how well the algorithm scales, we considered the runtime of the algorithm as it scales over multiple cores on ANL Bebop [5].
Figure 8 shows the strong scalability of the algorithm. The algorithm scales at the timestep and field levels for the first 180–216 cores, with steep decreases in runtime from parallelism at the early levels and little additional benefit after that. This is because the runtime of the algorithm is lower-bounded by the longest-running worker task. All of the datasets we tested had at least one field that takes substantially longer to compress than the others, and the scalability of the algorithm is limited by the longest of these. In the case of the Hurricane dataset using the error-bounded compressor SZ, the QCLOUD field took 1022 seconds to compress, while the 75th percentile of field compression times is less than 500 seconds and the median is less than 325 seconds.
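This lower bound is easy to illustrate: however tasks are packed onto workers, the makespan can never drop below the longest single task. A small scheduling sketch (illustrative only, not FRaZ's actual scheduler) makes the point:

```python
import heapq

def makespan(task_times, workers):
    """Longest-processing-time-first packing of tasks onto workers.
    The result is always >= max(task_times), which is why adding
    cores beyond the number of long-running fields stops helping."""
    loads = [0.0] * workers           # current total load per worker
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):
        # always hand the next-longest task to the least-loaded worker
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)
```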
What accounts for the substantial difference between the scalability of FRaZ using ZFP and SZ? Those familiar with ZFP likely know that it is typically faster than SZ, which seems to contradict the result in Figure 8. This result is explained by considering the individual fields rather than the overall scalability. For the cases in which ZFP finds an error bound that satisfies the target compression ratio, it is much faster. However, ZFP often expresses fewer compression ratios for the same error-bound range, resulting in more infeasible compression ratios and thus a longer runtime. ZFP expresses few compression ratios because it uses a flooring function in the minimum-exponent calculation of its fixed-accuracy mode.
Fields may take longer for a variety of reasons: (1) the target compression ratio may not be feasible for one or more of the timesteps, (2) the dataset may have higher entropy, resulting in a longer encoding stage for algorithms such as SZ, or (3) the fields may be of different sizes, and larger fields take longer.
VI-B4 How does FRaZ compare with existing fixed-rate compression methods in terms of rate distortion and visual quality?
We present the rate distortion in Figure 9, which shows the bit rate (the number of bits used per data point after compression) versus the data distortion. Peak signal-to-noise ratio (PSNR) is a common indicator used to assess data distortion in the community. PSNR is defined as $\mathrm{PSNR} = 20\log_{10}\frac{d_{\max}-d_{\min}}{\mathrm{RMSE}}$, where $\mathrm{RMSE}$ is the root mean squared error between the original and decompressed data, and $d_{\max}$ and $d_{\min}$ refer to the max and min value of the original data, respectively. In general, the higher the PSNR, the higher the quality of the decompressed data.
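As a concrete check of this definition, PSNR can be computed directly from the original and decompressed arrays (a straightforward transcription, assuming NumPy arrays):

```python
import numpy as np

def psnr(original, decompressed):
    """PSNR in dB: 20*log10 of the original data's value range
    over the root mean squared error of the reconstruction."""
    original = np.asarray(original, dtype=np.float64)
    decompressed = np.asarray(decompressed, dtype=np.float64)
    vrange = original.max() - original.min()            # d_max - d_min
    rmse = np.sqrt(np.mean((original - decompressed) ** 2))
    return 20.0 * np.log10(vrange / rmse)
```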
In this figure, one can clearly see that ZFP (FRaZ) provides consistently better rate distortion than does ZFP (fixed-rate) across bit rates (i.e., across compression ratios). Moreover, SZ (FRaZ) exhibits the best rate distortion in most cases, which is consistent with the high compression quality of SZ presented in our prior work [22]. Thus, FRaZ maintains the high fidelity of the data during compression by leveraging the error-bounded lossy compression mode of each compressor.
In addition to the rate distortion, we present in Figure 10 visualization images based on the same target compression ratio, to show that the fixed-ratio compression approach preserves visual quality. We wanted to set a compression ratio of 100:1, but the closest feasible compression ratio for ZFP is 85:1 (see Section VI-B). Hence, we set the target compression ratio to 85:1 for all compressors. Because of space limitations, we present the results only for the NYX temperature field; other fields and applications exhibit similar results. All the results are generated by FRaZ except for ZFP (fixed-rate). ZFP (FRaZ) exhibits much higher visual quality than does ZFP (fixed-rate) (see Figure 10 (b) vs. Figure 10 (c)), because FRaZ tunes the error bound in fixed-accuracy mode, which has higher compression quality than ZFP’s built-in fixed-rate mode. ZFP (FRaZ) also exhibits higher PSNR than does ZFP (fixed-rate), which indicates higher visual quality. We also present the structural similarity index (SSIM) [41] for the slice images shown in the figure. SSIM indicates similarity in luminance, contrast, and structure between two images; the higher the SSIM, the better. Our evaluation shows that ZFP (fixed-rate) has a lower SSIM than ZFP (FRaZ), again indicating that FRaZ yields better quality. Among all the compressors here, MGARD (FRaZ) leads to the lowest visual quality (as well as the lowest PSNR and SSIM) because of the inferior compression quality of MGARD on this dataset.
VII Conclusions and Future Work
We have presented a functional, parallel, black-box autotuning framework that can produce fixed-ratio, error-controlled lossy compression for scientific floating-point HPC datasets. Our work improves on existing fixed-rate methods by better preserving data quality at equivalent compression ratios. We showed that FRaZ works well for a variety of datasets and compressors. We found that FRaZ generally has lower runtime for dataset and compressor combinations that produce large numbers of feasible compression ratios.
A number of areas for potential improvement exist. First, we would like to consider arbitrary user error bounds. By user error bounds, we mean bounds tied to the quality of a scientist’s analysis result relative to that on non-compressed data, such as in [3], which identifies a particular SSIM that lossy-compressed data must reach for valid results in their field. Second, we would like to develop an online version of this algorithm to provide in situ fixed-ratio compression for simulation and instrument data. Third, we would like to further improve the convergence rate of our algorithm to make it applicable to more use cases.
Acknowledgment
This research was supported by the Exascale Computing Project (ECP), Project Number 17-SC-20-SC, a collaborative effort of two DOE organizations, the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation’s exascale computing imperative.
The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grants No. 1619253 and 1910197.
We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.
This material is also based upon work supported by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists, Office of Science Graduate Student Research (SCGSR) program. The SCGSR program is administered by the Oak Ridge Institute for Science and Education (ORISE) for the DOE. ORISE is managed by ORAU under contract number DE-SC0014664. All opinions expressed in this paper are the author’s and do not necessarily reflect the policies and views of DOE, ORAU, or ORISE.
References
 [1] (2018-12-01) Multilevel techniques for compression and reduction of scientific data—the univariate case. Computing and Visualization in Science 19 (5), pp. 65–76. External Links: ISSN 1433-0369 Cited by: §I, §IIA3.
 [2] (1967-03) Adaptive data compression. Proceedings of the IEEE 55 (3), pp. 267–277. External Links: Document Cited by: §III.
 [3] (2019) Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data. Computer Graphics Forum 38 (3), pp. 517–528. External Links: ISSN 0167-7055, 1467-8659, Link, Document Cited by: §IIA3, §VII.
 [4] (2017) Toward a multi-method approach: lossy data compression for climate simulation data. In High Performance Computing, Cham, pp. 30–42. External Links: ISBN 978-3-319-67630-2 Cited by: §IIA1.
 [5] Note: https://www.lcrc.anl.gov/systems/resources/bebop (Online) Cited by: §I, §VIA1, §VIB3, §VI.
 [6] (2019) Exploring the feasibility of lossy compression for PDE simulations. The International Journal of High Performance Computing Applications 33 (2), pp. 397–410. External Links: Document Cited by: §IIA3.
 [7] (2019) Use cases of lossy compression for floating-point data in scientific data sets. The International Journal of High Performance Computing Applications 33 (6), pp. 1201–1220. Cited by: §IIB.
 [8] (2016) Fast error-bounded lossy HPC data compression with SZ. In IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS), pp. 730–739. Cited by: §I.
 [9] (2019-02) Error analysis of zfp compression for floating-point data. SIAM Journal on Scientific Computing. Cited by: §IIA3.
 [10] (1992) A special Newton-type optimization method. Optimization 24 (3-4), pp. 269–284. Cited by: §VB1.
 [11] (2017) Computing just what you need: online data analysis and reduction at extreme scales. In European Conference on Parallel Processing, pp. 3–19. Cited by: §I.
 [12] (1977) OPTLIB: an optimization program library. Mechanical Engineering Modern Design Series 4. Cited by: §VB1.
 [13] Note: https://www.gzip.org/ (Online) Cited by: §I, 4th item.
 [14] (2016) HACC: extreme scaling and performance across diverse architectures. Communications of the ACM 60 (1), pp. 97–104. Cited by: §I.
 [15] (2003) Out-of-core compression and decompression of large n-dimensional scalar fields. In Computer Graphics Forum, Vol. 22, pp. 343–348. Cited by: 1st item.
 [16] (2015) The community earth system model (CESM), large ensemble project: a community resource for studying climate change in the presence of internal climate variability. Bulletin of the American Meteorological Society 96 (8), pp. 1333–1349. Cited by: §I.
 [17] (2016-12) Fixed-Ratio Compression of an RGBW Image and Its Hardware Implementation. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 (4), pp. 484–496. Cited by: §III.
 [18] (2018-03-06) (Website) External Links: Link Cited by: §I, §VB1.
 [19] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §VB1.
 [20] (2018-12) Optimizing lossy compression with adjacent snapshots for n-body simulation data. In 2018 IEEE International Conference on Big Data (Big Data), pp. 428–437. External Links: Document, ISSN Cited by: §IIA1.
 [21] (2018-09) An efficient transformation scheme for lossy data compression with pointwise relative error bound. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 179–189. External Links: Document Cited by: §I.
 [22] (2018) Error-controlled lossy compression optimized for high compression ratios of scientific datasets. In 2018 IEEE International Conference on Big Data (Big Data), pp. 438–447. Cited by: §I, §VIB4.
 [23] (2019-09-20) Libpressio. Co-design Center for Online Data Analysis and Reduction. External Links: Link Cited by: §VA, §VB2.
 [24] (2006) Fast and efficient compression of floating-point data. IEEE Transactions on Visualization and Computer Graphics 12 (5), pp. 1245–1250. Cited by: §I.
 [25] (2014) Fixed-rate compressed floating-point arrays. IEEE Transactions on Visualization and Computer Graphics 20 (12), pp. 2674–2683. Cited by: §I, §IIA2, §III.
 [26] (2017) Error distributions of lossy floating-point compressors. In Joint Statistical Meetings, pp. 2574–2589. Cited by: §IIA1, §IIA3.
 [27] (2017-03-07) Global optimization of Lipschitz functions. External Links: 1703.02628, Link Cited by: §VB1.
 [28] Note: https://amrex-astro.github.io/Nyx (Online) Cited by: §I.
 [29] (2006) The NEWUOA software for unconstrained optimization without derivatives. In Large-Scale Nonlinear Optimization, G. Di Pillo and M. Roma (Eds.), Vol. 83, pp. 255–297. External Links: ISBN 978-0-387-30063-4, 978-0-387-30065-8, Link, Document Cited by: §VB1.
 [30] (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, pp. 26–46. Cited by: §VB1.
 [31] (1987) Algebraic multigrid. In Multigrid methods, Frontiers in Applied Mathematics, Vol. 3, pp. 73–130. External Links: MathReview Entry Cited by: §IIA3.

 [32] (2002) COIN-OR: an open-source library for optimization. In Programming languages and systems in computational economics and finance, pp. 3–32. Cited by: §VB1.
 [33] (2015) Exploration of lossy compression for application-level checkpoint/restart. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, IPDPS ’15, Washington, DC, USA, pp. 914–922. External Links: ISBN 978-1-4799-8649-1, Link, Document Cited by: §IIA3.
 [34] Note: https://sdrbench.github.io/ (Online) Cited by: §I, §VIA2, §VIB.
 [35] (2019) Note: http://www.spring8.or.jp/en/ (Online) Cited by: §IIB.
 [36] (2017) Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization. In IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS), pp. 1129–1139. Cited by: §I.
 [37] (2019-08) Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP. IEEE Transactions on Parallel and Distributed Systems 30 (8), pp. 1857–1871. External Links: ISSN 1045-9219, Document Cited by: §I.
 [38] (2019) Analyzing the impact of lossy compressor variability on checkpointing scientific simulations. In Proceedings of the 2019 IEEE International Conference on Cluster Computing, Cluster ’19, Washington, DC, USA. Cited by: §IIA3.
 [39] (2001) Multigrid. Academic Press, Inc., Orlando, FL. External Links: ISBN 0-12-701070-X Cited by: §IIA3.
 [40] (2019) Full state quantum circuit simulation by using data compression. In IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19), pp. 1–12. Cited by: §IIA1.
 [41] (2004-04) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. External Links: Document, ISSN 1941-0042 Cited by: §VIB4.
 [42] Note: https://github.com/facebook/zstd/releases (Online) Cited by: §I, 4th item.