Today’s HPC simulations and advanced instruments produce vast volumes of scientific data, which cause many serious issues, including a huge storage burden [cesm-le-data, baker2014methodology, sz18, li2018optimizing], I/O bottlenecks relative to fast stream processing [cappello2019use], and insufficient memory [wu2018memory]. For example, the Hardware/Hybrid Accelerated Cosmology Code (HACC) [hacc] (a two-time finalist for ACM’s Gordon Bell Prize) can produce 20 petabytes of data to store when simulating up to 3.5 trillion particles over 300 timesteps. Even assuming a sustained bandwidth of 1 TB/s, the I/O time would still exceed 5 hours, which is prohibitive. Thus, researchers generally output the data by decimation, that is, storing one snapshot every several timesteps of the simulation. This process degrades the temporal resolution of the simulation and loses valuable information for post-analysis.
Another typical example is instrument data generated for materials science research. Advanced instruments (such as the Advanced Photon Source at Argonne) may produce data at extremely high rates, such as 500 GB/s (expected to increase by at least two orders of magnitude with upcoming upgrades [APSU]), so that thousands of disks would be required to sustain this data production rate without compression support.
To mitigate the significant storage burden and I/O bottleneck, researchers have used many data compressors. Lossless compressors such as Gzip [gzip], Zstd [zstd], Blosc [blosc], and FPC [fpc] suffer from low compression ratios (around 2:1 [son2014data]) in reducing scientific data size because of the high randomness of ending mantissa bits in the floating-point representations [lindstromerror]. Accordingly, error-bounded lossy compression has been treated as one of the best approaches to solve this big scientific data issue [sz16, li2018optimizing].
Although existing error-bounded lossy compressors such as SZ [sz16, sz17, sz18] and ZFP [zfp] can strictly control the compression error of each data point, a significant gap still remains in understanding the impact of compression errors on program output. In other words, the propagation of compression errors in HPC programs has not been well studied and understood. Therefore, current lossy compression methods may lead to unacceptably inaccurate results for scientific discovery [sasaki2015exploration, calhoun2019exploring, reza2019analyzing] based on the corrupted program output.
Fault Injection (FI) is a widely used technique to evaluate the resilience of software applications to faults. While FI has been extensively used in general-purpose applications, to the best of our knowledge, there does not exist an FI tool for lossy compression errors. The main challenges in developing such a fault injector lie in (1) designing a proper abstraction of the compression fault model, and (2) integrating the fault model at a level where one can also conduct program-level error-propagation analysis. Our contributions are listed as follows.
We propose a systematic approach for efficient lossy compression fault injection to help compressor developers and users understand the impact of compression errors on the HPC applications they are interested in.
We build a fault injector (called LCFI) to inject lossy compression errors into any given HPC program. The tool is highly applicable, customizable, easy to use, and able to generate top-down comprehensive results. We also demonstrate the use of LCFI with a simple example program.
We evaluate LCFI on four representative HPC benchmark programs with different lossy compression errors to understand how different compressors affect those programs’ outputs. Experimental results provide several important insights for users to understand how to strategically use lossy compression in order to avoid corrupting program output.
The rest of the paper is organized as follows. In Section II, we discuss the background and our research motivation. In Section III, we discuss our fault model for lossy compression error. In Section IV, we present the design and implementation details of our FI tool LCFI. In Section V, we describe the use of LCFI in detail. In Section VI, we present our evaluation results. In Section VII, we conclude and discuss future work.
II Background and Motivation
II-A Error-bounded Lossy Compression for HPC Data
Data compression has been studied for decades. There are two main categories: lossless compression and lossy compression. Lossless compressors such as FPZIP [lindstrom2006fast] and FPC [fpc] can only provide limited compression ratios (typically up to 2:1 for most scientific data) due to the significant randomness of the ending mantissa bits [son2014data].
Lossy compression, on the other hand, can compress data with little information loss in the reconstructed data. Compared to lossless compression, lossy compression can provide a much higher compression ratio while still maintaining useful information for scientific discoveries. Different lossy compressors provide different compression modes, such as error-bounded mode and fixed-rate mode. Error-bounded mode requires users to set an error bound, such as an absolute error bound or a point-wise relative error bound; the compressor ensures that the differences between the original data and the reconstructed data do not exceed the user-set error bound. In fixed-rate mode, users set a target bitrate, and the compressor guarantees that the actual bitrate of the compressed data does not exceed the user-set value. In this work, we focus on the error-bounded mode and leave the fixed-rate mode for future work.
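To make the two error-bounded modes above concrete, the following C sketch checks whether reconstructed data respects an absolute or a point-wise relative error bound. The function names are ours for illustration, not part of any compressor's API.

```c
#include <math.h>
#include <stddef.h>

/* Absolute error bound: |orig[i] - dec[i]| <= eb for every point. */
int satisfies_abs_bound(const double *orig, const double *dec,
                        size_t n, double eb) {
    for (size_t i = 0; i < n; i++)
        if (fabs(orig[i] - dec[i]) > eb)
            return 0;
    return 1;
}

/* Point-wise relative error bound: |orig[i] - dec[i]| <= |orig[i]| * eb. */
int satisfies_rel_bound(const double *orig, const double *dec,
                        size_t n, double eb) {
    for (size_t i = 0; i < n; i++)
        if (fabs(orig[i] - dec[i]) > fabs(orig[i]) * eb)
            return 0;
    return 1;
}
```

Note that the relative bound is tighter than the absolute bound for small-magnitude values, which is why the two modes can corrupt a downstream program differently even at the "same" nominal bound.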
In recent years, a new generation of lossy compressors for HPC data has been proposed and developed, such as SZ [sz16, sz17, sz18] and ZFP [zfp]. Unlike traditional lossy compressors such as JPEG [wallace1992jpeg], which is designed for (integer) image data, SZ and ZFP are designed to compress floating-point and integer HPC data and provide a strict error-controlling scheme based on users’ requirements. SZ is a representative prediction-based error-bounded lossy compressor with three main steps: (1) predict each data point’s value based on its neighboring points using an adaptive, best-fit prediction method; (2) quantize the difference between the real value and the predicted value based on the user-set error bound; and (3) apply customized Huffman coding and lossless compression to achieve a higher compression ratio. ZFP is a representative transform-based error-bounded lossy compressor for floating-point and integer data. ZFP splits the whole dataset into many small blocks with an edge size of 4 along each dimension and compresses each block separately in four main steps: (1) exponent alignment, (2) orthogonal transform, (3) fixed-point integer conversion, and (4) bit-plane-based embedded coding. For more details, we refer readers to [sz17] and [zfp] for SZ and ZFP, respectively.
II-B LLFI
LLFI [LLFI-QRS] is an LLVM-based FI tool that injects faults into the LLVM IR of the application source code. LLFI has three core parts, as shown in Figure 1: Instrument, Profile, and Injection.
In general, the Instrument part takes an IR file as input and generates IR files instrumented with profiling and fault-injection function calls. The Profile part takes the profiling executable, executes it, and generates the baseline results; using these results, users can determine whether a fault has influenced the execution of the program. The Injection part injects the fault specified in input.yaml into the program. After this step, the final results are generated, including the program output file, the trace file, and the fault-injection file.
II-C Research Motivation
Existing lossy compressors mainly focus on optimizing three aspects: compression ratio (i.e., storage reduction ratio), compression speed (a.k.a. throughput), and reconstructed data quality based on statistical metrics such as PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index measure). However, only a few works [tao2018improving, reza2019analyzing, evans2020jpeg] have studied the impact of compression errors on HPC applications, and none of them has systematically studied how compression errors propagate in an HPC program. Unlike the traditional resilience and fault-tolerance community, which has many fault injection tools (such as PinFI [PINFI], LLFI [LLFI-QRS], and TensorFI [li2018tensorfi]) to investigate how resilient software applications are to hardware errors, the HPC community lacks an efficient fault injection tool for lossy compression errors, which could help lossy compressor developers and users understand the impact of compression errors on specific HPC programs. This motivates us to develop such a tool in this work.
III Lossy Compression Fault Model
Unlike lossless compression, lossy compression cannot precisely recover numerical data bit by bit. However, lossy compressed data are acceptable in many use cases (such as storage reduction, in situ visualization, and checkpoint/restart [cappello2019use]) for HPC applications. This is because HPC/scientific data itself tends to involve many error terms. Taking experimental and observational data as an example, finite-precision measurements and intrinsic measurement noise affect data accuracy. Similarly, round-off, truncation, iteration, and model errors mean that numerical simulations also have limited precision/accuracy. Thus, using lossy compression techniques to approximate/compress floating-point data is acceptable and one of the best solutions to the big scientific data issue [calhoun2019exploring, poppick2020statistical, li2018data].
We propose to simulate compression errors instead of performing actual compression and decompression for FI, because current state-of-the-art lossy compressors such as SZ and ZFP can only provide throughputs of hundreds of megabytes per second. Actual compression and decompression would introduce very high runtime overheads for two reasons: (1) existing lossy compressors have a large design space, including compression algorithms (e.g., SZ [sz16, sz17, sz18], ZFP [zfp], FPZIP [fpzip], MGARD [mgard], TTHRESH [tthresh], VAPOR [vapor]) and their diverse configurations (e.g., error-controlling mode, error bound); and (2) to obtain reasonable coverage for diverse HPC programs, a large number of FI locations must be considered. As a result, actual compression and decompression for FI would be very inefficient; instead, we choose to simulate the compression errors in our FI tool.
To simulate compression errors, we have to understand the fault model of the specific compression algorithm. For example, Figure 2 illustrates an example error distribution when compressing and decompressing activation data with SZ. The activation data used here for demonstration are extracted from the Conv-5 layer of AlexNet [krizhevsky2012imagenet] in a certain iteration. Lindstrom [Lindstrom2017Error] studied the error distributions of lossy floating-point compressors in a statistical way and concluded that lossy compression error distributions depend on the adopted quantization techniques. Specifically, lossy compressors that adopt uniform scalar quantization, such as SZ [sz16, sz17, sz18], SQ [iverson2012fast], and LZ4A [kunkel2017toward], tend to generate uniformly distributed errors, while transform-based lossy compressors, such as ZFP [zfp], VAPOR [vapor], and TTHRESH [tthresh], produce error distributions that are close to normal (a.k.a. Gaussian). Inspired by this work, we mainly focus on these two error distributions (i.e., uniform and normal) in our case studies; however, it is worth noting that LCFI is extensible with any given error distribution (as described later).
IV Design and Implementation
LCFI (publicly available at https://github.com/LCFI/LCFI) is an extension of LLFI [LLFI-QRS]. In this section, we first discuss our design goals and assumptions for LCFI. We then present the improvements and features of LCFI. Finally, we present some of the implementation details.
IV-A Design Goals and Assumptions
We have four goals in the design of LCFI, as follows:
Applicability: We aim to create a tool that is simple and easy to use, which users can exploit even without knowing much about error-bounded lossy compression. Given a program written in C/C++, users should be able to easily inject a fault into a specific variable at a specific location. For example, if the target variable is located in a for-loop, the user can inject faults in a specific iteration of this for-loop, which is meaningful for changing an array’s values.
Customizable: Given the large number of error distributions in lossy compression (considering future newly designed compressors), it is not feasible to provide a tailored tool for every distribution. We provide an injection template that allows users to customize their own errors for their particular distribution.
Ease of Use: We aim to provide a simple installation process that does not require editing several setup files. To use LCFI, users only need to edit one or two YAML files and run a few commands (e.g., no more than four) to get the injector’s results. LCFI usage should also not require understanding how the compiler works or the ability to read IR files.
Top-Down Comprehensive Results: We aim to make the injector provide both high-level and low-level results (such as register values). Users can choose to examine the output file or trace the error propagation to potentially find benign faults [li2018modeling] (discussed in Section VI).
At the same time, we make the following assumptions about the faults injected by LCFI:
Faults can only be injected into variables on the right-hand side of an assignment, due to the nature of LLVM IR. Changing a variable on the left-hand side of an assignment can be achieved by changing all variables on the right-hand side.
Faults cannot be injected into variables located in the main function, because most faults in the main function would cause the program to crash, making the injection meaningless.
Because LLFI does not support OpenMP, LCFI can only run on serial programs without multiple threads. In the future, building on LLFI-GPU, we will design an OpenMP and CUDA version of LCFI.
IV-B Design of LCFI
Unlike LLFI, which focuses on the impact of different software and hardware faults, LCFI focuses on how different lossy compression error distributions affect the execution of different programs. Thus, to build LCFI, we modified both the way LLFI injects faults and the faults themselves. The core design of LCFI is shown in Figure 3.
We propose the following designs in LCFI to satisfy the previously described goals. More details are shown in the usage description in Section V.
Applicability: We provide several YAML files that users can edit. In these YAML files, users can easily select the variable into which they want to inject the fault and the kind of fault they want to inject. Users are not required to understand how lossy compression works but can still obtain results directly.
Customizable: Unlike LLFI’s complex procedure for customizing faults, we provide a template for the distribution of lossy compression errors. To customize faults or distributions, users simply edit this template and recompile the project.
Ease of Use: Using the Python scripts we provide, LCFI can automatically find the location of specific variables in the IR file. Users can use the Python scripts to tell the injector which index it should target. Thus, users do not need to understand a complicated IR file to use LCFI.
Top-Down Comprehensive Results: LCFI generates both high-level and low-level results, such as standard output files and IR-level results. Users can use both kinds of information to perform program-level error-propagation analysis.
IV-C LCFI Features
LCFI improves the functionality of LLFI by introducing the following new features:
Multi-Location Injection: Unlike LLFI, which can inject a fault at only one specific location, LCFI allows injecting a fault at any given location and at any given time.
For-Loop Injection: For HPC applications, for-loops are the most frequently used loops. In LCFI, we designed an interface for users to set the loop number so that they can inject faults at specific iterations during for-loop execution. This is imperative if the user wants to inject a fault into an array.
Custom Distribution: We optimize the current LLFI interface to allow users to easily create and customize their own lossy compression errors.
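The real template in LCFI is C/C++ code that users edit and recompile; as a hedged sketch (the names and the function-pointer interface are ours, not LCFI's actual API), a custom-distribution hook could look like this:

```c
/* Hypothetical custom-distribution hook: the run-time library calls
   a user-supplied function to draw each injected error within the
   given error bound. */
typedef double (*error_dist_fn)(double error_bound);

/* Example user-defined distribution: the worst case, always +bound. */
static double worst_case_error(double error_bound) {
    return error_bound;
}

/* Apply the user's distribution to a value at injection time. */
double inject_with(double value, double error_bound, error_dist_fn dist) {
    return value + dist(error_bound);
}
```

A user studying a new compressor would only replace the distribution function, leaving the injection machinery untouched.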
IV-D Implementation Details
Similar to LLFI, LCFI is implemented using the LLVM-Pass (in C/C++) and Python. We split the LLVM-Pass code and Python code into three modules as follows:
LLVM-Pass Core is the main module which controls the underlying execution of the target program. It also provides the functionality to trace the execution and insert run-time code.
The Run-time Lib module consists of the different fault implementations and determines which variables need to be injected.
The Tools module consists of several useful tools for analyzing the results from LCFI, including Trace_To_Dot, Trace_Union, and Trace_Diff.
LCFI results consist of the following main parts:
Baseline: This part comes from the original program and includes golden_std_output, llfi.stat.trace.prof.txt, and an output file. golden_std_output is the standard output of the original program; llfi.stat.trace.prof.txt records the value changes of every register.
Program Output File: This part comes from the execution of the injected program. If the program does not generate an output file, this part is empty.
Error Output: If the injected program crashes, the log file is stored here.
Standard Output: This part is the standard output from the execution of the injected program.
LLFI Stat Output: This part records the value change of every register. If faults have been successfully injected into the program, the injection log is also stored here.
V Usage Model
In this section, we demonstrate how to customize a distribution of lossy compression errors in LCFI and how to inject the fault into a program written in C/C++. The example C code is shown in Listing LABEL:lst:animate.
In the sample code, the main function calls the process function three times. The process function contains a for-loop that executes three times; in each iteration, the program prints a value from the n array.
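The listing itself is not reproduced in this extract; the following is a plausible reconstruction of such a demo. The names process and n follow the text, while the array contents and the ans computation are our own illustration, and demo_main stands in for the demo's main so the sketch stays self-contained.

```c
#include <stdio.h>

#define LEN 3

/* process() contains a for-loop executed three times; each
   iteration prints one value derived from the array n. */
void process(const double *n) {
    for (int i = 0; i < LEN; i++) {
        double ans = n[i];        /* the demo also computes an "ans" */
        printf("%f\n", ans);
    }
}

/* In the demo this is main(): it calls process() three times. */
int demo_main(void) {
    double n[LEN] = {1.0, 2.0, 3.0};
    for (int call = 0; call < 3; call++)
        process(n);
    return 0;
}
```

With this structure, injecting a fault into n in a specific iteration of the for-loop changes exactly one printed value per call, which is what the trace comparison later in this section relies on.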
After compiling and instrumenting the file, we get two IR files. First, let us focus on demo-llfi_index.ll. Listing LABEL:lst:code2 shows some details of the process function in LLVM IR format. In this file, every IR instruction is given an index so that the injector can recognize different instructions in the next step. As indicated in the configurable YAML file shown in Listing LABEL:lst:yaml1, we target the variable in line 9 for fault injection. To do so, we set variable_name to n and function_name to process. The variable n appears for the first time in line 9, so we set variable_location to 1. Because n is an array inside a for-loop, we set in_arr and in_loop to true. In particular, we want to inject faults in the 3rd loop iteration, so we set loop_num to 3. Running the Python script setinput.py generates input.yaml.
Next, let us take a look at demo-llfi_profiling.ll (generated when the trace option is set to true), as shown in Listing LABEL:lst:code2. After instrumenting, the process function is in LLVM IR format, and additional instructions responsible for printing trace information appear. No trace instructions are added before original instructions that do not return any registers. Such instructions always use the store opcode, because a store instruction only stores a value into a specific register and does not return one. That is why we assume that users cannot change the variable on the left-hand side of the assignment, as presented in Section IV-A.
The next steps are profiling and injection. After running these two commands, we get the baseline result and the injected results. If the trace switch is turned on, we also get trace files similar to Listings LABEL:lst:trace1 and LABEL:lst:trace2 in the baseline directory. We can use the trace-diff tool to analyze error propagation during program execution. As shown in the listings, the value at index 18 differs because we injected faults at index 15, the value of n. In other words, the value of ans has been changed.
Finally, we can see the different outputs of the baseline and injected programs, as shown in Listing LABEL:lst:results. The values in the third loop iteration are all different from the baseline.
VI Evaluation
In this section, we use different lossy compression error distributions to inject faults into several representative HPC programs. In these programs, we select typical variables for analysis where lossy compression would be applied. The program names and selected variables are shown in Table I. The goal of our experiments is to demonstrate that LCFI can inject various lossy compression errors with different distributions into different program locations.
Benchmark     | Index | Variable Name | Data Type | In Array? | In For-Loop? | Loop Num.
HPCCG [hpccg] | 1469  | x             | Double    | True      | True         | 1, 5
VI-A Experimental Configuration
Since lossy compression in HPC applications is used for data reduction, we select representative variables with relatively large sizes in the core functions to inject the lossy compression errors, as shown in Table I.
Programs: We use the benchmarks provided by [palazzi2019tale], which are very popular HPC benchmarks.
Index and Variable Name: In the IR-format file, a specific llfi-index identifies a specific variable and its location. Using the index, we can determine the injection location.
In Loop or Array?: This attribute is discussed in Section V.
Fault Type: We use four types of faults, which are the combinations of two typical error-bound modes (absolute error and relative error) and two error distributions (uniform and normal).
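The four fault types above factor cleanly: the error-bound mode fixes the per-value bound magnitude, and the distribution determines how the error is drawn within that bound. A small sketch (our illustration, not LCFI code) of the mode part:

```c
#include <math.h>

/* Per-value bound under the two error-bound modes:
   absolute mode uses the user-set bound directly, while relative
   mode scales the bound by |value| point-wise. The distribution
   (uniform or normal) is then sampled within this bound. */
double effective_bound(double value, double eb, int relative) {
    return relative ? fabs(value) * eb : eb;
}
```

This is why the evaluation reports relative bounds as percentages (e.g., 1%, 5%): a relative bound of 1% means each value may be perturbed by at most 1% of its own magnitude.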
VI-B Evaluation Results
HPCCG is a simple conjugate-gradient benchmark code for a 3D chimney domain. We test the variable x in the waxpby function. We observe that even when injecting lossy compression errors into the same variable, different error distributions or locations may lead to different program outputs.
Table II shows the results when we inject faults in the first loop. We observe that every program with injected compression errors can still converge, but programs injected with absolute errors converge much more slowly than those with relative errors.
Fault Type     | Relative + Uniform     | Relative + Normal
Error Bound    | 1%, 5%, 10%, 50%, 100% | 1%, 5%, 10%, 50%, 100%
Converge Iter. | 99 (same as baseline)  | 99 (same as baseline)
Fault Type     | Absolute + Uniform     | Absolute + Normal
Moreover, when we inject faults into variable x in the fifth loop, none of the programs is able to converge within 150 iterations (i.e., the default upper limit set by the program). The results are shown in Figure 4. To better illustrate the final residual of the program after finishing 150 iterations, we compute a new metric based on the final residual; a larger value of the metric indicates a smaller final residual.
The Black-Scholes model is a mathematical model for the dynamics of a financial market containing derivative investment instruments. We test the variable xNPrimeofX in the CNDF function. According to the running logs, some runs crash while others generate corrupted results; none are correct. Figure 5 illustrates the percentages of crashed runs and completed (but corrupted-output) runs. Due to the paper’s focus on tool development, we leave investigating the root cause of these crashes to future work.
XSBench is a mini-app representing a key computational kernel of the Monte Carlo neutron transport algorithm. We test the variable a1 in the calculate_macro_xs function. According to the standard output, although all injected programs finish execution, every injected program either generates a different output or runs more slowly with the same output, compared to the baseline. Listing LABEL:lst:res3 illustrates the different verification checksums generated by the baseline and injected programs. We note that the baseline program takes about 127 seconds, while the injected programs take about 260 seconds.
NPB-MG is the multi-grid (MG) benchmark in the NAS Parallel Benchmarks [npbmg]. In numerical analysis, an MG method is an algorithm for solving differential equations using a hierarchy of discretizations. We test the variable a1 in the vranlc function. We observe that the outputs of all injected programs are corrupted under all tested fault types (including both error distributions and both error-bound modes).
VI-C Observation 1: Crash, Corruption, or Slow Convergence?
According to Section VI-B, we observe that the programs with injected faults can crash (Black-Scholes), generate incorrect results (HPCCG and NPB-MG), or take longer to complete or converge (HPCCG and XSBench). In addition, some faults may have no impact on the program execution, as in HPCCG. This last situation is discussed in Section VI-D.
Therefore, our tool can simulate different faulty scenarios and effectively guide users on how to use lossy compressors. As shown in Section VI-B1, as the error bound increases, the metric becomes smaller, which means the final residual becomes larger; in other words, the program converges more slowly. This means that when users try to use lossy compression here, they have to be careful about which error bound to set. As shown in Section VI-B3, even though the simulation time becomes about twice as long, the program with the injected fault still cannot produce the correct output. This means that users cannot use lossy compression for this specific variable in XSBench.
VI-D Observation 2: Execution Path Affected?
According to Table II, we observe that some injected faults do not have any noticeable impact on the program’s output. We call these faults benign faults. Based on the trace file, we find that the fault was injected in the first loop but disappeared in the second loop. The first loop is located at line 5 of Listing LABEL:lst:HPCCG, and the second loop at line 7. We obtain the error-propagation graphs for a benign fault and a normal fault (compared with benign faults, normal faults are those that corrupt the program’s final output), as shown in Figure 6. This demonstrates that users can use LCFI to effectively trace lossy compression error propagation.
VII Conclusion and Future Work
In this paper, we propose and develop a new fault injector for lossy compression errors called LCFI (Lossy Compression Fault Injector). The tool enables IR-level analysis of lossy compression errors. LCFI can provide useful insights for lossy compressor developers to design better compressors for specific HPC programs. Based on our evaluation results, we find that different programs have different resilience to lossy compression errors. Within a given program, different variables, or even the same variable at different locations, may have different sensitivities to a given type of lossy compression error. In the future, we plan to extend LCFI with OpenMP and GPU support, which will broaden its applicability.