Erasing-based lossless compression method for streaming floating-point time series

06/28/2023
by Ruiyuan Li, et al.

A prohibitively large volume of floating-point time series data is being generated at an unprecedentedly high rate. Efficient, compact, and lossless compression of time series data is therefore of great importance in a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes an Erasing-based Lossless Floating-point compression algorithm, Elf. The main idea of Elf is to erase the last few bits of each floating-point value (i.e., set them to zero), so that the XORed values contain many trailing zeros. The erasing-based method poses three challenges. First, how can the bits to erase be determined quickly? Second, how can the original data be recovered losslessly from the erased data? Third, how can the erased data be encoded compactly? Through rigorous mathematical analysis, Elf directly determines the erased bits and restores the original values without losing any precision. To further improve the compression ratio, we propose a novel encoding strategy for XORed values with many trailing zeros. Furthermore, observing that the values in a time series usually have similar significand counts, we propose an upgraded version of Elf, named Elf+, which optimizes the significand count encoding strategy to further improve the compression ratio and reduce the running time. Both Elf and Elf+ work in a streaming fashion. They take only O(N) time (where N is the length of the time series) and O(1) space, and achieve a notable compression ratio with a theoretical guarantee. Extensive experiments on 22 datasets demonstrate the strong performance of Elf and Elf+ compared with 9 advanced competitors, for both double-precision and single-precision floating-point values.
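The erasing idea can be illustrated with a small sketch. This is not the authors' algorithm: the abstract states that Elf determines the erased bits directly through mathematical analysis, whereas the hypothetical `erase` function below simply brute-forces the largest number of low mantissa bits that can be zeroed while the original value remains recoverable. It assumes positive, finite doubles whose number of decimal places (`alpha`, an illustrative parameter) is known, and it restores a value by rounding the erased one up to `alpha` decimal places. All function names are illustrative.

```python
import struct
from decimal import Decimal, ROUND_CEILING


def to_bits(v: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]


def from_bits(b: int) -> float:
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits in a 64-bit word (64 if x == 0)."""
    return 64 if x == 0 else (x & -x).bit_length() - 1


def restore(erased: float, alpha: int) -> float:
    """Round the erased value UP to alpha decimal places (positive values only)."""
    step = Decimal(1).scaleb(-alpha)  # e.g. alpha=2 -> Decimal('0.01')
    return float(Decimal(erased).quantize(step, rounding=ROUND_CEILING))


def erase(v: float, alpha: int) -> tuple[float, int]:
    """Brute-force sketch: zero as many low mantissa bits as possible
    while restore() still returns v exactly. Returns (erased value, k)."""
    bits = to_bits(v)
    for k in range(52, -1, -1):  # a double has 52 explicit mantissa bits
        candidate = from_bits((bits >> k) << k)  # clear the k lowest bits
        if restore(candidate, alpha) == v:
            return candidate, k
    raise ValueError("no recoverable erasure found for this value")


# Hypothetical sample series with two decimal places.
series = [3.17, 3.18, 3.25, 3.21]
alpha = 2
erased = [erase(v, alpha) for v in series]

# Trailing zeros of the XOR of consecutive values, before and after erasing.
raw_tz = [trailing_zeros(to_bits(a) ^ to_bits(b))
          for a, b in zip(series, series[1:])]
erased_tz = [trailing_zeros(to_bits(a[0]) ^ to_bits(b[0]))
             for a, b in zip(erased, erased[1:])]

print("trailing zeros (raw XOR):   ", raw_tz)
print("trailing zeros (erased XOR):", erased_tz)
```

For these sample values, at least 44 low mantissa bits can be erased from each double, so each XOR of consecutive erased values ends in a long run of zeros that an encoder can represent compactly. The per-value search here tries up to 53 candidates and uses exact decimal arithmetic for clarity; it is meant only to show why erasing helps, not how Elf achieves its stated running time.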

research
08/23/2023

Adaptive Encoding Strategies for Erasing-Based Lossless Floating-Point Compression

Lossless floating-point time series compression is crucial for a wide ra...
research
11/01/2019

LFZip: Lossy compression of multivariate floating-point time series data via improved prediction

Time series data compression is emerging as an important problem with th...
research
02/27/2020

Inline Vector Compression for Computational Physics

A novel inline data compression method is presented for single-precision...
research
11/16/2019

IDEALEM: Statistical Similarity Based Data Reduction

Many applications such as scientific simulation, sensing, and power grid...
research
08/07/2023

Lossless preprocessing of floating point data to enhance compression

Data compression algorithms typically rely on identifying repeated seque...
research
03/08/2023

Change a Bit to save Bytes: Compression for Floating Point Time-Series Data

The number of IoT devices is expected to continue its dramatic growth in...
research
08/09/2023

Sparse Binary Transformers for Multivariate Time Series Modeling

Compressed Neural Networks have the potential to enable deep learning ac...