LeCo: Lightweight Compression via Learning Serial Correlations

06/27/2023
by Yihao Liu, et al.

Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Although dictionary-based encodings have been studied comprehensively as a way to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to automatically remove the serial redundancy in a value sequence, achieving an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely used applications, we observe up to a 3.9x speedup in filter-scanning a Parquet file and a 16
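The idea of removing serial redundancy with a learned model can be illustrated with a minimal sketch. It is not LeCo's implementation: it assumes a single non-empty partition of integers, a least-squares line as the model, and hypothetical helper names (`fit_line`, `encode`, `decode_at`). Only the model parameters and small non-negative residuals are kept, and any position can be decoded on its own, which is what makes fast random access possible. Fixing the slope to 0 and using the minimum as the intercept would reduce this sketch to Frame-of-Reference.

```python
from typing import List, Tuple

Encoded = Tuple[float, float, int, int, List[int]]


def fit_line(values: List[int]) -> Tuple[float, float]:
    """Least-squares line through the points (i, values[i])."""
    n = len(values)
    if n == 1:
        return 0.0, float(values[0])
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def encode(values: List[int]) -> Encoded:
    """Keep the model plus residuals shifted to be non-negative."""
    slope, intercept = fit_line(values)
    residuals = [v - round(slope * i + intercept) for i, v in enumerate(values)]
    bias = min(residuals)                 # shift so every stored delta is >= 0
    deltas = [r - bias for r in residuals]
    width = max(deltas).bit_length()      # bits per value if deltas were bit-packed
    return slope, intercept, bias, width, deltas


def decode_at(enc: Encoded, i: int) -> int:
    """Random access: recompute the prediction and add the stored correction."""
    slope, intercept, bias, _width, deltas = enc
    return round(slope * i + intercept) + bias + deltas[i]


values = [100, 103, 105, 109, 112, 114, 118, 121]
enc = encode(values)
restored = [decode_at(enc, i) for i in range(len(values))]
```

Because decoding repeats exactly the prediction computed during encoding, the round trip is lossless even though the model itself uses floating-point parameters. A real encoder would bit-pack `deltas` at `width` bits each; the list is left unpacked here to keep the sketch short.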


research
06/10/2022

PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework

Generative model based image lossless compression algorithms have seen a...
research
05/18/2021

LEA: A Learned Encoding Advisor for Column Stores

Data warehouses organize data in a columnar format to enable faster scan...
research
07/29/2021

A New Lossless Data Compression Algorithm Exploiting Positional Redundancy

A new run length encoding algorithm for lossless data compression that e...
research
01/01/2022

X3: Lossless Data Compressor

X3 is a lossless optimizing dictionary-based data compressor. The algori...
research
05/19/2021

Revisiting Data Compression in Column-Stores

Data compression is widely used in contemporary column-oriented DBMSes t...
research
12/22/2015

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Data compression is very important feature in terms of saving the memory...
research
04/05/2018

On Undetected Redundancy in the Burrows-Wheeler Transform

The Burrows-Wheeler-Transform (BWT) is an invertible permutation of a te...
