Improving Run Length Encoding by Preprocessing

01/13/2021
by Sven Fiergolla, et al.

Run Length Encoding (RLE) is a long-standing, simple lossless compression scheme that is easy to implement and achieves good compression on input data containing runs of repeated consecutive symbols. In its pure form, RLE is not applicable to natural text or other input data with only short sequences of identical symbols. We present a combination of preprocessing steps that turn arbitrary input data in a byte-wise encoding into a bit-string that is highly suitable for RLE compression. The main idea is to first read the most significant bit of every byte in the input byte-string, followed by the second most significant bit of every byte, and so on. We combine this approach with a dynamic byte remapping as well as a Burrows-Wheeler-Scott transform on a byte level. Finally, we apply Huffman encoding to the output of the bit-wise RLE to allow for more dynamic lengths of the code words that encode the runs. With our technique, we achieve lossless compression that is better than standard RLE compression by a factor of 8 on average.
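
The bit-wise reading order described above can be illustrated with a minimal Python sketch. It covers only the step of reading the input plane by plane and measuring the resulting runs; the dynamic byte remapping, the Burrows-Wheeler-Scott transform, and the Huffman stage are left out. The function names and the representation of the runs as a first bit plus a list of lengths are assumptions made for illustration, not the authors' implementation or output format.

```python
from itertools import groupby


def bit_planes(data: bytes) -> list[int]:
    """Read all most significant bits first, then the next bit of every byte, and so on."""
    return [(byte >> shift) & 1 for shift in range(7, -1, -1) for byte in data]


def run_lengths(bits: list[int]) -> tuple[int, list[int]]:
    """Return the first bit and the lengths of the alternating runs (hypothetical format)."""
    if not bits:
        return 0, []
    return bits[0], [len(list(group)) for _, group in groupby(bits)]


def from_run_lengths(first_bit: int, runs: list[int]) -> list[int]:
    """Inverse of run_lengths: expand the alternating runs back into bits."""
    bits: list[int] = []
    bit = first_bit
    for length in runs:
        bits.extend([bit] * length)
        bit ^= 1  # runs alternate between 0 and 1
    return bits


def from_bit_planes(bits: list[int]) -> bytes:
    """Inverse of bit_planes: reassemble the original bytes from the eight planes."""
    n = len(bits) // 8
    out = bytearray(n)
    for plane in range(8):
        shift = 7 - plane  # plane 0 holds the most significant bits
        for i in range(n):
            out[i] |= bits[plane * n + i] << shift
    return bytes(out)


text = b"abc"
first_bit, runs = run_lengths(bit_planes(text))
print(first_bit, runs)  # 0 [3, 6, 10, 3, 1, 1]
assert from_bit_planes(from_run_lengths(first_bit, runs)) == text
```

In this small example the three ASCII letters differ only in their two lowest bits, so the upper bit planes merge into long runs even though no byte of the input repeats. This is the effect that makes the rearranged bit-string a better fit for RLE than the original byte sequence.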


research
11/11/2019

Dv2v: A Dynamic Variable-to-Variable Compressor

We present Dv2v, a new dynamic (one-pass) variable-to-variable compresso...
research
08/31/2022

Computing all-vs-all MEMs in run-length encoded collections of HiFi reads

We describe an algorithm to find maximal exact matches (MEMs) among HiFi...
research
07/29/2021

A New Lossless Data Compression Algorithm Exploiting Positional Redundancy

A new run length encoding algorithm for lossless data compression that e...
research
05/21/2021

Weighted Burrows-Wheeler Compression

A weight based dynamic compression method has recently been proposed, wh...
research
11/08/2019

DZip: improved general-purpose lossless compression based on novel neural network modeling

We consider lossless compression based on statistical data modeling foll...
research
08/25/2021

Encoding Scheme For Infinite Set of Symbols: The Percolation Process

It is shown here that the percolation process on binary trees that is eq...
research
07/15/2021

Compressing Multisets with Large Alphabets

Current methods that optimally compress multisets are not suitable for h...
