Variable-Byte Encoding is Now Space-Efficient Too

04/29/2018
by Giulio Ermanno Pibiri, et al.

The ubiquitous Variable-Byte encoding is considered one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with that of more sophisticated encoders, especially when the integers to be compressed are small, which is the typical case for inverted indexes. This paper shows that the compression ratio of Variable-Byte can be improved by 2× by adopting a partitioned representation of the inverted lists. This makes Variable-Byte surprisingly competitive in space with the best bit-aligned encoders, hence disproving the folklore belief that Variable-Byte is space-inefficient for inverted index compression. Despite the significant space savings, we show that our optimization comes almost for free, given that: we introduce an optimal partitioning algorithm that, by running in linear time and with low constant factors, does not affect indexing time; and we show, with an extensive experimental analysis and comparison with several other state-of-the-art encoders, that the query processing speed of Variable-Byte is preserved.
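
To illustrate why partitioning helps, here is a minimal Python sketch, assuming one common Variable-Byte convention (7 payload bits per byte, least-significant group first, high bit set on the last byte of each integer) applied to the d-gaps of a sorted posting list. The function names, the fixed block size of 128, and the choice of a plain bitmap as the alternative encoding for dense blocks are hypothetical simplifications for illustration; the abstract describes an optimal, variable-length partitioning computed in linear time, which this sketch does not implement, and per-block metadata is ignored.

```python
from typing import List, Sequence, Tuple


def vbyte_encode(values: Sequence[int]) -> bytes:
    """Variable-Byte: 7 payload bits per byte, least-significant group first.
    Assumed convention: the high bit (0x80) marks the LAST byte of each integer."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("Variable-Byte encodes non-negative integers only")
        while v >= 0x80:
            out.append(v & 0x7F)  # continuation byte: high bit clear
            v >>= 7
        out.append(v | 0x80)      # terminating byte: high bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode under the same (assumed) convention."""
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:              # last byte of the current integer
            values.append(cur)
            cur, shift = 0, 0
        else:
            shift += 7
    return values


def gaps(docids: Sequence[int], prev: int = -1) -> List[int]:
    """d-gaps of a strictly increasing list, measured from `prev`."""
    out = []
    for d in docids:
        out.append(d - prev)
        prev = d
    return out


def block_costs(block: Sequence[int], prev: int) -> Tuple[int, int]:
    """(Variable-Byte bytes, bitmap bytes) for one block of sorted docIDs > prev.
    The hypothetical bitmap has one bit per docID in (prev, block[-1]]."""
    vb = len(vbyte_encode(gaps(block, prev)))
    bits = block[-1] - prev
    return vb, (bits + 7) // 8


def plain_vbyte_size(docids: Sequence[int]) -> int:
    """Size in bytes of the whole list encoded as Variable-Byte d-gaps."""
    return len(vbyte_encode(gaps(docids)))


def partitioned_size(docids: Sequence[int], block_size: int = 128) -> int:
    """Hypothetical fixed-size partitioning: each block keeps the cheaper of
    Variable-Byte gaps or a bitmap. Per-block headers are ignored."""
    total, prev = 0, -1
    for i in range(0, len(docids), block_size):
        block = docids[i:i + block_size]
        total += min(block_costs(block, prev))
        prev = block[-1]
    return total


if __name__ == "__main__":
    dense = list(range(1000))            # every gap is 1: worst case for VByte
    print(plain_vbyte_size(dense))       # 1000 (one byte per gap)
    print(partitioned_size(dense))       # 125 (bitmaps win on dense blocks)
    print(partitioned_size([0, 1000, 2000]))  # 5 (VByte wins on sparse blocks)
```

On a dense list every gap costs a full byte under Variable-Byte, while a bitmap spends one bit per covered docID, so letting each block pick its cheaper encoding cuts the size sharply; sparse blocks still fall back to Variable-Byte. How the actual partition boundaries are chosen optimally is the paper's contribution and is not reproduced here.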



research
11/28/2014

V-variable image compression

V-variable fractals, where V is a positive integer, are intuitively frac...
research
02/12/2023

Efficient Integer Retrieving from Unordered Compressed Sequences

The variable-length Reverse Multi-Delimiter (RMD) codes are known to rep...
research
07/01/2019

On Slicing Sorted Integer Sequences

Representing sorted integer sequences in small space is a central proble...
research
09/02/2020

Zuckerli: A New Compressed Representation for Graphs

Zuckerli is a scalable compression system meant for large real-world gra...
research
09/05/2022

Compressing integer lists with Contextual Arithmetic Trits

Inverted indexes allow to query large databases without needing to searc...
research
04/16/2019

Compressed Indexes for Fast Search of Semantic Data

The sheer increase in volume of RDF data demands efficient solutions for...
research
12/02/2022

Trie-Compressed Intersectable Sets

We introduce space- and time-efficient algorithms and data structures fo...
