Compressing integer lists with Contextual Arithmetic Trits

09/05/2022
by Yann Barsamian, et al.

Inverted indexes make it possible to query large databases without scanning the database at each query. An important line of research is the construction of the most efficient inverted indexes, in terms of both compression ratio and time efficiency. In this article, we show how to use trit encoding, combined with contextual methods, to compute inverted indexes. We perform an extensive study of different variants of these methods and show that our method consistently outperforms the Binary Interpolative Method, one of the gold standards in this area, with respect to compressed size. We apply our methods to a variety of datasets and make available the source code that produced the results, together with all our datasets.
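
For readers unfamiliar with the data structure, the following is a minimal sketch of what an inverted index is and where the integer lists of the title come from. It is not the authors' method: the function names (`build_inverted_index`, `to_gaps`, `query`) and the toy documents are hypothetical, and the gap transformation is shown only as a common preprocessing step for sorted integer lists, assumed here for illustration rather than taken from the article.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the increasing list of IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() ensures each document ID is recorded once per term.
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return dict(index)


def to_gaps(postings):
    """Keep the first ID and replace every later ID by its distance to the previous one.

    Illustrative assumption: small gaps are what integer-list compressors
    typically exploit; the article's own pipeline may differ.
    """
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def query(index, term):
    """Answer a single-term query from the index alone, without rescanning the documents."""
    return index.get(term.lower(), [])


if __name__ == "__main__":
    docs = [
        "trit encoding of lists",
        "binary interpolative encoding",
        "integer lists",
    ]
    index = build_inverted_index(docs)
    print(query(index, "encoding"))
    print(to_gaps([3, 7, 8, 15]))
```

Each list produced this way is a strictly increasing sequence of integers, which is the kind of input that compression schemes such as the Binary Interpolative Method operate on.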
