Efficient Immediate-Access Dynamic Indexing

11/11/2022
by Alistair Moffat, et al.
In a dynamic retrieval system, documents must be ingested as they arrive and be immediately findable by queries. Our purpose in this paper is to describe an index structure and processing regime that accommodates that requirement for immediate access, seeking to make the ingestion process as streamlined as possible, the growing index as small as possible, and term-based querying via the index as efficient as possible. We describe a new compression operation and a novel approach to extensible lists which together facilitate that triple goal. In particular, the structure we describe provides incremental document-level indexing using as little as two bytes per posting, and only a small amount more for word-level indexing; provides fast document insertion; supports immediate and continuous queryability; supports fast conjunctive queries and similarity score-based ranked queries; and facilitates fast conversion of the dynamic index to a "normal" static compressed inverted index structure. Measurement of our new mechanism confirms that in-memory dynamic document-level indexes for collections into the gigabyte range can be constructed at a rate of two gigabytes/minute using a typical server architecture; that multi-term conjunctive Boolean queries can be resolved in just a few milliseconds each on average, even while new documents are being concurrently ingested; and that the net memory space needed for all of the required data structures averages as little as two bytes per stored posting, less than half the space required by the best previous mechanism.
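
To make the idea of immediate-access indexing concrete, the following is a minimal sketch and not the structure described in the paper. It assumes a generic variable-byte code over document-number gaps in place of the paper's new compression operation, and plain Python `bytearray` objects in place of its extensible lists. The names `DynamicIndex`, `add_document`, and `conjunctive_query` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, low-order group first."""
    out = bytearray()
    while n >= 128:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)  # final byte has the high bit clear
    return bytes(out)


def vbyte_decode_all(buf: bytes) -> list:
    """Decode every integer stored in a vbyte-coded buffer."""
    values, n, shift = [], 0, 0
    for byte in buf:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values


class DynamicIndex:
    """Illustrative in-memory document-level index (hypothetical design)."""

    def __init__(self):
        self.postings = defaultdict(bytearray)  # term -> vbyte-coded gaps
        self.last_doc = {}                      # term -> last doc id stored
        self.num_docs = 0

    def add_document(self, text: str) -> int:
        """Ingest one document; its postings are queryable on return."""
        self.num_docs += 1
        doc_id = self.num_docs
        for term in set(text.lower().split()):
            gap = doc_id - self.last_doc.get(term, 0)
            self.postings[term] += vbyte_encode(gap)
            self.last_doc[term] = doc_id
        return doc_id

    def doc_ids(self, term: str) -> list:
        """Decode one term's postings list back into ascending doc ids."""
        ids, current = [], 0
        for gap in vbyte_decode_all(bytes(self.postings.get(term, b""))):
            current += gap
            ids.append(current)
        return ids

    def conjunctive_query(self, terms) -> list:
        """Return the doc ids that contain every query term (Boolean AND)."""
        lists = sorted((self.doc_ids(t.lower()) for t in terms), key=len)
        if not lists:
            return []
        result = set(lists[0])  # start from the shortest list
        for other in lists[1:]:
            result &= set(other)
        return sorted(result)


index = DynamicIndex()
index.add_document("the quick brown fox")
index.add_document("the lazy dog")
first = index.conjunctive_query(["the", "dog"])    # query between insertions
index.add_document("quick dog")
second = index.conjunctive_query(["quick", "dog"])  # sees the new document
```

Because each posting is appended as a compressed gap at insertion time, a query issued between two insertions already sees every document ingested so far. This sketch decodes whole lists and intersects them as sets, which is far simpler than a production conjunctive query algorithm, and it makes no attempt to reproduce the space or speed figures reported above.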

research
11/16/2018

The Potential of Learned Index Structures for Index Compression

Inverted indexes are vital in providing fast key-word-based search. For ...
research
08/07/2023

Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space

In the last decades, the necessity to process massive amounts of textual...
research
02/20/2019

Fast, Small, and Simple Document Listing on Repetitive Text Collections

Document listing on string collections is the task of finding all docume...
research
09/12/2022

Robust and Scalable Content-and-Structure Indexing (Extended Version)

Frequent queries on semi-structured hierarchical data are Content-and-St...
research
06/09/2020

Dynamic Interleaving of Content and Structure for Robust Indexing of Semi-Structured Hierarchical Data (Extended Version)

We propose a robust index for semi-structured hierarchical data that sup...
research
03/18/2020

PolyFit: Polynomial-based Indexing Approach for Fast Approximate Range Aggregate Queries

Range aggregate queries find frequent application in data analytics. In ...
research
09/08/2017

FAST: Frequency-Aware Spatio-Textual Indexing for In-Memory Continuous Filter Query Processing

Many applications need to process massive streams of spatio-textual data...
