Airphant: Cloud-oriented Document Indexing

12/26/2021
by Supawit Chockchowwat, et al.

Modern data warehouses can scale compute nodes independently of storage. These systems persist their data on cloud storage, which is always available and cost-efficient. Ad-hoc compute nodes then fetch the necessary data on demand from cloud storage. This ability to quickly scale data systems up or down is highly beneficial when query workloads change over time. We apply this new architecture to search engines, with a focus on optimizing their latencies in cloud environments. However, simply placing existing search engines (e.g., Apache Lucene) on top of cloud storage significantly increases their end-to-end query latencies (more than 6 seconds on average in one of our studies). This is because their indexes are hierarchical (e.g., skip lists, B-trees, learned indexes), so a single lookup can incur multiple dependent network round-trips. To address this issue, we develop a new statistical index called IoU Sketch. For a lookup, IoU Sketch makes multiple asynchronous network requests in parallel. While IoU Sketch may fetch more bytes than existing indexes, it significantly reduces index lookup time because parallel requests do not block each other. Based on IoU Sketch, we build an end-to-end search engine called Airphant; we describe how Airphant builds, optimizes, and manages IoU Sketch, and ultimately how it supports keyword-based querying. In our experiments with four real datasets, Airphant's average end-to-end latencies are between 13 milliseconds and 300 milliseconds, up to 8.97x faster than Apache Lucene and 113.39x faster than Elasticsearch.
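
The latency argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Airphant's or IoU Sketch's actual implementation: `fetch`, `hierarchical_lookup`, `parallel_lookup`, and the 50 ms `LATENCY` value are all hypothetical, and a sleep stands in for a cloud-storage round trip. It shows only why dependent requests add up while independent requests overlap.

```python
import asyncio
import time

# Assumed round-trip time of one cloud-storage request (hypothetical value).
LATENCY = 0.05


async def fetch(key):
    """Stand-in for one cloud-storage GET; the sleep mimics network latency."""
    await asyncio.sleep(LATENCY)
    return f"block:{key}"


async def hierarchical_lookup(depth):
    """Each level must be read before the next address is known,
    so the requests run strictly one after another."""
    blocks = []
    for level in range(depth):
        blocks.append(await fetch(level))
    return blocks


async def parallel_lookup(count):
    """All addresses are known up front, so every request is issued at once."""
    return await asyncio.gather(*(fetch(i) for i in range(count)))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_blocks, seq_time = timed(hierarchical_lookup(4))
par_blocks, par_time = timed(parallel_lookup(4))
print(f"sequential: {seq_time:.3f}s, parallel: {par_time:.3f}s")
```

Both lookups fetch the same four blocks here, but the sequential version pays roughly four round trips while the parallel version pays roughly one. The trade-off described in the abstract, fetching more bytes in exchange for fewer dependent round trips, is not modeled in this sketch.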

