qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces

07/26/2019
by Omid Jafari, et al.

Similarity search queries in high-dimensional spaces are an important type of query in many domains, such as image processing and machine learning. Since exact similarity search indexing techniques suffer from the well-known curse of dimensionality in high-dimensional spaces, approximate search techniques are often used instead. Locality Sensitive Hashing (LSH) has been shown to be an effective approximate search method for answering similarity search queries in high-dimensional spaces. In real-world settings, queries often arrive as part of a query workload. LSH and its variants, however, are designed primarily to answer single queries effectively. They suffer from one major drawback when executing query workloads: their index structures are designed without taking into account data characteristics that are important for effective cache utilization. In this paper, we present qwLSH, an index structure for efficiently processing similarity search query workloads in high-dimensional spaces. qwLSH uses novel cost models to divide a given cache intelligently while a query workload is being processed. Experimental results show that, given a query workload, qwLSH runs faster than existing techniques because of its cost models and strategies.
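For readers unfamiliar with LSH, the following is a minimal sketch of a generic random-projection hash table for Euclidean distance. It is not the qwLSH index and does not model its cache partitioning or cost models; the class name `LSHTable`, the helper `make_hash`, and the parameters `k` (hash functions per key) and `w` (bucket width) are illustrative assumptions rather than names taken from the paper.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, w, rng):
    """Build one hash h(v) = floor((a . v + b) / w) with a Gaussian vector a."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        projection = sum(x * y for x, y in zip(a, v))
        return math.floor((projection + b) / w)

    return h


class LSHTable:
    """A single hash table keyed by k concatenated hash values (hypothetical helper)."""

    def __init__(self, dim, k=4, w=4.0, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, w, rng) for _ in range(k)]
        self.buckets = defaultdict(list)  # bucket key -> list of point ids
        self.points = []                  # point id -> vector

    def key(self, v):
        return tuple(h(v) for h in self.hashes)

    def insert(self, v):
        """Store v and return its point id."""
        self.points.append(list(v))
        point_id = len(self.points) - 1
        self.buckets[self.key(v)].append(point_id)
        return point_id

    def query(self, q, top=1):
        """Return up to `top` ids from q's bucket, nearest first (approximate)."""
        candidates = self.buckets.get(self.key(q), [])
        return sorted(candidates, key=lambda i: math.dist(q, self.points[i]))[:top]
```

Nearby points tend to share a bucket, so a query examines only the candidates in its own bucket instead of scanning the whole dataset. Practical LSH schemes use several such tables to raise recall, and a query that lands in an empty bucket returns no candidates at all.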


