Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation

11/03/2020
by Sandra Obermeier, et al.

The reverse k-nearest neighbor (RkNN) query is an established query type with applications ranging from identifying highly influential objects and incrementally updating kNN graphs to optimizing sensor communication and detecting outliers. State-of-the-art solutions exploit the observation that k-distances in real-world datasets often follow a power-law distribution, and bound them with straight lines in log-log space. In this work, we investigate this assumption and find that it is violated in regions of changing density, which we show are typical of real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. This lets us draw on the large body of available machine learning models and benefit from their continued advancement. We propose a flexible approach that allows steering the trade-off between performance and memory consumption and, in particular, finding good solutions under a fixed memory budget, which is crucial in the context of edge computing. Moreover, we show how to obtain and improve the guaranteed bounds that are essential to exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce the index memory consumption and strongly reduce the candidate set size. We publish our code at https://github.com/sobermeier/nonlinear-kdist.
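To make the linear log-log bounding mentioned above concrete, the following is a minimal sketch and not the authors' implementation (their code is in the linked repository). It assumes Euclidean distance, a brute-force k-distance computation, and no duplicate points (a zero k-distance would break the logarithm). The function names `k_distances` and `loglog_line_bounds` are hypothetical and chosen only for illustration. For one point, it fits a least-squares line to log k-distance against log k and shifts that line up and down by the extreme residuals, so that all observed k-distances lie between the two shifted lines up to floating-point rounding.

```python
import numpy as np


def k_distances(points, k_max):
    """Return an (n, k_max) array; column k-1 is each point's distance to its k-th nearest neighbor."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)
    # Column 0 is the distance of each point to itself (zero), so skip it.
    return dist[:, 1:k_max + 1]


def loglog_line_bounds(kdist_row):
    """Fit a line in log-log space and shift it to enclose every k-distance of one point.

    Returns (slope, lower_intercept, upper_intercept).
    """
    ks = np.arange(1, len(kdist_row) + 1)
    x, y = np.log(ks), np.log(kdist_row)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    # Shifting by the extreme residuals turns the fit into lower and upper bounds.
    return slope, intercept + residuals.min(), intercept + residuals.max()


# Illustrative data: 200 random points in the unit square (an assumption, not from the paper).
rng = np.random.default_rng(0)
points = rng.random((200, 2))
kd = k_distances(points, k_max=16)
slope, lower, upper = loglog_line_bounds(kd[0])
```

Only three numbers per point are stored here instead of all `k_max` distances, which is the memory saving that line-based bounds aim for. The gap `upper - lower` grows when the k-distances deviate from a power law, and that is the situation in which the abstract argues a more flexible regression model is worthwhile.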

