Local Density Estimation in High Dimensions

09/20/2018
by Xian Wu, et al.

An important question in the study of high-dimensional vector representations learned from data is: given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. We develop two estimators, LSH Count and Multi-Probe Count, that use locality-sensitive hashing to preprocess the data and then estimate the answers to such questions accurately and efficiently via importance sampling. A key innovation is the ability to keep the number of hash tables small, by means of preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes, and demonstrate their effectiveness in experiments on a standard word embedding dataset.
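To make the counting problem concrete, the following is a minimal sketch and not the paper's LSH Count or Multi-Probe Count. It assumes angular distance and a random-hyperplane (SimHash) family, neither of which the abstract specifies, and it looks only at the query's own bucket in each table rather than sampling from multiple buckets. Each in-range point found there is weighted by the inverse of its collision probability, which makes the estimate unbiased over the choice of hash functions. All function and parameter names are hypothetical.

```python
import math
import random


def cosine_angle(u, v):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def exact_count(data, query, max_angle):
    """Brute-force reference: number of points within max_angle of query."""
    return sum(1 for x in data if cosine_angle(x, query) <= max_angle)


def simhash(vec, planes):
    """Sign pattern of vec against each random hyperplane."""
    return tuple(sum(a * b for a, b in zip(vec, p)) >= 0 for p in planes)


def estimate_count(data, query, max_angle, num_bits, num_tables, seed=0):
    """Average over independent tables of the inverse-probability-weighted
    count of in-range points that share the query's bucket."""
    rng = random.Random(seed)
    dim = len(query)
    total = 0.0
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_bits)]
        # Preprocessing step: group the data into buckets by hash key.
        # (A real index would build these once, not per query.)
        buckets = {}
        for x in data:
            buckets.setdefault(simhash(x, planes), []).append(x)
        # Query step: inspect only the bucket the query falls into.
        for x in buckets.get(simhash(query, planes), []):
            theta = cosine_angle(x, query)
            if theta <= max_angle:
                # For SimHash, P[all bits agree] = (1 - theta/pi)^num_bits.
                total += 1.0 / (1.0 - theta / math.pi) ** num_bits
    return total / num_tables
```

With more bits per table the query's bucket shrinks, so fewer points are inspected, but the weights grow and the variance rises. That trade-off is the kind of tension that sampling from multiple buckets per table is meant to ease.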


research
03/10/2021

MP-RW-LSH: An Efficient Multi-Probe LSH Solution to ANNS in L_1 Distance

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic ...
research
08/30/2018

Hashing-Based-Estimators for Kernel Density in High Dimensions

Given a set of points P⊂R^d and a kernel k, the Kernel Density Estimate ...
research
07/16/2022

DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing

Among many solutions to the high-dimensional approximate nearest neighbo...
research
07/19/2018

Multi-Resolution Hashing for Fast Pairwise Summations

A basic computational primitive in the analysis of massive datasets is s...
research
09/12/2017

Hash Embeddings for Efficient Word Representations

We present hash embeddings, an efficient method for representing words i...
research
04/15/2020

Locality Sensitive Hashing for Set-Queries, Motivated by Group Recommendations

Locality Sensitive Hashing (LSH) is an effective method to index a set o...
