Multi-Resolution Hashing for Fast Pairwise Summations

07/19/2018
by Moses Charikar, et al.

A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X ⊂ R^d and a pairwise function w: R^d × R^d → [0,1], we study the problem of designing a data-structure that enables sublinear-time approximation of the summation Z_w(y) = (1/|X|) ∑_{x∈X} w(x,y) for any query y ∈ R^d. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data-structures through hashing. A key design principle is a collection of T ≥ 1 hashing schemes with collision probabilities p_1, ..., p_T such that sup_{t∈[T]} {p_t(x,y)} = Θ(√(w(x,y))). This leads to a data-structure that approximates Z_w(y) using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function w(x,y) = e^{ϕ(〈x,y〉)} of the inner product on the unit sphere x,y ∈ S^{d-1}. Our method leads to data-structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to R^d, and from scalar functions to vector functions.
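To make the quantity Z_w(y) concrete, the following is a minimal Python sketch of the exact summation and of the uniform random-sampling baseline that the abstract says the method improves upon. It is not the paper's hashing-based data-structure. The weight w(x,y) = exp(〈x,y〉 - 1) is an assumed example of a log-convex function of the inner product taking values in [0,1] on the unit sphere, and all function names, the dimension, and the sample sizes are hypothetical choices made for illustration.

```python
import math
import random


def w(x, y):
    """Assumed example weight: exp(<x, y> - 1), which lies in [e^-2, 1] on the unit sphere."""
    inner = sum(a * b for a, b in zip(x, y))
    return math.exp(inner - 1.0)


def exact_sum(X, y):
    """Z_w(y) = (1/|X|) * sum over x in X of w(x, y); costs time linear in |X|."""
    return sum(w(x, y) for x in X) / len(X)


def sampled_sum(X, y, m, rng):
    """Unbiased estimate of Z_w(y) from m points drawn uniformly with replacement."""
    return sum(w(rng.choice(X), y) for _ in range(m)) / m


def random_unit_vector(d, rng):
    """Draw a point uniformly on the unit sphere by normalising a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


rng = random.Random(0)
X = [random_unit_vector(8, rng) for _ in range(2000)]  # dataset on the sphere
y = random_unit_vector(8, rng)                          # query
print("exact:  ", exact_sum(X, y))
print("sampled:", sampled_sum(X, y, 200, rng))
```

Uniform sampling needs many samples when Z_w(y) is small and dominated by a few points with large w(x,y), which is the regime where the abstract claims a significant improvement from hashing.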


research
08/30/2018

Hashing-Based-Estimators for Kernel Density in High Dimensions

Given a set of points P⊂R^d and a kernel k, the Kernel Density Estimate ...
research
12/21/2022

Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations

In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resol...
research
01/26/2021

Sampling a Near Neighbor in High Dimensions – Who is the Fairest of Them All?

Similarity search is a fundamental algorithmic primitive, widely used in...
research
05/25/2020

On the Problem of p_1^-1 in Locality-Sensitive Hashing

A Locality-Sensitive Hash (LSH) function is called (r,cr,p_1,p_2)-sensit...
research
09/20/2018

Local Density Estimation in High Dimensions

An important question that arises in the study of high dimensional vecto...
research
10/22/2018

Norm-Range Partition: A Universal Catalyst for LSH based Maximum Inner Product Search (MIPS)

Recently, locality sensitive hashing (LSH) was shown to be effective for...
research
01/28/2020

Peeling Close to the Orientability Threshold: Spatial Coupling in Hashing-Based Data Structures

Hypergraphs with random hyperedges underlie various data structures wher...
