Kernel Density Estimation through Density Constrained Near Neighbor Search

11/13/2020
by Moses Charikar, et al.

In this paper we revisit the kernel density estimation problem: given a kernel $K(x, y)$ and a dataset $P$ of $n$ points in high-dimensional Euclidean space, prepare a data structure that, given a query $q$, can quickly output a $(1+\epsilon)$-approximation to $\mu := \frac{1}{|P|}\sum_{p \in P} K(p, q)$. First, we give a single data structure based on classical near neighbor search techniques that improves upon or essentially matches the query time and space complexity for all radial kernels considered in the literature so far. We then show how to improve both the query complexity and runtime by using recent advances in data-dependent near neighbor search. We achieve our results by giving a new implementation of the natural importance sampling scheme. Unlike previous approaches, our algorithm first samples the dataset uniformly (considering a geometric sequence of sampling rates), and then uses existing approximate near neighbor search techniques on the resulting smaller dataset to retrieve the sampled points that lie at an appropriate distance from the query. We show that the resulting sampled dataset has strong geometric structure, so that approximate near neighbor search returns the required samples much more efficiently than for worst-case datasets of the same size. As an example application, we show that this approach yields a data structure that achieves query time $\mu^{-(1+o(1))/4}$ and space complexity $\mu^{-(1+o(1))}$ for the Gaussian kernel. Our data-dependent approach achieves query time $\mu^{-0.173-o(1)}$ and space $\mu^{-(1+o(1))}$ for the Gaussian kernel. The data-dependent analysis relies on new techniques for tracking the geometric structure of the input datasets in a recursive hashing process that we hope will be of interest in other applications in near neighbor search.
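The importance sampling idea described above can be illustrated with a small, unoptimized sketch. It is not the paper's construction: it assumes the Gaussian kernel $K(p, q) = e^{-\lVert p - q \rVert^2}$ (bandwidth 1), a known lower bound `mu0` on $\mu$, and a sampling rate per level that we chose for illustration. Points are grouped into levels by kernel value, level $j$ holding $K \in (2^{-(j+1)}, 2^{-j}]$; each level gets its own uniform subsample of $P$ built at preprocessing time, and the hypothetical helper `near_neighbor_band` (a brute-force scan) stands in for the approximate near neighbor structures that retrieve the sampled points in the matching distance band.

```python
import math
import random


def gaussian_kernel(p, q):
    # Assumed kernel for this sketch: K(p, q) = exp(-||p - q||^2), bandwidth 1.
    return math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)))


def exact_kde(P, q):
    # mu = (1/|P|) * sum_{p in P} K(p, q), by brute force.
    return sum(gaussian_kernel(p, q) for p in P) / len(P)


def build_samples(P, mu0, eps, rng):
    """Preprocessing: one uniform subsample of P per kernel-value level.

    Level j covers K in (2^-(j+1), 2^-j]. mu0 (an assumed lower bound on mu)
    and the rate formula are illustrative choices, not the paper's parameters.
    """
    n = len(P)
    num_levels = math.ceil(math.log2(1.0 / (eps * mu0))) + 1
    levels = []
    for j in range(num_levels):
        # Geometric sequence of sampling rates, capped at 1.
        rate = min(1.0, 2.0 ** -j / (eps ** 2 * n * mu0))
        sample = [p for p in P if rng.random() < rate]
        levels.append((rate, sample))
    return n, levels


def near_neighbor_band(sample, q, lo_sq, hi_sq):
    # Hypothetical stand-in for a near neighbor structure: a linear scan that
    # returns sampled points with squared distance to q in [lo_sq, hi_sq).
    out = []
    for p in sample:
        d_sq = sum((a - b) ** 2 for a, b in zip(p, q))
        if lo_sq <= d_sq < hi_sq:
            out.append(p)
    return out


def estimate_kde(structure, q):
    n, levels = structure
    total = 0.0
    for j, (rate, sample) in enumerate(levels):
        # For this kernel, K in (2^-(j+1), 2^-j]  <=>  ||p-q||^2 in [j ln 2, (j+1) ln 2).
        lo_sq, hi_sq = j * math.log(2), (j + 1) * math.log(2)
        for p in near_neighbor_band(sample, q, lo_sq, hi_sq):
            # Inverse-probability weighting keeps each level's contribution unbiased.
            total += gaussian_kernel(p, q) / rate
    # Points beyond the last level (K <= 2^-num_levels) are dropped, which
    # biases the estimate downward by at most 2^-num_levels.
    return total / n


if __name__ == "__main__":
    rng = random.Random(1)
    P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    q = (0.0, 0.0)
    structure = build_samples(P, mu0=0.1, eps=0.2, rng=rng)
    print(exact_kde(P, q), estimate_kde(structure, q))
```

Under these assumptions, a level-$j$ point has $K > 2^{-(j+1)}$, so level $j$ holds fewer than $n\mu\,2^{j+1}$ points and the expected number of sampled points retrieved from it is at most $2\mu/(\epsilon^2 \mu_0)$. The linear scan still touches every sampled point, however; replacing it with a near neighbor structure is where the efficiency described in the abstract would come from.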

Related research

06/06/2019 · Near Neighbor: Who is the Fairest of Them All?
In this work we study a fair variant of the near neighbor problem. Namel...

08/21/2020 · (2+ε)-ANN for time series under the Fréchet distance
We study approximate-near-neighbor data structures for time series under...

07/19/2018 · Optimal Las Vegas Approximate Near Neighbors in ℓ_p
We show that approximate near neighbor search in high dimensions can be ...

06/05/2019 · Fair Near Neighbor Search: Independent Range Sampling in High Dimensions
Similarity search is a fundamental algorithmic primitive, widely used in...

06/22/2021 · Practical Near Neighbor Search via Group Testing
We present a new algorithm for the approximate near neighbor problem tha...

07/21/2023 · Subset Sampling and Its Extensions
This paper studies the subset sampling problem. The input is a set 𝒮 of ...

02/20/2023 · Fully Dynamic k-Center in Low Dimensions via Approximate Furthest Neighbors
Let P be a set of points in some metric space. The approximate furthest ...