Fast Distance Oracles for Any Symmetric Norm
In the Distance Oracle problem, the goal is to preprocess n vectors x_1, x_2, ⋯, x_n in a d-dimensional metric space (𝕏^d, ‖·‖_l) into a cheap data structure, so that given a query vector q ∈ 𝕏^d and a subset S ⊆ [n] of the input data points, all distances ‖q − x_i‖_l for x_i ∈ S can be quickly approximated (faster than the trivial ∼ d|S| query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of ℓ_p norms, the problem is well understood, and optimal data structures are known for most values of p. Our main contribution is a fast (1+ε)-approximate distance oracle for any symmetric norm ‖·‖_l. This class includes ℓ_p norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-k norms, max-mixtures and sum-mixtures of ℓ_p norms, small-support norms and the box-norm. We propose a novel data structure with Õ(n(d + mmc(l)^2)) preprocessing time and space, and t_q = Õ(d + |S| · mmc(l)^2) query time, for computing distances to a subset S of data points, where mmc(l) is a complexity measure (concentration modulus) of the symmetric norm. When l = ℓ_p, this runtime matches the aforementioned state-of-the-art oracles.
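To make the problem statement concrete, the following minimal sketch shows the trivial baseline that the oracle is meant to beat: evaluating a symmetric norm exactly on q − x_i for every i in S, at roughly d|S| cost. It is not the data structure proposed in the paper. The top-k norm is used as one example of a symmetric norm, and the helper names `top_k_norm` and `exact_distances` are hypothetical, chosen only for illustration.

```python
import numpy as np

def top_k_norm(v, k):
    """Top-k norm: sum of the k largest absolute entries of v (assumes 1 <= k).

    It depends only on the multiset of absolute values, so it is invariant
    under coordinate permutations and sign flips, i.e. it is a symmetric norm.
    """
    a = np.abs(np.asarray(v, dtype=float))
    k = min(k, a.size)
    # After partitioning at index a.size - k, the last k slots hold the k largest entries.
    return float(np.partition(a, a.size - k)[a.size - k:].sum())

def exact_distances(X, q, S, norm):
    """Trivial baseline: evaluate norm(q - x_i) for each i in S (about d * |S| work)."""
    return {i: norm(q - X[i]) for i in S}

# Toy data: n = 3 points in d = 3 dimensions (0-based indices, unlike [n] in the text).
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0],
              [4.0, -1.0, 5.0]])
q = np.array([1.0, 0.0, 0.0])
S = [0, 2]

dists = exact_distances(X, q, S, lambda v: top_k_norm(v, 2))
```

Setting k = 1 gives the ℓ_∞ norm and k = d gives the ℓ_1 norm, so the same baseline covers two familiar ℓ_p cases as well.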