Practical Data-Dependent Metric Compression with Provable Guarantees

11/05/2017
by Piotr Indyk, et al.

We introduce a new distance-preserving compact representation of multi-dimensional point sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(d log(dB/ϵ) + log n) bits per point from which one can approximate the distances up to a factor of 1 ± ϵ. Our algorithm almost matches the recent bound of [indyk2017near] while being much simpler. We compare our algorithm to Product Quantization (PQ) [jegou2011product], a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in [jegou2011product]), MNIST [lecun1998mnist], New York City taxi time series [guha2016robust], and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned parameters, our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
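To give a rough feel for the size bound, the following minimal sketch compares the raw cost of dB bits per point with the asymptotic expression d log(dB/ϵ) + log n. It is only an illustration: the function names are hypothetical, the bound is assumed to read O(d log(dB/ϵ) + log n), and the hidden constant `c` is set to 1, which the abstract does not state.

```python
import math


def raw_bits_per_point(d, B):
    """Uncompressed cost: d coordinates at B bits each."""
    return d * B


def compressed_bits_per_point(d, B, eps, n, c=1.0):
    """Evaluate c * (d*log2(d*B/eps) + log2(n)).

    c stands in for the constant hidden by the O(.) notation;
    c = 1.0 is an assumption made purely for illustration.
    """
    return c * (d * math.log2(d * B / eps) + math.log2(n))


if __name__ == "__main__":
    # Hypothetical setting: 128-dimensional points, 32-bit coordinates,
    # one million points, 10% distance distortion.
    d, B, eps, n = 128, 32, 0.1, 10**6
    print("raw bits per point:  ", raw_bits_per_point(d, B))
    print("bound (c = 1), bits: ", round(compressed_bits_per_point(d, B, eps, n)))
```

Because the dependence on B is logarithmic, the gap between the two quantities widens as the coordinate precision B grows; for small B the expression with c = 1 can exceed the raw size, so the comparison is only meaningful as an asymptotic guide.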


