Scalable Nearest Neighbor Search for Optimal Transport

10/09/2019
by Yihe Dong, et al.

The Optimal Transport (a.k.a. Wasserstein) distance is an increasingly popular similarity measure for rich data domains, such as images or text documents. This creates a need for fast nearest neighbor search with respect to this distance, a problem that poses a substantial computational bottleneck for various tasks on massive datasets. In this work, we study fast tree-based approximation algorithms for searching nearest neighbors w.r.t. the Wasserstein-1 distance. A standard tree-based technique, known as Quadtree, has previously been shown to obtain good results. We introduce a variant of this algorithm, called Flowtree, and formally prove that it achieves asymptotically better accuracy. Our extensive experiments on real-world text and image datasets show that Flowtree improves over various baselines and existing methods in either running time or accuracy. In particular, its quality of approximation is in line with previous high-accuracy methods, while its running time is much faster.
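
The abstract contrasts Quadtree with Flowtree only at a high level. The following is a minimal Python sketch, under stated assumptions, of how both estimates can be read off the same randomly shifted quadtree. The Quadtree estimate sums, over all cells, an edge weight times the mass imbalance between the two distributions inside that cell. The Flowtree estimate, in our reading of the method, first builds an optimal flow for the tree metric (matching mass bottom-up inside each cell) and then prices that flow with the true Euclidean distances. The function names (build_cells, quadtree_w1, flowtree_w1, exact_w1_uniform), the edge weighting (cell side length), and the fixed number of levels are hypothetical illustrative choices, not the authors' code, and the sketch does not reproduce the paper's accuracy guarantees.

```python
import itertools
import math
import random
from collections import defaultdict


def build_cells(points, delta, num_levels, rng):
    """Cell id of every point at each level of a randomly shifted quadtree.

    Assumes every coordinate lies in [0, delta). Level 0 is the root (a single
    cell of side 2*delta); a level-l cell has side 2*delta / 2**l.
    """
    shift = [rng.uniform(0, delta) for _ in points[0]]
    cells = []
    for p in points:
        path = []
        for level in range(num_levels + 1):
            side = 2 * delta / 2 ** level
            path.append(tuple(int((x + s) // side) for x, s in zip(p, shift)))
        cells.append(path)
    return cells


def quadtree_w1(mu, nu, cells, delta, num_levels):
    """Quadtree estimate: sum over cells of (edge weight) * |mu(cell) - nu(cell)|.

    mu, nu: dicts {point index: mass}, each summing to 1. The edge into a
    level-l cell is weighted by that cell's side length (assumed weighting).
    """
    total = 0.0
    for level in range(1, num_levels + 1):
        imbalance = defaultdict(float)
        for i, m in mu.items():
            imbalance[cells[i][level]] += m
        for j, m in nu.items():
            imbalance[cells[j][level]] -= m
        side = 2 * delta / 2 ** level
        total += side * sum(abs(v) for v in imbalance.values())
    return total


def flowtree_w1(mu, nu, points, cells, num_levels):
    """Flowtree-style estimate (our reading): match mass bottom-up inside each
    cell, which is an optimal flow for the tree metric, then price every
    matched unit of mass with the true Euclidean distance."""
    pending_mu, pending_nu = list(mu.items()), list(nu.items())
    cost = 0.0
    for level in range(num_levels, -1, -1):  # leaves first, root last
        groups = defaultdict(lambda: ([], []))
        for i, m in pending_mu:
            groups[cells[i][level]][0].append([i, m])
        for j, m in pending_nu:
            groups[cells[j][level]][1].append([j, m])
        pending_mu, pending_nu = [], []
        for src, dst in groups.values():
            a = b = 0
            while a < len(src) and b < len(dst):
                amount = min(src[a][1], dst[b][1])
                cost += amount * math.dist(points[src[a][0]], points[dst[b][0]])
                src[a][1] -= amount
                dst[b][1] -= amount
                if src[a][1] <= 1e-12:
                    a += 1
                if dst[b][1] <= 1e-12:
                    b += 1
            # Unmatched mass moves up to the parent cell.
            pending_mu += [(i, m) for i, m in src[a:]]
            pending_nu += [(j, m) for j, m in dst[b:]]
    return cost


def exact_w1_uniform(src, dst, points):
    """Exact W1 between uniform measures on two equal-size index lists.

    Brute force over all matchings, so only usable for tiny inputs.
    """
    n = len(src)
    best = min(
        sum(math.dist(points[src[k]], points[dst[perm[k]]]) for k in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(10)]
    cells = build_cells(points, delta=1.0, num_levels=8, rng=rng)
    mu = {i: 0.2 for i in range(5)}
    nu = {j: 0.2 for j in range(5, 10)}
    print("quadtree:", quadtree_w1(mu, nu, cells, 1.0, 8))
    print("flowtree:", flowtree_w1(mu, nu, points, cells, 8))
    print("exact:   ", exact_w1_uniform(list(range(5)), list(range(5, 10)), points))
```

Because the value flowtree_w1 returns is the cost of a feasible transport plan measured with true distances, in this sketch it can never fall below the exact W1; exact_w1_uniform is included only to check that on tiny inputs, since brute force over matchings does not scale.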

research
12/01/2016

Monge's Optimal Transport Distance with Applications for Nearest Neighbour Image Classification

This paper focuses on a similarity measure, known as the Wasserstein dis...
research
07/17/2019

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

This paper reconsiders common benchmarking approaches to nearest neighbo...
research
09/29/2018

Multilevel Optimal Transport: a Fast Approximation of Wasserstein-1 distances

We propose a fast algorithm for the calculation of the Wasserstein-1 dis...
research
06/07/2019

Optimal Transport Relaxations with Application to Wasserstein GANs

We propose a family of relaxations of the optimal transport problem whic...
research
10/17/2018

Towards Optimal Running Times for Optimal Transport

In this work, we provide faster algorithms for approximating the optimal...
research
04/23/2019

Wasserstein-Fisher-Rao Document Distance

As a fundamental problem of natural language processing, it is important...
research
03/09/2018

TRAJEDI: Trajectory Dissimilarity

The vast increase in our ability to obtain and store trajectory data nec...