Low-Complexity Data-Parallel Earth Mover's Distance Approximations

12/05/2018
by   Kubilay Atasu, et al.

The Earth Mover's Distance (EMD) is a state-of-the-art metric for comparing probability distributions. The high distinguishability offered by the EMD comes at a high cost in computational complexity. Linear-complexity approximation algorithms have therefore been proposed to improve its scalability. However, these algorithms are either limited to vector spaces with only a few dimensions or require the probability distributions to populate the vector space sparsely. We propose novel approximation algorithms that overcome both of these limitations while still achieving linear time complexity. All our algorithms are data-parallel, so they can take advantage of massively parallel computing engines such as Graphics Processing Units (GPUs). Experiments on MNIST images show that the new algorithms can perform a billion distance computations in less than a minute using a single GPU. On the popular text-based 20 Newsgroups dataset, the new algorithms are four orders of magnitude faster than the state-of-the-art FastEMD library and match its search accuracy.
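To make the idea of a cheap EMD approximation concrete, here is a minimal Python sketch of one well-known relaxation: every source bin sends all of its mass to its cheapest non-empty destination bin, which yields a lower bound on the EMD. This is the style of relaxation used by the Relaxed Word Mover's Distance. It is not taken from this paper and should not be read as the algorithms the authors propose. The function names, the toy histograms, and the 1D unit-spaced grid are assumptions chosen for illustration.

```python
from typing import List, Sequence


def relaxed_emd_one_sided(p: Sequence[float], q: Sequence[float],
                          cost: Sequence[Sequence[float]]) -> float:
    """Lower bound on EMD(p, q): each bin i ships all of p[i] to the
    cheapest bin j that has q[j] > 0, ignoring the capacity of bin j."""
    total = 0.0
    for i, p_i in enumerate(p):
        if p_i > 0:
            total += p_i * min(cost[i][j] for j, q_j in enumerate(q) if q_j > 0)
    return total


def relaxed_emd(p: Sequence[float], q: Sequence[float],
                cost: Sequence[Sequence[float]]) -> float:
    """Tighter bound: the maximum of the two one-sided relaxations."""
    cost_t: List[List[float]] = [list(col) for col in zip(*cost)]
    return max(relaxed_emd_one_sided(p, q, cost),
               relaxed_emd_one_sided(q, p, cost_t))


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Exact EMD for equal-mass histograms on a unit-spaced 1D grid:
    the sum of absolute differences of the cumulative sums."""
    total, carried = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        total += abs(carried)
    return total


# Hypothetical toy data: four bins at positions 0, 1, 2, 3.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]
cost = [[abs(i - j) for j in range(4)] for i in range(4)]

print(relaxed_emd(p, q, cost))  # 1.5
print(emd_1d(p, q))             # 2.0
```

Each term of the sum depends on a single source bin only, so the per-bin minima can be computed independently; this is the kind of structure that maps well onto data-parallel hardware. On the toy data the relaxation gives 1.5 against an exact value of 2.0, which shows that the bound can be loose when the two distributions do not overlap.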


research
11/20/2017

Linear-Complexity Relaxed Word Mover's Distance with GPU Acceleration

The amount of unstructured text-based data is growing every day. Queryin...
research
07/14/2023

Fast Algorithms for a New Relaxation of Optimal Transport

We introduce a new class of objectives for optimal transport computation...
research
09/24/2019

Conditional Hardness of Earth Mover Distance

The Earth Mover Distance (EMD) between two sets of points A, B ⊆R^d with...
research
07/06/2023

Probability Metrics for Tropical Spaces of Different Dimensions

The problem of comparing probability distributions is at the heart of ma...
research
04/16/2021

Approximating the Earth Mover's Distance between sets of geometric objects

Given two distributions P and S of equal total mass, the Earth Mover's D...
research
01/02/2019

Massively Parallel Construction of Radix Tree Forests for the Efficient Sampling of Discrete Probability Distributions

We compare different methods for sampling from discrete probability dist...
research
02/04/2017

Cluster-based Kriging Approximation Algorithms for Complexity Reduction

Kriging or Gaussian Process Regression is applied in many fields as a no...
