A Data Dependent Algorithm for Querying Earth Mover's Distance with Low Doubling Dimension

02/27/2020
by Hu Ding, et al.

In this paper, we consider the following query problem: given two weighted point sets A and B in the Euclidean space R^d, we want to quickly determine whether their earth mover's distance (EMD) is larger or smaller than a pre-specified threshold T ≥ 0. The problem has a number of important applications in machine learning and data mining. In particular, we assume that the dimensionality d is not fixed and that the sizes |A| and |B| are large; as a result, most existing EMD algorithms cannot solve this problem efficiently. Here, we consider the problem under the assumption that A and B have low doubling dimension, which is common for high-dimensional real-world data. Inspired by the geometric data structure known as the net tree, we propose a novel "data-dependent" algorithm to solve this problem efficiently. We also study the performance of our method on real datasets, and the experimental results suggest that our method can save a large amount of running time compared with existing EMD algorithms.
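To make the query problem concrete, here is a minimal baseline sketch. It is not the authors' net-tree-based algorithm. It assumes the simplified case where A and B have the same size n and uniform weights 1/n, and it uses hypothetical helper names (`centroid`, `emd_uniform`, `emd_exceeds`). In this uniform case an optimal transport plan can be taken to be a permutation, so brute force over matchings computes the EMD exactly. Before doing that, the sketch applies a cheap filter. With normalized weights and the Euclidean ground distance, the EMD is at least the distance between the two weighted centroids, so a large centroid gap already answers "EMD > T".

```python
import itertools
import math


def centroid(points):
    """Uniform-weight mean of a list of equal-length coordinate tuples."""
    n, d = len(points), len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(d))


def emd_uniform(A, B):
    """Exact EMD between two equal-size point sets with uniform weights 1/n.

    Assumption: |A| == |B| and all weights are equal. Then an optimal plan is a
    permutation, so brute force over all matchings is exact (tiny n only).
    """
    if len(A) != len(B):
        raise ValueError("this sketch assumes |A| == |B| with uniform weights")
    n = len(A)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(A[i], B[perm[i]]) for i in range(n))
        best = min(best, cost)
    return best / n


def emd_exceeds(A, B, T):
    """Answer the threshold query: is EMD(A, B) > T?"""
    # Cheap lower bound: EMD >= ||centroid(A) - centroid(B)|| for normalized
    # weights and Euclidean ground distance (triangle inequality).
    if math.dist(centroid(A), centroid(B)) > T:
        return True
    # Otherwise fall back to the exact (exponential-time) computation.
    return emd_uniform(A, B) > T


A = [(0, 0), (1, 0)]
B = [(0, 1), (1, 1)]
print(emd_uniform(A, B))       # 1.0
print(emd_exceeds(A, B, 0.5))  # True (decided by the centroid filter)
print(emd_exceeds(A, B, 1.5))  # False (decided by the exact computation)
```

The brute-force step is exponential in n. The centroid filter only helps when the answer is "larger". When |A| and |B| are large, both limitations motivate faster query methods such as the data-dependent approach described above.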

02/27/2020

On Metric DBSCAN with Low Doubling Dimension

The density based clustering method Density-Based Spatial Clustering of ...
11/19/2018

On Geometric Alignment in Low Doubling Dimension

In real-world, many problems can be formulated as the alignment between ...
04/25/2018

On Geometric Prototype And Applications

In this paper, we propose to study a new geometric optimization problem ...
09/07/2022

A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment

Many real-world problems can be formulated as the alignment between two ...
03/12/2018

GPU Accelerated Self-join for the Distance Similarity Metric

The self-join finds all objects in a dataset within a threshold of each ...
02/01/2016

Fast inference of ill-posed problems within a convex space

In multiple scientific and technological applications we face the proble...