If it ain't broke, don't fix it: Sparse metric repair

10/29/2017
by Anna C. Gilbert, et al.

Many modern data-intensive computational problems either require or benefit from distance or similarity data that adhere to a metric: the algorithms run faster or have better performance guarantees. Unfortunately, in real applications the data are messy and the values are noisy, so the distances between the data points are far from satisfying a metric. There are a number of different algorithms for finding the closest set of distances to the given ones that also satisfy a metric (sometimes with the extra condition of being Euclidean). These algorithms can have unintended consequences: they can change a large number of the original data points and alter many other features of the data. The goal of sparse metric repair is to make as few changes as possible to the original data set or underlying distances so as to ensure that the resulting distances satisfy the properties of a metric. In other words, we seek to minimize the sparsity (or the ℓ_0 "norm") of the changes we make to the distances, subject to the new distances satisfying a metric. We give three different combinatorial algorithms to repair a metric sparsely. In one setting the algorithm is guaranteed to return the sparsest solution; in the other settings, the algorithms repair the metric. Without prior information, the algorithms run in time proportional to the cube of the number of input data points; with prior information, we can reduce the running time considerably.
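
To make the objective concrete, here is a minimal Python sketch of one standard way to repair a distance matrix by only decreasing entries: replace each distance with the shortest-path distance (Floyd-Warshall), then count how many pairs changed. The abstract does not describe the authors' three algorithms, so this is an illustration of the problem rather than their method. The function names (broken_triangles, decrease_only_repair, num_changes) are hypothetical, and the input is assumed to be a symmetric matrix with a zero diagonal and non-negative entries.

```python
from itertools import combinations


def broken_triangles(d, tol=1e-12):
    """List triples (i, j, k) with d[i][j] > d[i][k] + d[k][j].

    Assumes d is a symmetric, non-negative square matrix (list of lists)
    with a zero diagonal. Runs in O(n^3) time.
    """
    n = len(d)
    bad = []
    for i, j in combinations(range(n), 2):
        for k in range(n):
            if k != i and k != j and d[i][j] > d[i][k] + d[k][j] + tol:
                bad.append((i, j, k))
    return bad


def decrease_only_repair(d):
    """Return a copy of d in which every entry is replaced by the
    shortest-path distance (Floyd-Warshall, O(n^3)).

    Entries are only ever decreased, and the result satisfies the
    triangle inequality. This is an illustrative baseline, not
    necessarily one of the algorithms from the paper.
    """
    n = len(d)
    r = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if r[i][k] + r[k][j] < r[i][j]:
                    r[i][j] = r[i][k] + r[k][j]
    return r


def num_changes(d, r):
    """Count unordered pairs whose distance differs: the l_0 'norm'
    of the perturbation on the upper triangle."""
    return sum(1 for i, j in combinations(range(len(d)), 2) if d[i][j] != r[i][j])


# Toy input: the distance between points 0 and 2 is too long.
d = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]
repaired = decrease_only_repair(d)
print(broken_triangles(d))        # the single violated triple
print(repaired)                   # repaired matrix
print(num_changes(d, repaired))   # number of modified pairs
```

On this toy input only the pair (0, 2) is modified; the other two distances are left untouched, which is the "if it ain't broke, don't fix it" behaviour that the sparsity objective rewards.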


