# Similar Elements and Metric Labeling on Complete Graphs

We consider a problem that involves finding similar elements in a collection of sets. The problem is motivated by applications in machine learning and pattern recognition. We formulate the similar elements problem as an optimization problem and give an efficient approximation algorithm that finds a solution within a factor of 2 of the optimal. The similar elements problem is a special case of the metric labeling problem, and we also give an efficient 2-approximation algorithm for the metric labeling problem on complete graphs.

07/02/2020


## 1 Similar Elements

Let $\Omega$ be a (possibly infinite) set and $d$ be a metric on $\Omega$. Let $S_1,\ldots,S_n$ be finite subsets of $\Omega$. The goal of the similar elements problem is to select an element $x_i$ from each set $S_i$ such that the selected elements are close to each other under the metric $d$. One motivation is to discover something the sets have in common even when their intersection is empty.

We formalize the problem as the minimization of the sum of pairwise distances among the selected elements. Let $x=(x_1,\ldots,x_n)$ with $x_i\in S_i$. Define the similar elements objective as,

$$c(x)=\sum_{1\le i,j\le n} d(x_i,x_j). \tag{1}$$

Let $x^*$ be an optimal solution for the similar elements problem.

Optimizing $c$ appears to be difficult, but we can define easier problems by ignoring some of the pairwise distances in the objective. In particular we define $n$ different "star-graph" objective functions as follows. For each $1\le r\le n$ define the objective $c_r$ to account only for the terms in $c$ involving $x_r$,

$$c_r(x)=\sum_{j\ne r} d(x_r,x_j). \tag{2}$$

Let $x^r$ be an optimal solution for the optimization problem defined by $c_r$. We can compute $x^r$ efficiently using a simple form of dynamic programming, by first computing $x^r_r$ and then computing $x^r_j$ for $j\ne r$,

$$x^r_r=\operatorname*{argmin}_{x_r\in S_r}\sum_{j\ne r}\min_{x_j\in S_j} d(x_r,x_j), \tag{3}$$
$$x^r_j=\operatorname*{argmin}_{x_j\in S_j} d(x^r_r,x_j). \tag{4}$$

Each of the $n$ "star-graph" objective functions leads to a possible solution. We then select from among the $n$ solutions as follows,

$$\hat{r}=\operatorname*{argmin}_{1\le r\le n} c_r(x^r), \tag{5}$$
$$\hat{x}=x^{\hat{r}}. \tag{6}$$
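As a concrete illustration, the two-stage procedure in (3)–(6) can be sketched as follows. This is a minimal sketch, assuming the sets are given as Python lists and the metric is a callable `d`; the function name `similar_elements` is ours, not the paper's.

```python
def similar_elements(sets, d):
    """2-approximation for the similar elements problem.

    sets: a list of n finite lists of elements; d: a metric on the elements.
    Solves one star-graph problem per root r, then keeps the best solution.
    """
    n = len(sets)
    best_cost, best_sol = float("inf"), None
    for r in range(n):
        # Eq. (3): cost of rooting the star at a candidate element x_r.
        def star_cost(xr):
            return sum(min(d(xr, xj) for xj in sets[j])
                       for j in range(n) if j != r)
        xr = min(sets[r], key=star_cost)
        # Eq. (4): pick the element of each other set closest to the root.
        sol = [xr if i == r else min(sets[i], key=lambda y: d(xr, y))
               for i in range(n)]
        # Eqs. (5)-(6): keep the star solution of minimum cost c_r(x^r).
        if star_cost(xr) < best_cost:
            best_cost, best_sol = star_cost(xr), sol
    return best_sol
```

On sets of real numbers with $d(a,b)=|a-b|$, for example, the returned selection has total pairwise distance at most twice the optimum, per Theorem 1 below.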
###### Theorem 1.

The algorithm described above finds a 2-approximate solution for the similar elements problem. That is,

$$c(\hat{x})\le 2\,c(x^*).$$
###### Proof.

First note that,

$$c(x)=\sum_{r=1}^{n} c_r(x).$$

Since the minimum of a set of values is at most their average, and $x^r$ minimizes $c_r$,

$$\min_{1\le r\le n} c_r(x^r)\le\frac{1}{n}\sum_{r=1}^{n} c_r(x^r)\le\frac{1}{n}\sum_{r=1}^{n} c_r(x^*)=\frac{1}{n}\,c(x^*).$$

By the triangle inequality, for any $1\le r\le n$ we have

$$c(x)=\sum_{1\le i,j\le n} d(x_i,x_j)\le\sum_{1\le i,j\le n}\bigl(d(x_i,x_r)+d(x_r,x_j)\bigr)=2n\sum_{l=1}^{n} d(x_r,x_l)=2n\,c_r(x).$$

Therefore

$$c(\hat{x})\le 2n\,c_{\hat{r}}(\hat{x})=2n\min_{1\le r\le n} c_r(x^r)\le 2\,c(x^*). \qquad\blacksquare$$

To analyze the running time of the algorithm we assume the distances between pairs of elements in $\Omega$ are either pre-computed and given as part of the input, or can each be computed in $O(1)$ time.

Let $k=\max_i |S_i|$. The first stage of the algorithm involves $n$ optimization problems that can be solved in $O(nk^2)$ time each. The second stage of the algorithm involves selecting one of the $n$ solutions, and takes $O(n)$ time.

###### Remark 2.

If each of the sets $S_i$ has size at most $k$, the running time of the approximation algorithm for the similar elements problem is $O(n^2k^2)$.

The bottleneck of the algorithm is the evaluation of the minimizations over $S_j$ in (3) and (4). This computation is equivalent to a nearest-neighbor computation, where we want to find a point from a set $S$ that is closest to a query point $q$. When the nearest-neighbor computation can be done efficiently (with an appropriate data structure) the running time of the similar elements approximation algorithm can be reduced.
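For instance, when the elements are real numbers and $d(a,b)=|a-b|$, each inner minimization becomes a one-dimensional nearest-neighbor query that binary search on a sorted copy of $S_j$ answers in $O(\log k)$ time. A minimal sketch (the helper name `nearest` is ours):

```python
import bisect

def nearest(sorted_pts, q):
    """Return the element of a non-empty sorted list closest to q.

    Each query costs O(log k). Replacing the linear scans over S_j in
    (3) and (4) with such queries lowers the cost of each star problem
    from O(nk^2) to O(nk log k) after sorting each set once.
    """
    i = bisect.bisect_left(sorted_pts, q)
    # The nearest neighbor is one of the (at most two) points bracketing q.
    candidates = sorted_pts[max(0, i - 1):i + 1]
    return min(candidates, key=lambda p: abs(p - q))
```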

## 2 Metric Labeling on Complete Graphs

Let $G=(V,E)$ be an undirected simple graph on nodes $V=\{1,\ldots,n\}$. Let $L$ be a finite set of labels with $|L|=k$ and let $d$ be a metric on $L$. For $i\in V$ let $m_i$ be a non-negative function mapping labels to real values. The unweighted metric labeling problem on $G$ is to find a labeling $x=(x_1,\ldots,x_n)\in L^n$ minimizing

$$c(x)=\sum_{i\in V} m_i(x_i)+\sum_{\{i,j\}\in E} d(x_i,x_j). \tag{7}$$

Let $x^*$ be an optimal solution for the metric labeling problem. This optimization problem can be solved in polynomial time using dynamic programming when $G$ is a tree. Here we consider the case when $G$ is the complete graph and give an efficient 2-approximation algorithm based on the solution of several metric labeling problems on star graphs.

For each $r\in V$ define a different objective function, $c_r$, corresponding to a metric labeling problem on a star graph with vertex set $V$ rooted at $r$,

$$c_r(x)=\sum_{i\in V}\frac{m_i(x_i)}{n}+\sum_{j\in V\setminus\{r\}}\frac{d(x_r,x_j)}{2}. \tag{8}$$

Let $x^r$ be an optimal solution for the optimization problem defined by $c_r$. We can solve this optimization problem in $O(nk^2)$ time using a simple form of dynamic programming. First compute an optimal label $x^r_r$ for the root vertex using one step of dynamic programming,

$$x^r_r=\operatorname*{argmin}_{x_r\in L}\left(\frac{m_r(x_r)}{n}+\sum_{j\in V\setminus\{r\}}\min_{x_j\in L}\left(\frac{m_j(x_j)}{n}+\frac{d(x_r,x_j)}{2}\right)\right). \tag{9}$$

Then compute $x^r_j$ for $j\ne r$,

$$x^r_j=\operatorname*{argmin}_{x_j\in L}\left(\frac{m_j(x_j)}{n}+\frac{d(x^r_r,x_j)}{2}\right). \tag{10}$$

Optimizing each $c_r$ separately leads to $n$ possible solutions $x^1,\ldots,x^n$, and we select one of them as follows,

$$\hat{r}=\operatorname*{argmin}_{r\in V} c_r(x^r), \tag{11}$$
$$\hat{x}=x^{\hat{r}}. \tag{12}$$
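A direct implementation of (9)–(12) is sketched below. It assumes the node costs are given as nested lists with `m[i][x]` the cost $m_i(x)$ of assigning label `x` to node `i`, the labels are the integers $0,\ldots,k-1$, and `d` is a callable metric on labels; these representation choices and the function name are ours, not the paper's.

```python
def metric_labeling_complete(m, d, labels):
    """2-approximation for unweighted metric labeling on the complete graph.

    m[i][x]: non-negative cost of giving node i the label x;
    d: a metric on labels; labels: the label set L as a list.
    """
    n = len(m)
    best_cost, best_sol = float("inf"), None
    for r in range(n):
        # Eq. (9): star objective c_r as a function of the root label x_r.
        def star_cost(xr):
            return m[r][xr] / n + sum(
                min(m[j][xj] / n + d(xr, xj) / 2 for xj in labels)
                for j in range(n) if j != r)
        xr = min(labels, key=star_cost)
        # Eq. (10): best label for each leaf given the root label.
        sol = [xr if i == r else
               min(labels, key=lambda y: m[i][y] / n + d(xr, y) / 2)
               for i in range(n)]
        # Eqs. (11)-(12): keep the minimum-cost star solution.
        if star_cost(xr) < best_cost:
            best_cost, best_sol = star_cost(xr), sol
    return best_sol
```

By Theorem 3 below, the returned labeling has objective value at most twice the optimum.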
###### Theorem 3.

The algorithm described above finds a 2-approximate solution for the metric labeling problem on the complete graph. That is,

$$c(\hat{x})\le 2\,c(x^*).$$
###### Proof.

First note that,

$$c(x)=\sum_{r=1}^{n} c_r(x).$$

Since the minimum of a set of values is at most their average, and $x^r$ minimizes $c_r$,

$$\min_{1\le r\le n} c_r(x^r)\le\frac{1}{n}\sum_{r=1}^{n} c_r(x^r)\le\frac{1}{n}\sum_{r=1}^{n} c_r(x^*)=\frac{1}{n}\,c(x^*).$$

Since $d$ is a metric and each $m_i$ is non-negative, for any $r\in V$,

$$\begin{aligned}
c(x)&=\sum_{i\in V} m_i(x_i)+\sum_{\{i,j\}\in E} d(x_i,x_j)\\
&=\sum_{i\in V} m_i(x_i)+\sum_{(i,j)\in V^2}\frac{d(x_i,x_j)}{2}\\
&\le\sum_{i\in V} m_i(x_i)+\sum_{(i,j)\in V^2}\left(\frac{d(x_i,x_r)}{2}+\frac{d(x_r,x_j)}{2}\right)\\
&=\sum_{i\in V} m_i(x_i)+2n\sum_{l\in V\setminus\{r\}}\frac{d(x_r,x_l)}{2}\\
&\le 2n\sum_{i\in V}\frac{m_i(x_i)}{n}+2n\sum_{l\in V\setminus\{r\}}\frac{d(x_r,x_l)}{2}\\
&=2n\,c_r(x).
\end{aligned}$$

Therefore

$$c(\hat{x})\le 2n\,c_{\hat{r}}(\hat{x})=2n\min_{1\le r\le n} c_r(x^r)\le 2\,c(x^*). \qquad\blacksquare$$

The first stage of the algorithm involves $n$ optimization problems that can be solved in $O(nk^2)$ time each. The second stage involves selecting one of the $n$ solutions, and takes $O(n)$ time.

###### Remark 4.

The running time of the approximation algorithm for the metric labeling problem on complete graphs is $O(n^2k^2)$.

### Acknowledgments

We thank Caroline Klivans, Sarah Sachs, Anna Grim, Robert Kleinberg and Yang Yuan for helpful discussions about the contents of this report. This material is based upon work supported by the National Science Foundation under Grant No. 1447413.
