1 Similar Elements
Let $X$ be a (possibly infinite) set and $d$ be a metric on $X$. Let $S_1, \ldots, S_n$ be finite subsets of $X$. The goal of the similar elements problem is to select an element from each set $S_i$ such that the selected elements are close to each other under the metric $d$. One motivation is to discover something in common among the sets even when they have empty intersection.
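We formalize the problem as the minimization of the sum of pairwise distances among selected elements. Let $x = (x_1, \ldots, x_n)$ with $x_i \in S_i$. Define the similar elements objective as,
$$c(x) = \sum_{1 \le i < j \le n} d(x_i, x_j). \tag{1}$$
Let $x^* = \operatorname*{argmin}_x c(x)$ be an optimal solution for the similar elements problem.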
Optimizing $c(x)$ appears to be difficult, but we can define easier problems if we ignore some of the pairwise distances in the objective. In particular we define $n$ different “star-graph” objective functions as follows. For each $1 \le r \le n$ define the objective $c^r(x)$ to account only for the terms in $c(x)$ involving $x_r$,
$$c^r(x) = \sum_{j \ne r} d(x_r, x_j). \tag{2}$$
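Let $x^r = \operatorname*{argmin}_x c^r(x)$ be an optimal solution for the optimization problem defined by $c^r$. We can compute $x^r$ efficiently using a simple form of dynamic programming, by first computing $x^r_r$ and then computing $x^r_j$ for $j \ne r$.
$$x^r_r = \operatorname*{argmin}_{x_r \in S_r} \sum_{j \ne r} \min_{x_j \in S_j} d(x_r, x_j) \tag{3}$$
$$x^r_j = \operatorname*{argmin}_{x_j \in S_j} d(x^r_r, x_j) \tag{4}$$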
Each of the $n$ “star-graph” objective functions leads to a possible solution. We then select from among the $n$ solutions as follows,
$$\hat{r} = \operatorname*{argmin}_{1 \le r \le n} c^r(x^r), \tag{5}$$
$$\hat{x} = x^{\hat{r}}. \tag{6}$$
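The two stages in (3)-(6) can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `solve_star`, `similar_elements` and `objective` are hypothetical, the sets are assumed to be non-empty lists, and the metric is assumed to be supplied as a callable `d(a, b)`.

```python
def solve_star(sets, d, r):
    """Minimize the star objective rooted at set r.

    Returns (star cost, selection). The root element is chosen as in (3)
    and every other element as the nearest neighbor of the root, as in (4).
    """
    best_cost, best_root = float("inf"), None
    for xr in sets[r]:
        # Cost of root candidate xr: for every other set, the distance
        # from xr to its nearest element in that set.
        cost = sum(min(d(xr, xj) for xj in sets[j])
                   for j in range(len(sets)) if j != r)
        if cost < best_cost:
            best_cost, best_root = cost, xr
    selection = [min(s, key=lambda xj: d(best_root, xj)) for s in sets]
    selection[r] = best_root
    return best_cost, selection


def similar_elements(sets, d):
    """Return the star solution with the smallest star cost, as in (5)-(6)."""
    stars = (solve_star(sets, d, r) for r in range(len(sets)))
    return min(stars, key=lambda t: t[0])[1]


def objective(x, d):
    """Sum of pairwise distances among the selected elements, as in (1)."""
    n = len(x)
    return sum(d(x[i], x[j]) for i in range(n) for j in range(i + 1, n))
```

For example, with real numbers and `d = lambda a, b: abs(a - b)`, calling `similar_elements([[0.0, 10.0], [1.0, 12.0], [2.0, 11.0]], d)` selects one number from each list, and `objective` evaluates the resulting sum of pairwise distances.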
Theorem 1.
The algorithm described above finds a 2-approximate solution for the similar elements problem. That is,
$$c(\hat{x}) \le 2\, c(x^*).$$
Proof.
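First note that each pairwise distance in the similar elements objective appears in exactly two of the star-graph objectives, so for any $x$,
$$\sum_{r=1}^{n} c^r(x) = 2\, c(x).$$
Since the minimum of a set of values is at most the average, and $x^r$ minimizes $c^r$,
$$c^{\hat{r}}(\hat{x}) = \min_{1 \le r \le n} c^r(x^r) \le \frac{1}{n} \sum_{r=1}^{n} c^r(x^r) \le \frac{1}{n} \sum_{r=1}^{n} c^r(x^*) = \frac{2}{n}\, c(x^*).$$
By the triangle inequality we have
$$c(\hat{x}) = \sum_{1 \le i < j \le n} d(\hat{x}_i, \hat{x}_j) \le \sum_{1 \le i < j \le n} \left( d(\hat{x}_i, \hat{x}_{\hat{r}}) + d(\hat{x}_{\hat{r}}, \hat{x}_j) \right) = (n-1)\, c^{\hat{r}}(\hat{x}),$$
where the last equality holds because each index appears in exactly $n-1$ of the pairs and $d(\hat{x}_{\hat{r}}, \hat{x}_{\hat{r}}) = 0$.
Therefore
$$c(\hat{x}) \le (n-1)\, c^{\hat{r}}(\hat{x}) \le \frac{2(n-1)}{n}\, c(x^*) \le 2\, c(x^*).$$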
∎
To analyze the running time of the algorithm we assume the distances between pairs of elements in $X$ are either pre-computed and given as part of the input, or they can each be computed in $O(1)$ time.
Let $k = \max_i |S_i|$. The first stage of the algorithm involves $n$ optimization problems that can be solved in $O(nk^2)$ time each. The second stage of the algorithm involves selecting one of the $n$ solutions, and takes $O(n)$ time.
Remark 2.
If each of the $n$ sets has size at most $k$ the running time of the approximation algorithm for the similar elements problem is $O(n^2 k^2)$.
The bottleneck of the algorithm is the evaluation of the minimizations over $x_j \in S_j$ in (3) and (4). This computation is equivalent to a nearest-neighbor computation, where we want to find a point from a set $S_j$ that is closest to a query point $x_r$. When the nearest-neighbor computation can be done efficiently (with an appropriate data structure) the running time of the similar elements approximation algorithm can be reduced.
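As one illustration of such a data structure, consider the special case where the elements are real numbers and the metric is the absolute difference. This case is an assumption made for the example and is not discussed in the text above. If each set is sorted once in advance, a nearest-neighbor query can be answered by binary search in time logarithmic in the size of the set instead of by a linear scan. The helper name `nearest_sorted` is hypothetical.

```python
import bisect


def nearest_sorted(sorted_points, q):
    """Return the element of a non-empty sorted list closest to q under |a - b|."""
    # Index of the first element that is >= q.
    i = bisect.bisect_left(sorted_points, q)
    # The nearest neighbor is either that element or the one just before it.
    candidates = sorted_points[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - q))
```

In this setting each inner minimization in the star-graph computation becomes a call such as `nearest_sorted(sorted(S_j), x_r)`, with the sorting done once per set.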
2 Metric Labeling on Complete Graphs
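Let $G = (V, E)$ be an undirected simple graph on $n$ nodes $V = \{1, \ldots, n\}$. Let $L$ be a finite set of labels with $|L| = k$ and $d$ be a metric on $L$. For $i \in V$ let $f_i$ be a non-negative function mapping labels to real values. The unweighted metric labeling problem on $G$ is to find a labeling $x = (x_1, \ldots, x_n) \in L^n$ minimizing
$$c(x) = \sum_{i \in V} f_i(x_i) + \sum_{\{i,j\} \in E} d(x_i, x_j). \tag{7}$$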
Let $x^* = \operatorname*{argmin}_x c(x)$. This optimization problem can be solved in polynomial time using dynamic programming if $G$ is a tree. Here we consider the case when $G$ is the complete graph and give an efficient 2-approximation algorithm based on the solution of several metric labeling problems on star graphs.
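For reference, the tree case mentioned above can be handled by standard min-sum dynamic programming over a rooted tree. The sketch below is not taken from the paper. It assumes the objective is the sum of node costs plus the sum of label distances over the edges, that the nodes are numbered from 0, that the tree is connected and given as adjacency lists, and that `f(i, a)` and `d(a, b)` are supplied as callables. The name `label_tree` is hypothetical.

```python
def label_tree(adj, labels, f, d, root=0):
    """Exact metric labeling on a tree; returns (labeling, optimal cost)."""
    # Breadth-first order from the root, so every parent precedes its children.
    parent, order = {root: None}, [root]
    for u in order:
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
    # best[v][a]: minimum cost of the subtree rooted at v when v has label a.
    best = {v: {a: f(v, a) for a in labels} for v in order}
    # choice[v][a]: best label for v when its parent has label a.
    choice = {}
    for v in reversed(order[1:]):  # children are processed before parents
        choice[v] = {}
        for a in labels:
            b = min(labels, key=lambda c: best[v][c] + d(a, c))
            choice[v][a] = b
            best[parent[v]][a] += best[v][b] + d(a, b)
    # Pick the root label, then propagate the stored choices downward.
    x = {root: min(labels, key=lambda a: best[root][a])}
    for v in order[1:]:
        x[v] = choice[v][x[parent[v]]]
    return [x[v] for v in range(len(adj))], best[root][x[root]]
```

With $n$ nodes and $k$ labels this sketch evaluates each edge for every pair of labels, so it uses on the order of $nk^2$ evaluations of `d`.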
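For each $r \in V$ define a different objective function, $c^r$, corresponding to a metric labeling problem on a star graph with vertex set $V$ rooted at $r$,
$$c^r(x) = \sum_{i \in V} f_i(x_i) + \frac{n}{2} \sum_{j \in V \setminus \{r\}} d(x_r, x_j). \tag{8}$$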
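Let $x^r = \operatorname*{argmin}_x c^r(x)$. We can solve this optimization problem in $O(nk^2)$ time using a simple form of dynamic programming. First compute an optimal label for the root vertex using one step of dynamic programming,
$$x^r_r = \operatorname*{argmin}_{x_r \in L} \left( f_r(x_r) + \sum_{j \in V \setminus \{r\}} \min_{x_j \in L} \left( f_j(x_j) + \frac{n}{2}\, d(x_r, x_j) \right) \right). \tag{9}$$
Then compute $x^r_j$ for $j \in V \setminus \{r\}$,
$$x^r_j = \operatorname*{argmin}_{x_j \in L} \left( f_j(x_j) + \frac{n}{2}\, d(x^r_r, x_j) \right). \tag{10}$$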
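Optimizing each $c^r$ separately leads to $n$ possible solutions $x^1, \ldots, x^n$, and we select one of them as follows,
$$\hat{r} = \operatorname*{argmin}_{r \in V} c^r(x^r), \tag{11}$$
$$\hat{x} = x^{\hat{r}}. \tag{12}$$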
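The star-graph stage and the selection stage for the complete graph can be sketched in Python as follows. The sketch rests on an assumption: each root-to-leaf distance in the star objective is weighted by $n/2$, which is the weighting under which the $n$ star objectives sum to $n$ times the complete-graph objective. The names `solve_star_labeling`, `label_complete_graph` and `labeling_cost` are hypothetical, nodes are numbered from 0, and `f(i, a)` and `d(a, b)` are assumed to be callables.

```python
def solve_star_labeling(n, labels, f, d, r):
    """Minimize the star objective rooted at node r; returns (cost, labeling)."""
    w = n / 2.0  # assumed weight on each root-to-leaf distance
    best_cost, best_x = float("inf"), None
    for a in labels:  # candidate label for the root
        # Given the root label, each leaf is labeled independently.
        x = [min(labels, key=lambda b: f(j, b) + w * d(a, b)) for j in range(n)]
        x[r] = a
        cost = (sum(f(j, x[j]) for j in range(n))
                + w * sum(d(a, x[j]) for j in range(n) if j != r))
        if cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x


def label_complete_graph(n, labels, f, d):
    """Return the star solution with the smallest star cost."""
    stars = (solve_star_labeling(n, labels, f, d, r) for r in range(n))
    return min(stars, key=lambda t: t[0])[1]


def labeling_cost(x, f, d):
    """Node costs plus label distances over all pairs of nodes."""
    n = len(x)
    return (sum(f(i, x[i]) for i in range(n))
            + sum(d(x[i], x[j]) for i in range(n) for j in range(i + 1, n)))
```

Each call to `solve_star_labeling` tries every root label and, for each one, every label at every leaf, which matches the two steps of dynamic programming described above.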
Theorem 3.
The algorithm described above finds a 2-approximate solution for the metric labeling problem on a complete graph. That is,
$$c(\hat{x}) \le 2\, c(x^*).$$
Proof.
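First note that each node cost appears in all $n$ star objectives and each pairwise distance appears in exactly two of them with weight $n/2$, so for any $x$,
$$\sum_{r \in V} c^r(x) = n \sum_{i \in V} f_i(x_i) + n \sum_{1 \le i < j \le n} d(x_i, x_j) = n\, c(x).$$
Since the minimum of a set of values is at most the average, and $x^r$ minimizes $c^r$,
$$c^{\hat{r}}(\hat{x}) = \min_{r \in V} c^r(x^r) \le \frac{1}{n} \sum_{r \in V} c^r(x^r) \le \frac{1}{n} \sum_{r \in V} c^r(x^*) = c(x^*).$$
Since $d$ is a metric and $f_i$ is non-negative,
$$c(\hat{x}) = \sum_{i \in V} f_i(\hat{x}_i) + \sum_{1 \le i < j \le n} d(\hat{x}_i, \hat{x}_j) \le \sum_{i \in V} f_i(\hat{x}_i) + (n-1) \sum_{j \in V \setminus \{\hat{r}\}} d(\hat{x}_{\hat{r}}, \hat{x}_j) \le 2\, c^{\hat{r}}(\hat{x}).$$
Therefore
$$c(\hat{x}) \le 2\, c^{\hat{r}}(\hat{x}) \le 2\, c(x^*).$$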
∎
The first stage of the algorithm involves $n$ optimization problems that can be solved in $O(nk^2)$ time each. The second stage involves selecting one of the $n$ solutions, and takes $O(n)$ time.
Remark 4.
The running time of the approximation algorithm for the metric labeling problem on complete graphs is $O(n^2 k^2)$.
Acknowledgments
We thank Caroline Klivans, Sarah Sachs, Anna Grim, Robert Kleinberg and Yang Yuan for helpful discussions about the contents of this report. This material is based upon work supported by the National Science Foundation under Grant No. 1447413.
References
- [1] Dan Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bulletin of Mathematical Biology, 55(1):141–154, 1993.
- [2] Jon Kleinberg and Eva Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM, 49(5):616–639, 2002.
- [3] Oded Maron and Aparna Lakshmi Ratan. Multiple-instance learning for natural scene classification. In International Conference on Machine Learning, volume 98, pages 341–349, 1998.
- [4] Sarah Sachs. Similar-part approximation using invariant feature descriptors. Undergraduate Honors Thesis, Brown University, 2016.