 # Few Cuts Meet Many Point Sets

We study the problem of how to break up many point sets in R^d into smaller parts using a few splitting (shared) hyperplanes. This problem is related to the classical Ham-Sandwich Theorem. We provide a logarithmic approximation to the optimal solution using the greedy algorithm for submodular optimization.


## 1 Introduction

Let $P_1, \ldots, P_m$ be sets of points in $\mathbb{R}^d$, not necessarily disjoint. We are interested in splitting these sets into equal parts using a minimal number of hyperplanes. For $m \le d$, the Ham-Sandwich Theorem states that one can bisect all the sets equally by a single hyperplane. However, for $m > d$ and non-degenerate inputs, this is no longer possible. In particular, the number of point sets $m$ might be significantly larger than $d$. One way to get around this restriction is via the polynomial Ham-Sandwich Theorem [ST42], which can be used to solve the above problem. However, the cutting surface is then no longer a hyperplane, but rather, as the name suggests, the zero set of a polynomial.

Figure 1.1: Three point sets broken into small pieces by two lines.

Here, we are interested in what can be done when the cutting is still done by more restricted entities, such as (several) hyperplanes. This is motivated by the technical difficulties in handling polynomials efficiently, and in particular their zero sets. To keep the problem feasible, we somewhat relax it: the requirement is no longer that each piece of $P_i$ is exactly half the size of the original set, but rather that it is sufficiently small.

See Figure 1.1 for an example. In this case, we have three point sets. We want to break the green (cross) point set into sets with at most three points, the blue (dot) point set into sets with at most four points, and the red (square) point set into sets with at most two points. As the figure shows, this can be achieved using only two separating lines.

We reduce this problem to a generalized instance of partial set cover, where we allow multiple ground sets, with different demands, and show that the standard greedy algorithm for submodular optimization can be applied in this case.

#### Applications.

One natural application of this problem comes from machine learning. Given a point set $P$ in high dimensions, and a collection of features $f_1, \ldots, f_k$ (a feature might be a coordinate of a point, or some arbitrary function of the point itself), we say that a feature $f_j$ distinguishes between two points $p, q \in P$ if $f_j(p)$ and $f_j(q)$ have different signs. In particular, given a collection of features, one can assign each point a vector of the signs of the features (its signature). One would then like to choose a minimal number of features such that every set of points with the same signature is at most half the size of the original set. Naturally, one would like to apply this to several sets simultaneously.

A natural scenario for such an application is in the realm of big data. Given a big data set, it needs to be divided among different computers. The fewer the features needed to get a split as described above, the faster one can decide where to send such a point. Here, we want to guarantee that each set gets reduced to at most half its size.

For the case where we require all of the points to be singletons in the partition induced by the features, this can be interpreted as a non-linear dimension reduction of the input set into a hypercube, where the dimension of the hypercube is as small as possible. This singleton case corresponds to separating all pairs of points by a minimal number of hyperplanes, and the present work is a natural extension of that problem, see [HJ18].

#### Some background.

The Ham-Sandwich Theorem is a well-studied problem in both mathematics and computer science. Since its inception, there have been many results related to computing such cuts in higher dimensions [LMS94], as well as generalizations of the theorem [Ram96, BHJ08, ST42]. One particular generalization is the following: given $d$ well-separated convex bodies $K_1, \ldots, K_d$ in $\mathbb{R}^d$ and constants $\alpha_1, \ldots, \alpha_d \in [0, 1]$, there exists a unique hyperplane $h$ whose positive side contains exactly an $\alpha_i$ fraction of the volume of $K_i$, for $i = 1, \ldots, d$ [BHJ08]. This result was then extended to discrete point sets under certain conditions [SZ10]. Notably, in this paper we consider the case where the number of point sets $m$ can be much larger than the dimension $d$.

Other generalizations include the polynomial Ham-Sandwich Theorem, in which one is interested in partitioning a point set using polynomials rather than hyperplanes [ST42, KMS12]. This generalization and the original Ham-Sandwich Theorem have had a variety of applications in geometric range searching [Mat94, AMS13].

#### Partial set cover.

In the partial set cover problem, one is interested in covering at least a certain fraction of the elements in a set system, using as few sets as possible. We use the parallel version of this problem (with many set systems sharing sets, each with its own demand) to model our problem.

For the partial set cover problem, an $O(\log n)$ approximation is well known, and follows from the greedy algorithm (see below for details). In geometric settings, Inamdar and Varadarajan [IV18] showed that partial set cover can be approximated to within $O(\beta)$, where $\beta$ is the approximation ratio for the set cover version of the problem. Since approximations much better than $O(\log n)$ are known for many geometric instances, this yields improvements for the partial set cover versions of these problems. However, it is not clear how to apply their algorithm in the parallel setting.

#### Outline.

We start by providing the necessary background on submodular optimization needed for our main result. We then develop the approximation algorithm for the partial set cover setting. We refer the reader to Theorem 3.10 for a formal statement of the geometric problem, and the reduction to the more general setting.

#### Candid assessment & contribution.

This paper provides a rather straightforward and elegant (at least in the eyes of the authors) reduction of the problem studied to the submodular greedy algorithm. As a result, we obtain an $O(\log(mn))$ approximation algorithm for an instance with $m$ point sets and $n$ points overall.

In hindsight this reduction looks trivial (but then what doesn't?), but it took the authors quite a bit of time to arrive at it. The two main contributions of this paper are (i) presenting these reductions, and, more significantly, (ii) introducing this family of new problems. In particular, we believe that breaking the logarithmic bound should be possible for the simplest geometric settings, and we leave this as an open problem for further research.

## 2 Background

#### Notations.

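For a set $X$ and an element $x$, we denote $X + x = X \cup \{x\}$ and $X - x = X \setminus \{x\}$. A set system is a pair $\mathsf{S} = (X, \mathcal{F})$, with $\mathcal{F} \subseteq 2^X$. We refer to a set $r \in \mathcal{F}$ as an edge (the literature sometimes refers to such set systems as hypergraphs, and their edges as hyperedges, but this is all too hyper for us).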

### 2.1 Background: Submodular minimization

For the sake of completeness, we present the analysis of the greedy algorithm for integer-valued monotone submodular functions. In this case, the task is to compute the smallest set that provides the same utility as using all the elements available.

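Let $X$ be a given finite set (in our applications, $X$ is the family of edges of a set system), and assume we have a monotone function $f : 2^X \to \mathbb{N}$. Here a function $f$ is monotone if $Z \subseteq Y$ implies that $f(Z) \le f(Y)$. We also assume that $f$ is submodular, that is, for any $e \in X$, we have that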

$$\forall Z \subseteq Y \subseteq X \setminus \{e\}: \qquad \Delta_Z(e) = f(Z+e) - f(Z) \;\ge\; f(Y+e) - f(Y) = \Delta_Y(e).$$
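For tiny instances, the two conditions above can be checked by brute force. The following Python sketch is only an illustration. The helper names (`subsets`, `is_monotone_submodular`, `coverage`) and the example edges are hypothetical and not taken from the paper, and the check takes time exponential in $|X|$.

```python
from itertools import combinations

def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_monotone_submodular(f, ground):
    """Brute-force check of monotonicity and of the diminishing-returns
    inequality Delta_Z(e) >= Delta_Y(e) for all Z ⊆ Y ⊆ ground - {e}."""
    ground = frozenset(ground)
    for Y in subsets(ground):
        for Z in subsets(Y):
            if f(Z) > f(Y):                      # monotonicity violated
                return False
            for e in ground - Y:
                if f(Z | {e}) - f(Z) < f(Y | {e}) - f(Y):
                    return False                 # submodularity violated
    return True

# Hypothetical example: X consists of three named edges, and f counts the
# integers covered by the chosen edges (a coverage function).
EDGES = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def coverage(Z):
    return len(set().union(*(EDGES[r] for r in Z)))

print(is_monotone_submodular(coverage, EDGES))                 # True
print(is_monotone_submodular(lambda Z: len(Z) ** 2, EDGES))    # False
```

Coverage functions are the prototypical example here. The squared-size function is monotone but fails the inequality, since adding an element to a larger set gains more.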

Consider the greedy algorithm that starts with an empty solution $C_0 = \emptyset$. In the $i$th iteration, the algorithm picks the element $e_i \in X \setminus C_{i-1}$ that increases the value of $f$ the most, and sets $C_i = C_{i-1} + e_i$. The algorithm stops when $f(C_i) = f(X)$.
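The greedy procedure can be sketched in a few lines of Python, assuming $f$ is available as a black-box function on frozensets. The names `greedy_submodular_cover`, `EDGES`, and `coverage` are illustrative only. The loop terminates because, by submodularity, $f(X) - f(C) \le \sum_{e \notin C} \Delta_C(e)$, so while $f(C) < f(X)$ some element has a positive marginal gain.

```python
def greedy_submodular_cover(f, ground):
    """Greedy for an integer-valued monotone submodular f: start from the
    empty set and repeatedly add the element with the largest marginal gain
    f(C + e) - f(C), until f(C) equals f(ground)."""
    ground = list(ground)
    target = f(frozenset(ground))
    chosen = frozenset()
    while f(chosen) < target:
        best = max((e for e in ground if e not in chosen),
                   key=lambda e: f(chosen | {e}) - f(chosen))
        chosen = chosen | {best}
    return chosen

# Hypothetical example: each element of X is an edge covering some integers,
# and f counts the covered integers.
EDGES = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def coverage(Z):
    return len(set().union(*(EDGES[r] for r in Z)))

print(sorted(greedy_submodular_cover(coverage, EDGES)))  # ['a', 'c']
```

Ties are broken by the order of `ground` (Python's `max` returns the first maximal element), which keeps the sketch deterministic.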

###### Theorem 2.1 ([Wol82]).

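Given a finite set $X$ (for example, the edges of a set system) and a monotone submodular function $f : 2^X \to \mathbb{N}$, the greedy algorithm outputs a solution with $O(k \log f_{\max})$ elements, where $f_{\max} = f(X)$ and $k$ is the size of the smallest set $K \subseteq X$ such that $f(K) = f_{\max}$.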

###### Proof:

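Let $K = \{o_1, \ldots, o_k\}$ be the optimal solution. Consider the current solution $C_i$ after $i$ iterations, and observe that, by monotonicity, $f_{\max} = f(K) \le f(C_i \cup K) \le f(X) = f_{\max}$. As such, we have $f(C_i \cup K) = f_{\max}$. Let $\Delta_i = f_{\max} - f(C_i)$ be the deficiency of $C_i$. For $j = 0, \ldots, k$, let $X_j = C_i \cup \{o_1, \ldots, o_j\}$, and for $j = 1, \ldots, k$, let $\delta_j = f(X_j) - f(X_{j-1})$. Since $X_0 = C_i$ and $X_k = C_i \cup K$, the sum telescopes, and we have that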

$$\sum_{j=1}^{k} \delta_j = f(C_i \cup K) - f(C_i) = f_{\max} - f(C_i) = \Delta_i.$$

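As such, there is an index $j$ such that $\delta_j \ge \Delta_i / k$. If $\Delta_i > 0$, then $\delta_j > 0$, so $o_j \notin X_{j-1}$. Now, since $C_i \subseteq X_{j-1}$, submodularity implies that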

$$f(C_i + o_j) - f(C_i) \;\ge\; f(X_{j-1} + o_j) - f(X_{j-1}) = \delta_j \;\ge\; \Delta_i / k.$$

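Now, the greedy algorithm adds an element $e$ that maximizes $f(C_i + e) - f(C_i)$, and this gain is at least $\Delta_i / k$. Put differently, the added element multiplies the deficiency of the current solution by a factor of at most $1 - 1/k$, that is, $\Delta_{i+1} \le (1 - 1/k)\Delta_i$. Therefore, the deficiency at the end of the $i$th iteration is at most $(1 - 1/k)^i f_{\max} \le e^{-i/k} f_{\max}$. This quantity is smaller than $1$ for $i > k \ln f_{\max}$, and since $f$ is integer valued, the deficiency is then zero. Hence the algorithm stops after at most $\lfloor k \ln f_{\max} \rfloor + 1$ iterations.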
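Written out as one chain, the counting at the end of the proof reads as follows. It uses $\Delta_0 = f_{\max} - f(\emptyset) \le f_{\max}$ (as $f \ge 0$) and $1 - x \le e^{-x}$. Since every $\Delta_i$ is a non-negative integer, $\Delta_i < 1$ forces $\Delta_i = 0$.

```latex
\Delta_{i} \;\le\; \Bigl(1-\tfrac{1}{k}\Bigr)\Delta_{i-1}
\;\le\; \cdots \;\le\; \Bigl(1-\tfrac{1}{k}\Bigr)^{i}\Delta_0
\;\le\; e^{-i/k}\, f_{\max} \;<\; 1
\qquad \text{whenever } i > k \ln f_{\max}.
```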

## 3 Problem statements and reductions

### 3.1 PCMS: Partial cover for multiple sets

###### Problem 3.2 (PCMS).

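The input is a set system $(U, \mathcal{F})$ and a collection $\mathcal{G} = \{G_1, \ldots, G_m\}$ of ground sets $G_i \subseteq U$, where the universe $U$ is of size $n$. In addition, each ground set $G_i$ has a demand, denoted by $d_i$, which is a non-negative integer. A valid solution for such an instance is a collection $K \subseteq \mathcal{F}$ such that $\bigcup K$ covers at least $d_i$ elements of $G_i$, for $i = 1, \ldots, m$.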

###### Lemma 3.3.

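Let $(U, \mathcal{G}, \mathcal{F})$ be an instance of partial cover for multiple sets (PCMS), where $n = |U|$, $\mathcal{G} = \{G_1, \ldots, G_m\}$ is a family of $m$ ground sets, and $\mathcal{F}$ is a family of subsets of $U$. Furthermore, each ground set $G_i \in \mathcal{G}$ has an associated demand $d_i$. Then, the greedy algorithm computes, in polynomial time, an $O(\log(mn))$ approximation to the minimal size set $K \subseteq \mathcal{F}$ that meets all the demands of the ground sets.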

###### Proof:

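Consider a partial solution $Z \subseteq \mathcal{F}$, and let $\bigcup Z = \bigcup_{r \in Z} r$. The service of $Z$ to $G_i$ is the number of elements of $G_i$ covered by the union of the edges of $Z$, capped at the demand $d_i$; formally, $f_i(Z) = \min\bigl(d_i, |G_i \cap \bigcup Z|\bigr)$. Observe that $f_i$ is clearly monotone, and its maximal value is $d_i$ (for a feasible instance). As for submodularity, consider sets $Z \subseteq Y \subseteq \mathcal{F}$ and an edge $r \in \mathcal{F} \setminus Y$, and note that the coverage $g_i(Z) = |G_i \cap \bigcup Z|$ satisfies $g_i(Z + r) - g_i(Z) \ge g_i(Y + r) - g_i(Y)$, as $r$ potentially covers more new elements of $G_i$ when added to a smaller cover. Truncating a monotone submodular function at a constant (here $d_i$) preserves both properties, so $f_i$ is monotone and submodular. For the given PCMS instance, for a given solution $Z \subseteq \mathcal{F}$, the target function is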

$$f(Z) = \sum_{i=1}^{m} f_i(Z),$$

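which is a sum of monotone submodular functions. As such, $f$ is monotone and submodular itself. Observe that, for a feasible instance, $f_{\max} = f(\mathcal{F}) = \sum_{i=1}^{m} d_i \le mn$, and that a set $Z$ meets all the demands if and only if $f(Z) = f_{\max}$. Using the algorithm of Theorem 2.1 now implies the result.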
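The truncated-coverage objective from this proof is easy to evaluate directly. Below is a minimal Python sketch. It assumes edges are given as a dict from (hypothetical) names to sets and demands as a list aligned with the ground sets. A solution meets all demands exactly when the objective equals $\sum_i d_i$.

```python
def pcms_objective(Z, edges, ground_sets, demands):
    """f(Z) = sum_i min(d_i, |G_i ∩ ∪Z|), the truncated coverage objective."""
    covered = set().union(*(edges[r] for r in Z))
    return sum(min(d, len(G & covered)) for G, d in zip(ground_sets, demands))

def meets_all_demands(Z, edges, ground_sets, demands):
    # Each term is at most d_i, so reaching sum(d_i) means every ground
    # set G_i has at least d_i covered elements.
    return pcms_objective(Z, edges, ground_sets, demands) == sum(demands)

# Hypothetical instance: two ground sets over U = {1, ..., 5}.
edges = {"r1": {1, 2}, "r2": {3, 4}, "r3": {2, 5}}
ground_sets = [{1, 2, 3}, {4, 5}]
demands = [2, 1]
print(pcms_objective({"r1"}, edges, ground_sets, demands))           # 2
print(meets_all_demands({"r1", "r3"}, edges, ground_sets, demands))  # True
```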

It is worth noting that one can also obtain an $O(\log m)$ approximation for Problem 3.2 via LP rounding [KY05], which is particularly useful when $m$ is much smaller than $n$. However, this does not change our final result, since the number of ground sets in our reduction is polynomial in $n$ (see Lemma 3.7).

### 3.2 Cutting a set into smaller pieces

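We are given a set system $\mathsf{S} = (X, \mathcal{F})$, where $n = |X|$. A set of edges $K \subseteq \mathcal{F}$ induces a natural partition of $X$, where two elements $x, y \in X$ are in the same set of the partition if and only if $x$ and $y$ belong to the same sets of edges in $K$. Formally, $x \equiv_K y \iff K(x) = K(y)$, where, abusing notation, $K(x) = \{ r \in K \mid x \in r \}$. The partition of $X$ induced by $K$ (i.e., the equivalence classes of $\equiv_K$) is the arrangement of $K$, denoted by $\mathcal{A}(K)$. A set of $\mathcal{A}(K)$ is a face of $\mathcal{A}(K)$. For an element $x \in X$, the face of $\mathcal{A}(K)$ that contains $x$ is denoted by $\mathcal{A}(K, x)$.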
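The arrangement can be computed by grouping elements according to their signature $K(x)$. Here is a minimal Python sketch, assuming $K$ is given as a dict from hypothetical edge names to sets. The example values are made up for illustration and are not the paper's Example 3.4.

```python
from collections import defaultdict

def arrangement(X, K):
    """Faces of A(K): x and y share a face iff K(x) == K(y),
    where K(x) is the set of names r in K with x in K[r]."""
    faces = defaultdict(set)
    for x in X:
        signature = frozenset(name for name, r in K.items() if x in r)
        faces[signature].add(x)
    return list(faces.values())

def face_of(x, X, K):
    """A(K, x): the face of A(K) containing x."""
    return next(face for face in arrangement(X, K) if x in face)

# Hypothetical example (not the paper's Example 3.4).
X = {1, 2, 3, 4, 5, 6}
K = {"r1": {1, 2, 3}, "r2": {3, 4}}
print(sorted(sorted(face) for face in arrangement(X, K)))
# [[1, 2], [3], [4], [5, 6]]
print(face_of(5, X, K))  # {5, 6}
```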

###### Example 3.4.

For , and , we have .

###### Problem 3.5 (Reduce by half).

Given the above, find a minimal size set $K \subseteq \mathcal{F}$, such that every set in the partition $\mathcal{A}(K)$ is of size at most $n/2$, where $n = |X|$.

###### Problem 3.6 (PTD: Partition to demand).

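Given a set system $(X, \mathcal{F})$, where $n = |X|$, and an integral demand $d(x)$ for each $x \in X$, compute a minimal size set $K \subseteq \mathcal{F}$, such that for every $x \in X$, we have $|\mathcal{A}(K, x)| \le d(x)$.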

Observe that the reduce-by-half question can be reduced to PTD immediately, by setting the demand of every element of the ground set to $\lfloor n/2 \rfloor$.

###### Lemma 3.7.

Given an instance of PTD, a greedy algorithm provides an $O(\log n)$ approximation to the optimal solution, where $n = |X|$.

###### Proof:

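Consider the complete graph on $X$, with edge set $E = \{ xy \mid x, y \in X, x \ne y \}$. For every element $x \in X$, consider the associated cut $C_x = \{ xy \mid y \in X - x \}$. A set $r \in \mathcal{F}$ cuts an edge $xy \in E$ if $|r \cap \{x, y\}| = 1$. In particular, let $E(r)$ be the set of edges of $E$ that $r$ cuts.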

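Now, a set $K \subseteq \mathcal{F}$ meets the demand of $x$ if the sets of $K$ cut at least $n - d(x)$ edges of $C_x$, since every element $y \ne x$ that lies outside $\mathcal{A}(K, x)$ contributes exactly one cut edge $xy$. Put differently, the partial cover $\{ E(r) \mid r \in K \}$ covers at least $n - d(x)$ edges of $C_x$. Thus, let $U = E$ be the universe set, and $\mathcal{G} = \{ C_x \mid x \in X \}$ be the set of ground sets. Here a ground set $C_x$ has demand $n - d(x)$. The family of allowable sets to be used in the cover is $\mathcal{F}' = \{ E(r) \mid r \in \mathcal{F} \}$.

The triple $(U, \mathcal{G}, \mathcal{F}')$ is an instance of PCMS with $|U| = n(n-1)/2$ and $|\mathcal{G}| = n$, and thus, by Lemma 3.3, the greedy algorithm yields an $O(\log n)$ approximation in this case.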
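The construction in this proof can be sketched in Python as follows. This is a minimal illustration, and the function names and dict-based input format are assumptions, not the paper's. Each graph edge is a frozenset `{x, y}`, the ground set of `x` is its cut $C_x$ with demand $n - d(x)$ (clamped at zero in the sketch), and $E(r)$ collects the pairs split by $r$.

```python
from itertools import combinations

def ptd_to_pcms(X, F, d):
    """Reduction from PTD to PCMS.
    X: list of elements; F: dict name -> set of elements;
    d: dict x -> demand (max allowed size of the face containing x).
    Returns (U, ground_sets, demands, cover_sets)."""
    n = len(X)
    # Universe: edges {x, y} of the complete graph on X.
    U = {frozenset(pair) for pair in combinations(X, 2)}
    # Ground set C_x: the cut of x; it needs n - d(x) covered edges.
    ground_sets = {x: {e for e in U if x in e} for x in X}
    demands = {x: max(0, n - d[x]) for x in X}
    # E(r): graph edges with exactly one endpoint in r.
    cover_sets = {name: {e for e in U if len(e & r) == 1}
                  for name, r in F.items()}
    return U, ground_sets, demands, cover_sets

def meets_demands(K, ground_sets, demands, cover_sets):
    """Does the partial cover {E(r) : r in K} meet every demand?"""
    covered = set().union(*(cover_sets[r] for r in K))
    return all(len(ground_sets[x] & covered) >= demands[x]
               for x in ground_sets)

X = [1, 2, 3, 4]
F = {"r": {1, 2}, "s": {1, 3}}
half = {x: 2 for x in X}            # reduce-by-half demands: floor(4/2) = 2
U, C, dem, E = ptd_to_pcms(X, F, half)
print(len(U), meets_demands({"r"}, C, dem, E))  # 6 True
```

With these demands the single set `r` already splits $X$ into two faces of size two. With all demands set to $1$ (every element alone), both `r` and `s` are needed.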

### 3.3 Cutting a Ham-Sandwich into small pieces

###### Problem 3.8 (RMC: Reduce measures via cuts).

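The input is a triple $(X, \mathcal{G}, \mathcal{F})$ with $\mathcal{G}, \mathcal{F} \subseteq 2^X$. Here $\mathcal{G} = \{G_1, \ldots, G_m\}$ is a collection of ground sets that are not necessarily disjoint, and $\mathcal{F}$ is a collection of edges. For every ground set $G_i \in \mathcal{G}$, there is an associated target size $\mu_i$. The problem is to compute a minimal set $K \subseteq \mathcal{F}$, such that, for all $i$, we have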

$$\forall \psi \in \mathcal{A}(K): \qquad |\psi \cap G_i| \le \mu_i.$$

The idea is to reduce the problem to $m$ “parallel” instances of PTD, one per ground set. The target function is the sum of the respective target functions of these instances. It is thus submodular, and the greedy algorithm can be applied to it.

###### Lemma 3.9.

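Given an instance $(X, \mathcal{G}, \mathcal{F})$ of RMC with $n = |X|$ and $m = |\mathcal{G}|$, one can compute, in polynomial time, an $O(\log(mn))$ approximation to the smallest $K \subseteq \mathcal{F}$ that satisfies the given instance.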

###### Proof:

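For a ground set $G_i \in \mathcal{G}$, let $\mathcal{F}_i = \{ r \cap G_i \mid r \in \mathcal{F} \}$ be the restriction of $\mathcal{F}$ to $G_i$, and set the demand of every $x \in G_i$ to $\mu_i$. The pair $(G_i, \mathcal{F}_i)$, with this demand function, forms an instance of PTD (Problem 3.6), and its approximation algorithm (Lemma 3.7) has an associated target function $f_i$. Viewed as a function of $K \subseteq \mathcal{F}$ (via $r \mapsto r \cap G_i$), $f_i$ is still a sum of truncated coverage functions, and is thus non-negative, monotone, and submodular, with maximum value $M_i \le |G_i|^2 \le n^2$.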

Consider the function $f = \sum_{i=1}^{m} f_i$, and let $M = \sum_{i=1}^{m} M_i \le mn^2$. Clearly, $f$ is submodular, monotone, and has maximum value $M$. Furthermore, a subset $K \subseteq \mathcal{F}$ such that $f(K) = M$ is a valid solution to the given instance, and vice versa. As such, one can plug $f$ into the algorithm of Theorem 2.1 and get the desired approximation, since $\log M = O(\log(mn))$.

###### Theorem 3.10.

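Let $P_1, \ldots, P_m$ be (not necessarily disjoint) point sets in $\mathbb{R}^d$, where $n = |P_1 \cup \cdots \cup P_m|$. For each point set $P_i$, we are given a target size $\mu_i$. We would like to find a minimal set $H$ of hyperplanes such that for every face $\psi$ in the arrangement of $H$, $|\psi \cap P_i| \le \mu_i$, for all $i$. One can $O(\log(mn))$-approximate the optimal solution in $O(mn^{d+3})$ time.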

###### Proof:

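The reduction is straightforward and uses Lemma 3.9. Let the shared ground set be $X = P_1 \cup \cdots \cup P_m$. Let $\mathcal{G} = \{P_1, \ldots, P_m\}$ be the set of ground sets. Finally, let $h_1, \ldots, h_N$ be the (finite) collection of combinatorially different hyperplanes with respect to $X$, where $N = O(n^d)$. For each $h_j$, we add the set $h_j^+ \cap X$ to our collection of subsets $\mathcal{F}$, where $h_j^+$ denotes the positive halfspace bounded by $h_j$. The values $\mu_i$ remain unchanged. This forms an instance of Problem 3.8, and thus we can apply Lemma 3.9 to obtain the desired separating hyperplanes.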

As for the running time, computing the set system takes $O(n^{d+1})$ time by brute force. Indeed, unraveling the above reduction, the shared ground set is made of pairs of points of $X$. Every point has up to $m$ different sets of such pairs (one for each $P_i$ containing it) that need to be partially covered. Fortunately, there are only $O(n^d)$ edges in the resulting set system. Evaluating the contribution of a new edge (in the set system) to the target function takes $O(mn^2)$ time. Since there are $O(n^d)$ edges in the set system, evaluating all edges takes $O(mn^{d+2})$ time per iteration. Finally, it is easy to verify that the algorithm performs at most $n$ iterations, as every iteration splits at least one face of the current arrangement of $X$, and there are at most $n$ faces. Overall, the running time is $O(mn^{d+3})$.
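To make Theorem 3.10 concrete, here is a small planar ($d = 2$) sketch in Python. It is an illustration under stated assumptions, not the paper's implementation. Points are assumed to be in general position (no three collinear), each line is represented by the subset of points on its positive side, and all function names are hypothetical. The sketch enumerates the combinatorially distinct halfplane subsets by brute force, then runs the greedy algorithm on a target function mirroring the one in the proof of Lemma 3.9.

```python
from itertools import permutations

def cross(o, a, b):
    """Twice the signed area of (o, a, b); positive iff b lies left of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def halfplane_sets(points):
    """All nonempty proper subsets of `points` cut off by a line, assuming no
    three points are collinear. For an ordered pair (p, q), slightly shifting
    or rotating line pq realizes the points strictly left of pq together
    with any subset of {p, q}."""
    sets = set()
    for p, q in permutations(points, 2):
        left = frozenset(s for s in points if cross(p, q, s) > 0)
        for extra in ((), (p,), (q,), (p, q)):
            S = left | frozenset(extra)
            if 0 < len(S) < len(points):
                sets.add(S)
    return sets

def objective(K, point_sets, mu):
    """Sum over i and x in P_i of min(|P_i| - mu_i, number of points of P_i
    separated from x by some halfplane in K)."""
    total = 0
    for P, m_i in zip(point_sets, mu):
        need = max(0, len(P) - m_i)
        for x in P:
            sep = sum(1 for y in P
                      if y != x and any((x in S) != (y in S) for S in K))
            total += min(need, sep)
    return total

def greedy_lines(point_sets, mu):
    """Greedily add halfplanes until every face has at most mu_i points of P_i."""
    X = sorted(set().union(*point_sets))
    edges = sorted(halfplane_sets(X), key=sorted)   # deterministic order
    target = sum(len(P) * max(0, len(P) - m) for P, m in zip(point_sets, mu))
    K = []
    while (current := objective(K, point_sets, mu)) < target:
        best = max(edges, key=lambda S: objective(K + [S], point_sets, mu))
        if objective(K + [best], point_sets, mu) == current:
            raise ValueError("infeasible instance (e.g. some mu_i = 0)")
        K.append(best)
    return K

square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(len(greedy_lines([square], [2])), len(greedy_lines([square], [1])))  # 1 2
```

For the four corners of a square, a single line suffices to halve the set ($\mu = 2$), and the sketch picks two lines when every point must end up alone ($\mu = 1$). The brute-force evaluation mirrors the running-time analysis above rather than aiming for efficiency.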

###### Remark.

No effort was made to improve the running time of the algorithm of Theorem 3.10.

## References

• [AMS13] P. K. Agarwal, J. Matoušek, and M. Sharir. On range searching with semialgebraic sets. II. SIAM Journal on Computing, 42(6):2039–2062, 2013.
• [BHJ08] I. Bárány, A. Hubard, and J. Jerónimo. Slicing convex sets and measures by a hyperplane. Discrete Comput. Geom., 39(1-3):67–75, 2008.
• [HJ18] S. Har-Peled and M. Jones. On separating points by lines. In Artur Czumaj, editor, Proc. 29th ACM-SIAM Sympos. Discrete Algs. (SODA), pages 918–932. SIAM, 2018.
• [IV18] T. Inamdar and K. R. Varadarajan. On partial covering for geometric set systems. In Bettina Speckmann and Csaba D. Tóth, editors, Proc. 34th Int. Annu. Sympos. Comput. Geom. (SoCG), volume 99 of LIPIcs, pages 47:1–47:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
• [KMS12] H. Kaplan, J. Matoušek, and M. Sharir. Simple proofs of classical theorems in discrete geometry via the Guth-Katz polynomial partitioning technique. Discrete Comput. Geom., 48(3):499–517, 2012.
• [KY05] S. G. Kolliopoulos and N. E. Young. Approximation algorithms for covering/packing integer programs. Journal of Computer and System Sciences, 71(4):495–505, 2005.
• [LMS94] C. Lo, J. Matoušek, and W. L. Steiger. Algorithms for ham-sandwich cuts. Discrete Comput. Geom., 11:433–452, 1994.
• [Mat94] J. Matoušek. Geometric range searching. ACM Comput. Surv., 26(4):421–461, 1994.
• [Ram96] E. A. Ramos. Equipartition of mass distributions by hyperplanes. Discrete Comput. Geom., 15(2):147–167, 1996.
• [ST42] A. H. Stone and J. W. Tukey. Generalized “sandwich” theorems. Duke Math. J., 9(2):356–359, 1942.
• [SZ10] W. Steiger and J. Zhao. Generalized ham-sandwich cuts. Discrete Comput. Geom., 44(3):535–545, 2010.
• [Wol82] L. A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.