Non-Uniform k-Center and Greedy Clustering

In the Non-Uniform k-Center problem, a generalization of the famous k-center clustering problem, we want to cover the given set of points in a metric space by finding a placement of balls with specified radii. In the t-NUkC problem, we assume that the number of distinct radii is equal to t, and we are allowed to use k_i balls of radius r_i, for 1 ≤ i ≤ t. This problem was introduced by Chakrabarty et al. [ACM Trans. Alg. 16(4):46:1-46:19], who showed that a constant approximation for t-NUkC is not possible if t is unbounded. On the other hand, they gave a bicriteria approximation that violates the number of allowed balls as well as the given radii by a constant factor. They also conjectured that a constant approximation for t-NUkC should be possible if t is a fixed constant. Since then, there has been steady progress towards resolving this conjecture: currently, a constant approximation for 3-NUkC is known via the results of Chakrabarty and Negahbani [IPCO 2021], and Jia et al. [To appear in SOSA 2022]. We push the horizon by giving an O(1)-approximation for the Non-Uniform k-Center problem with 4 distinct types of radii. Our result is obtained via a novel combination of tools and techniques from the k-center literature, which also demonstrates that the different generalizations of k-center involving non-uniform radii and multiple coverage constraints (i.e., colorful k-center) are closely interlinked with each other. We hope that our ideas will contribute towards a deeper understanding of the t-NUkC problem, eventually bringing us closer to the resolution of the CGK conjecture.


1 Introduction

The k-center problem is one of the most fundamental problems in clustering. The input to the k-center problem consists of a finite metric space (X, d), where X is a set of n points, and d is the associated distance function satisfying triangle inequality. We are also given a parameter k, where 1 ≤ k ≤ n. A solution to the k-center problem consists of a set C ⊆ X of size at most k, and the cost of this solution is max_{p ∈ X} d(p, C), i.e., the maximum distance of a point to its nearest center in C. Alternatively, a solution can be thought of as a set of k balls of radius r, centered around points in C, that covers the entire set of points X. The goal is to find a solution of smallest radius. We say that a solution is an α-approximation if its cost is at most α times the optimal radius. Several 2-approximations are known for the k-center problem [hochbaumS1985best, hochbaumM1985approximation]. A simple reduction from the Minimum Dominating Set problem shows that the k-center problem is NP-hard. In fact, the same reduction also shows that it is NP-hard to get a (2 − ε)-approximation for any ε > 0.
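The cost of a k-center solution described above can be sketched in a few lines of Python. This is an illustration only: the function names, the one-dimensional example data, and the use of farthest-first traversal are assumptions made here. Farthest-first traversal is a classical 2-approximation for k-center, but it is not claimed to be the algorithm of the works cited above.

```python
from typing import Callable, List, Sequence

Point = float  # hypothetical: points on a line, purely for illustration


def kcenter_cost(points: Sequence[Point], centers: Sequence[Point],
                 dist: Callable[[Point, Point], float]) -> float:
    """Maximum distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


def farthest_first(points: Sequence[Point], k: int,
                   dist: Callable[[Point, Point], float]) -> List[Point]:
    """Start at the first point, then repeatedly add the point that is
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


line = [0, 1, 2, 10, 11, 20]
d = lambda a, b: abs(a - b)
chosen = farthest_first(line, 3, d)
print(chosen, kcenter_cost(line, chosen, d))
```

On this toy input the traversal returns a solution of cost 2, while the centers 1, 10, 20 achieve cost 1, so the factor of 2 is attained.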

Several generalizations of the vanilla k-center problem have been considered in the literature, given its fundamental nature in the domain of clustering and approximation algorithms. One natural generalization is the Robust k-center or k-center with outliers problem, where we are additionally given a parameter m, and the goal is to find a solution that covers at least m points of X. Note that the remaining at most n − m points can be thought of as outliers with respect to the clustering computed. Charikar et al. [charikar2001algorithms], who introduced this problem, showed that a simple greedy algorithm gives a 3-approximation for the problem. Subsequently, the approximation guarantee was improved by [CGK20, harris2017lottery], who gave a 2-approximation, which is optimal in light of the aforementioned (2 − ε)-hardness result.

The focus of our paper is the Non-Uniform k-Center (NUkC) problem, which was introduced by Chakrabarty et al. [CGK20]. A formal definition follows.

Definition 1 (t-NUkC).

The input is an instance I = ((X, d), (k_1, ..., k_t), (r_1, ..., r_t)), where r_1 ≥ r_2 ≥ ... ≥ r_t ≥ 0, and the k_i are positive integers. The goal is to find sets C_i ⊆ X for 1 ≤ i ≤ t, such that |C_i| ≤ k_i, and the union of balls of radius α·r_i around the centers in C_i, over 1 ≤ i ≤ t, covers the entire set of points X. The objective is to minimize the value of the dilation factor α.

In the Robust t-NUkC problem, we are required to cover at least m points of X using such a solution. We note that the special case of (Robust) t-NUkC with t = 1 corresponds to the (Robust) k-center problem. Chakrabarty et al. [CGK20] gave a bicriteria approximation for t-NUkC for arbitrary t, i.e., they give a solution containing O(k_i) balls of radius O(r_i) for 1 ≤ i ≤ t. They also give a constant-factor approximation for 2-NUkC. Furthermore, they conjectured that there exists a polynomial time O(1)-approximation for t-NUkC for constant t. Subsequently, Chakrabarty and Negahbani [ChakrabartyN21] made some progress by giving a constant-factor approximation for Robust 2-NUkC. Very recently, Jia et al. [jia2021towards] showed an approximate equivalence between (t+1)-NUkC and Robust t-NUkC, thereby observing that the previous result of [ChakrabartyN21] readily implies a constant-factor approximation for 3-NUkC. We note that the techniques from Inamdar and Varadarajan [inamdar2020capacitated] implicitly give an O(1)-approximation for t-NUkC for any t, in time f(k)·n^{O(1)} for some function f, where k = k_1 + ... + k_t. That is, one gets an FPT approximation. Finally, we also note that Bandyapadhyay [Bandyapadhyay20nukc] gave an exact algorithm for perturbation resilient instances of NUkC in polynomial time.

Another related variant of k-center is the Colorful k-center problem. Here, the set of points X is partitioned into color classes. Each color class has a coverage requirement, and the goal is to find a set of k balls of smallest radius that satisfy the coverage requirements of all the color classes. Note that this is a generalization of Robust k-center to multiple types of coverage constraints. Bandyapadhyay et al. [Bandyapadhyay0P19] introduced this problem, and gave a pseudo-approximation, i.e., their algorithm returns an O(1)-approximate solution that may use more than k centers. Furthermore, they managed to improve this to a true O(1)-approximation in the Euclidean plane for a constant number of color classes. Subsequently, Jia et al. [JiaSS20] and Anegg et al. [AneggAKZ20] independently gave (true) 3- and 4-approximations respectively for the Colorful k-center (with a constant number of color classes) in arbitrary metrics.

Our Results and Techniques.

Our main result is an O(1)-approximation for 4-NUkC. We obtain this result via a sequence of reductions; some of these reductions are from prior work while some are developed here and constitute our main contribution. Along the way, we combine various tools and techniques from the aforementioned literature of Robust, Colorful, and Non-Uniform versions of k-center.

First, we reduce the 4-NUkC problem to the Robust 3-NUkC problem, following Jia et al. [jia2021towards]. Next, we reduce the Robust 3-NUkC to well-separated Robust 3-NUkC, by adapting the approach of Chakrabarty and Negahbani [ChakrabartyN21]. (Footnote 1: In this discussion, “reduction” refers to a polynomial time (possibly Turing) reduction from problem A to problem B, such that (i) a feasible instance of A yields (possibly polynomially many) instance(s) of B, and (ii) a constant approximation for B implies a constant approximation for A.) In a well-separated instance, we are given a set of potential centers for the balls of radius r_1, such that the distance between any two of these centers is at least a prescribed multiple of r_1.

Before describing how to solve Well-Separated Robust 3-NUkC, we give a sequence of reductions, which constitute the technical core of our paper. First, we show that any instance of Robust t-NUkC can be transformed to an instance of “Colorful” (t−1)-NUkC, where we want to cover a certain number of red and blue points using the specified number of balls of t−1 distinct radii. Thus, this reduction reduces the number of radii classes from t to t−1 at the expense of increasing the number of coverage constraints from 1 to 2. In our next reduction, we show that Colorful t-NUkC can be reduced to Colorful t-NUkC with an additional “self-coverage” property, i.e., the radius r_t can be assumed to be 0. Just like the aforementioned reduction from [jia2021towards], these two reductions are generic, and hold for any value of t. These reductions crucially appeal to the classical greedy algorithm and its analysis from Charikar et al. [charikar2001algorithms], which is a tool that has not been exploited in the NUkC literature thus far. We believe that these connections between Colorful and Robust versions of NUkC are interesting in their own right, and may be helpful toward obtaining a true O(1)-approximation for t-NUkC for fixed t.

We apply these two new reductions to transform Well-Separated Robust 3-NUkC to Well-Separated Colorful 2-NUkC, with r_2 = 0. The latter problem can be solved in polynomial time using dynamic programming in a straightforward way. Since each of our reductions preserves the approximation factor up to a constant, this implies an O(1)-approximation for 4-NUkC.

Our overall algorithm for 4-NUkC is combinatorial, except for the step where we reduce Robust 3-NUkC to Well-Separated Robust 3-NUkC using the round-or-cut approach of [ChakrabartyN21]. Thus, we avoid an additional “inner loop” of round-or-cut that is employed in recent work [ChakrabartyN21, jia2021towards]. (Footnote 2: A by-product of one of our reductions is a purely combinatorial approximation algorithm for colorful k-center, in contrast with the LP-based approaches in [Bandyapadhyay0P19, AneggAKZ20, JiaSS20].)

2 Definitions, Main Result, and Greedy Clustering

2.1 Problem Definitions

In the following, we set up the basic notation and define the problems we will consider in the paper. We consider a finite metric space (X, d), where X is a finite set of (usually n) points, and d is a distance function satisfying triangle inequality. If Y is a subset of X, then by slightly abusing the notation, we use (Y, d) to denote the metric space where the distance function d is restricted to the points of Y. Let p ∈ X, Y ⊆ X, and r ≥ 0. Then, we use d(p, Y) := min_{y ∈ Y} d(p, y), and denote by B(p, r) the ball of radius r centered at p, i.e., B(p, r) := {q ∈ X : d(p, q) ≤ r}. We say that a ball B(p, r) covers a point q iff q ∈ B(p, r); a set of balls (resp. a tuple of sets of balls) covers q if there exists a ball in the set (resp. in one of the sets of the tuple) that covers q. Analogously, a set of points Y ⊆ X is covered iff every point in Y is covered. For a non-negative real-valued or integer-valued function f defined on a set S, and a subset A ⊆ S, we define f(A) := Σ_{a ∈ A} f(a).

Definition 2 (Decision Version of t-NUkC).

The input is an instance I = ((X, d), (k_1, ..., k_t), (r_1, ..., r_t)), where r_1 ≥ r_2 ≥ ... ≥ r_t ≥ 0, and each k_i is a non-negative integer. The goal is to determine whether there exists a solution (B_1, ..., B_t), where for each 1 ≤ i ≤ t, B_i is a set with at most k_i balls of radius r_i, that covers the entire set of points X. Such a solution is called a feasible solution, and if the instance I has a feasible solution, then I is said to be feasible.
An algorithm is said to be an α-approximation algorithm (with α ≥ 1), if given a feasible instance I, it returns a solution (B_1, ..., B_t), where for each 1 ≤ i ≤ t, B_i is a collection of at most k_i balls of radius α·r_i, such that the solution covers X.

Next, we define the robust version of t-NUkC.

Definition 3 (Decision Version of Robust t-NUkC).

The input is an instance I. The setup is the same as in t-NUkC, except for the following: ω is a weight function on X, and m is a parameter. The goal is to determine whether there exists a feasible solution, i.e., (B_1, ..., B_t) of appropriate sizes and radii (as defined above), such that the total weight of the points covered is at least m. An α-approximate solution covers points of weight at least m while using at most k_i balls of radius α·r_i for each 1 ≤ i ≤ t.

We will frequently consider the unweighted version of Robust t-NUkC, i.e., where the weight of every point in X is one. We refer to this as the unit weight function. Now we define the Colorful t-NUkC problem, which generalizes Robust t-NUkC.

Definition 4 (Decision Version of Colorful t-NUkC).

The input is an instance I. The setup is similar to that of Robust t-NUkC, except that we have two weight functions ω_r and ω_b (corresponding to red and blue weight respectively), with coverage requirements m_r and m_b. A feasible solution covers a set of points with red weight at least m_r, and blue weight at least m_b. The notion of approximation is the same as above.

We note that the preceding definition naturally extends to an arbitrary number of colors (i.e., different weight functions over ). However, we will not need that level of generality in this paper.
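The three decision versions defined above differ only in their coverage constraints, so a single checker can illustrate all of them. The sketch below is an assumption-laden illustration, not part of the paper: the names `covered_points` and `is_feasible`, the representation of a solution as one list of centers per radius class, and the representation of coverage constraints as (weight dictionary, requirement) pairs are all hypothetical choices.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Point = Hashable


def covered_points(points: Sequence[Point],
                   balls: Sequence[Tuple[Point, float]],
                   dist: Callable[[Point, Point], float]) -> Set[Point]:
    """Points lying in at least one ball; each ball is (center, radius)."""
    return {p for p in points if any(dist(c, p) <= rad for c, rad in balls)}


def is_feasible(points: Sequence[Point],
                dist: Callable[[Point, Point], float],
                solution: Sequence[Sequence[Point]],
                radii: Sequence[float],
                budgets: Sequence[int],
                requirements: Sequence[Tuple[Dict[Point, float], float]],
                alpha: float = 1.0) -> bool:
    """solution[i] holds the centers of radius class i (dilated by alpha).
    Each requirement (w, need) asks that covered w-weight be at least need."""
    if any(len(cs) > k for cs, k in zip(solution, budgets)):
        return False
    balls = [(c, alpha * r) for cs, r in zip(solution, radii) for c in cs]
    cov = covered_points(points, balls, dist)
    return all(sum(w[p] for p in cov) >= need for w, need in requirements)


line = [0, 1, 2, 10, 11, 20]
d = lambda a, b: abs(a - b)
unit = {p: 1 for p in line}
red = {p: 1 if p <= 2 else 0 for p in line}
blue = {p: 0 if p <= 2 else 1 for p in line}
sol, radii, budgets = [[1], [20]], [1, 0], [1, 1]

robust_ok = is_feasible(line, d, sol, radii, budgets, [(unit, 4)])
colorful_ok = is_feasible(line, d, sol, radii, budgets, [(red, 3), (blue, 1)])
nukc_ok = is_feasible(line, d, sol, radii, budgets, [(unit, len(line))])
```

Plain t-NUkC corresponds to one constraint with unit weights and requirement |X|, the robust version to one weighted constraint, and the colorful version to two; passing more pairs models more colors.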

2.2 Main Algorithm for 4-NUkC

Let I be the given instance of 4-NUkC, which we assume is feasible. First, using the reduction in Section A, we reduce it to an instance I' of Robust 3-NUkC. Recall that Lemma 8 implies that I' is feasible, and furthermore an O(1)-approximation for I' implies an O(1)-approximation for I.

Next, we use the round-or-cut framework from [ChakrabartyN21] on the Robust 3-NUkC instance obtained above, as described in Section 6. Essentially, this is a Turing reduction from Robust 3-NUkC to (polynomially many instances of) Well-Separated Robust 3-NUkC. In a well-separated instance, we are given a set of potential centers for the balls of radius r_1, such that the distance between any two potential centers is at least a prescribed multiple of r_1. At a high level, this reduction uses the ellipsoid algorithm, and each iteration of the ellipsoid algorithm returns a candidate LP solution such that (1) it can be rounded to obtain an O(1)-approximate solution for the Robust 3-NUkC instance, or (2) one can obtain polynomially many instances of Well-Separated Robust 3-NUkC, at least one of which is feasible, or (3) if none of the obtained instances is feasible, then one can obtain a hyperplane separating the LP solution from the integer hull of coverages.

Solving a Well-Separated Instance.

For the sake of simplicity, consider one of the instances of Well-Separated Robust 3-NUkC, along with a well-separated set that is a candidate set for the centers of balls of radius r_1. Furthermore, let us assume that this instance is feasible. First, the reduction in Section 3, given this instance, produces polynomially many instances of Colorful 2-NUkC, such that at least one of the instances is feasible. Then, we apply the reduction from Section 4 on each of these instances to ensure the self-coverage property, i.e., we obtain an instance of Colorful 2-NUkC in which the smaller radius is 0. Finally, assuming that the resulting instance is feasible, it is possible to find a feasible solution using dynamic programming, using the algorithm from Section 5. This algorithm supposes that the instance is well-separated w.r.t. a smaller separation factor. We argue in the next paragraph that this property holds in each of the resulting instances.

In order to show that the set is well-separated w.r.t. the new top level radius, we need consecutive radii to differ by a sufficiently large constant factor. This assumption is without loss of generality, since, if two consecutive radii classes are within a constant factor, it is possible to combine them into a single radius class, at the expense of a constant factor in the approximation guarantee.

Assuming the well-separated instance is feasible, a feasible solution to one of the resulting instances can be mapped back to an O(1)-approximate solution to the Robust 3-NUkC instance, and then to the original 4-NUkC instance, since each reduction preserves the approximation guarantee up to an O(1) factor.

Theorem 1.

There exists a polynomial time O(1)-approximation algorithm for 4-NUkC.

We have overviewed how the various sections of the paper come together in deriving Theorem 1. Before proceeding to these sections, we describe a greedy clustering procedure that we need.

2.3 Greedy Clustering

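Assume we are given (i) a metric space (X, d), where X is finite, (ii) a radius r ≥ 0, (iii) an expansion parameter γ ≥ 1, and (iv) a subset Y ⊆ X and a weight function ω that assigns a non-negative integer weight to each point of Y. The weight ω(p) can be thought of as the multiplicity of p, or how many points are co-located at p. We describe a greedy clustering procedure, from Charikar et al. [charikar2001algorithms], that is used to partition the point set Y into clusters, each of which is contained in a ball of radius γ·r. This clustering procedure, together with its properties, is a crucial ingredient of our approach.

1: We require that Y ⊆ X, and that ω is a weight function on Y
2: Let U ← Y, M ← ∅
3: while U ≠ ∅ do
4:      p ← a point q ∈ X maximizing ω(U ∩ B(q, r))
5:      C(p) ← U ∩ B(p, γ·r); M ← M ∪ {p}
6:      U ← U \ C(p)
7:      ω(C(p)) ← Σ_{q ∈ C(p)} ω(q)      ▷ We will refer to p as a mega-point with cluster C(p) of weight ω(C(p))
8: end while
9: return M, together with the clusters C(p) and weights ω(C(p)) for p ∈ M
Algorithm 1 GreedyClustering(Y, X, r, γ, ω)

In line 4, we only consider points q ∈ X such that U ∩ B(q, r) is non-empty. Notice that a selected cluster may have weight zero, which happens when every point in it has weight zero. Furthermore, notice that we do not require that a point belongs to Y for it to be an eligible point in line 4.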
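The greedy procedure described in this subsection can be sketched in Python as follows. This is a minimal reading of the prose description, under stated assumptions: the function name `greedy_clustering`, the argument order, the dictionary-based weights, and the tie-breaking rule (first maximizer in the order of `X`) are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Point = Hashable


def greedy_clustering(Y: Sequence[Point], X: Sequence[Point],
                      dist: Callable[[Point, Point], float],
                      r: float, gamma: float,
                      weight: Dict[Point, int]
                      ) -> Tuple[List[Point], Dict[Point, Set[Point]]]:
    """Repeatedly pick the candidate in X whose radius-r ball holds the most
    uncovered weight, then remove everything within gamma * r of it."""
    assert gamma >= 1 and set(Y) <= set(X)
    uncovered = set(Y)
    order: List[Point] = []                 # mega-points, in selection order
    clusters: Dict[Point, Set[Point]] = {}
    while uncovered:
        best, best_w = None, -1
        for q in X:
            ball = [y for y in uncovered if dist(q, y) <= r]
            if ball:                        # only balls that hit an uncovered point
                w = sum(weight[y] for y in ball)
                if w > best_w:
                    best, best_w = q, w
        clusters[best] = {y for y in uncovered if dist(best, y) <= gamma * r}
        uncovered -= clusters[best]
        order.append(best)
    return order, clusters


line = [0, 1, 2, 10, 11, 20]
d = lambda a, b: abs(a - b)
order, clusters = greedy_clustering(line, line, d, r=1, gamma=3,
                                    weight={p: 1 for p in line})
print(order, clusters)
```

Because Y ⊆ X and weights are non-negative, some candidate always has a non-empty ball, so each iteration removes at least one point and the loop terminates. The sketch recomputes every ball in each iteration, which is simple but not tuned for speed.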

We summarize some of the key properties of this algorithm in the following observations.

Observation 1.
  1. For any , ,

  2. Point belongs to the cluster , such that is the first among all satisfying .

  3. The sets partition , which implies that

  4. , where for any .

  5. If and are the points added to in iterations , then .

  6. For any two distinct , .

Proof.

The first five properties are immediate from the description of the algorithm. Now, we prove the sixth property. Suppose for contradiction that there exist with , and without loss of generality, was added to before . Then, note that at the end of this iteration, . Therefore, will subsequently never be a candidate for being added to in line 4. ∎

A key property of this greedy clustering, established by Charikar et al. [charikar2001algorithms], is that for any k balls of radius r, the weight of the points in the first k clusters is at least as large as the weight of the points covered by the k balls.

Lemma 1.

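Suppose that the expansion parameter γ used in Algorithm 1 is at least 3. Let 𝓑 be any collection of k balls of radius r, each centered at a point in X. Let M' consist of the first k' points chosen by the algorithm, where k' = min{k, |M|} and M is the set of all points chosen. Writing C(p) for the cluster of a chosen point p, ω for the weight function, and Y for the point set being clustered, we have

Σ_{p ∈ M'} ω(C(p)) = ω(⋃_{p ∈ M'} C(p)) ≥ ω(Y ∩ ⋃_{B ∈ 𝓑} B).

The equality follows from the definition of the cluster weights and the fact that the clusters partition Y, as stated in Observation 1.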

3 From Robust t-NUkC to Colorful (t−1)-NUkC

Let I be an instance of Robust t-NUkC. The reduction to Colorful (t−1)-NUkC consists of two phases. In the first phase, we use Algorithm 1 to reduce the instance I to an instance I' focused on the cluster centers output by the greedy algorithm. A key property of this reduction is that we may set r_t = 0 in the instance I': each ball at level t is allowed to cover at most one point.

In the second phase, we transform I' to a collection of instances of Colorful (t−1)-NUkC. Assuming there exists a feasible solution for I', at least one of the instances of Colorful (t−1)-NUkC has a feasible solution, and any approximate solution to such an instance can be used to obtain an approximate solution to I' (and thus to I).

Phase 1.

Let be an instance of Robust -NUC. We call the algorithm GreedyClustering, and obtain a set of points with the corresponding clusters for . The greedy algorithm also returns a weight for each . Let us number the points of as , where is the iteration in which was added to the set by algorithm GreedyClustering. This gives an ordering of the points in . Note that for .

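We define a weight function λ on the points. Let λ(p) be the weight of the cluster of p for each point p returned by the greedy algorithm, and let λ(p) = 0 for every other point. Note that for a returned point p, λ(p) is the sum of the original weights of the points in its cluster. Thus, for each returned point p, we are moving the weight from the points in its cluster to the cluster center p. Clearly, the total weight is unchanged.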
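The weight-moving step of Phase 1 is small enough to sketch directly. The helper name `move_weight_to_centers` and the example clusters and weights below are hypothetical; the clusters are assumed to come from a greedy clustering run and to partition the weighted points.

```python
from typing import Dict, Hashable, Iterable, Set

Point = Hashable


def move_weight_to_centers(points: Iterable[Point],
                           clusters: Dict[Point, Set[Point]],
                           weight: Dict[Point, int]) -> Dict[Point, int]:
    """Each cluster center receives the total weight of its cluster;
    every other point gets weight zero."""
    new_weight = {p: 0 for p in points}
    for center, cluster in clusters.items():
        new_weight[center] = sum(weight[q] for q in cluster)
    return new_weight


points = [0, 1, 2, 10, 11, 20]
clusters = {1: {0, 1, 2}, 10: {10, 11}, 20: {20}}
weight = {0: 2, 1: 1, 2: 1, 10: 3, 11: 1, 20: 5}
moved = move_weight_to_centers(points, clusters, weight)
print(moved)
```

Since the clusters partition the weighted points, the total weight before and after the move is the same, which is the identity the paragraph above ends with.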

The output of Phase 1 is an instance I' of Robust t-NUkC that uses the new weight function and modified radii. Note that in the instance I', the smallest radius is 0, whereas the other radii have been increased by an additive term. The following claim relates the original instance I and the instance I'.

Lemma 2.

(a) If instance has a feasible solution, then so does the instance . (b) Given a solution for that uses at most balls of radius for every , we can obtain a solution for that uses at most balls of radius at most for .

Proof.

We begin with part (b). For each ball in that is part of the solution , we replace it with the ball to obtain a solution for . That is, we expand each ball by an additive . If covers , then covers , and . Let denote the points covered by . The weight of the points covered by is at least

We now establish (a). Fix a feasible solution to that covers -weight at least , where is a set of at most balls of radius , for . Let be the set of points such that some point in is covered by a ball in .

Now let be the set of points , such that any point in is either covered by a ball from , or is an outlier. Let for . Note that .

Note that in the sequence , the points of and may appear in an interleaved fashion. Let be the subsequence restricted to the points in . In the following lemma, we argue that the first points in this subsequence are sufficient to replace the balls in . Let .

Lemma 3.

There exists a subset of size at most such that

Proof.

Let . That is, consists of the first points of picked by the greedy algorithm. Recall that , and thus for , it holds that .

Now imagine calling the algorithm GreedyClustering. Observe that in the iteration , this algorithm will select point (as defined above) in Line 3, and the corresponding cluster and its weight will be and – exactly as in the execution of GreedyClustering. That is, the algorithm GreedyClustering will output and the clusters for each .

Now, consists of a set of balls of radius . The lemma now follows from Lemma 1 applied to GreedyClustering. ∎

Using Lemma 3, we now construct a solution to instance . Fix index , and denote the set of balls obtained by expanding each ball in by an additive . Note that each ball in has radius . For every point , we add a ball of radius around it and let be the resulting set of balls. Note that .

By definition, for each point , there is a ball in that intersects cluster , whose points are at distance at most from . It follows that the balls in cover each point in .

Using Lemma 3, the coverage of in instance is at least

The final inequality follows because any point covered by solution for either belongs to or to . Thus, we have shown that has a feasible solution.

Phase 2.

Now we describe the second phase of the algorithm. We have the instance I' of Robust t-NUkC that is output by Phase 1. Phase 2 takes I' as input and generates one instance of the Colorful (t−1)-NUkC problem for each choice of an index, so the number of generated instances is polynomial. If I' is feasible, at least one of these instances will be feasible.

Let be an ordering of the points in by non-increasing . That is, for .

Fix an index . We now describe the instance of colorful -NUC. Let denote the set of red points, and denote the set of blue points. For each , define its blue weight as ; for each , define its blue weight as . Define the blue coverage for instance as . We define the red weight function in a slightly different manner. For each red point , let its red weight ; for each , let red weight . Let denote the red coverage for instance . Note that is supported on and on . Let denote the resulting instance of Colorful -NUC problem. Recall that a solution to this instance is required to cover red weight that adds up to at least , and blue weight that adds up to at least . (In instance , the point sets and , the red and blue weights, and total coverage requirements and all depend on the index . This dependence is not made explicit in the notation, so as to keep it simple.)
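The following sketch shows one reading of the Phase 2 construction that is consistent with the proof of Lemma 4; it is an assumption, since the exact formulas are not legible in this text. Under this reading, for an index j the red points are the j heaviest points, each with red weight 1 and blue weight 0; the remaining points are blue and keep their weight; the red requirement is j minus the number of radius-0 balls; and the blue requirement is the original requirement minus the total weight of the red points. The function name, the dictionary layout, the range of j, and the clamping of requirements at zero are all hypothetical.

```python
from typing import Dict, Hashable, List

Point = Hashable


def colorful_instances(lam: Dict[Point, int], k_last: int, m: int) -> List[dict]:
    """One candidate colorful instance per index j = 1, ..., n (assumed range).

    lam    : weights after Phase 1
    k_last : number of radius-0 balls allowed in the robust instance
    m      : weight that the robust instance must cover
    """
    # Non-increasing weight; Python's sort is stable, so ties keep input order.
    order = sorted(lam, key=lambda p: lam[p], reverse=True)
    instances = []
    for j in range(1, len(order) + 1):
        red = set(order[:j])
        instances.append({
            "red_points": order[:j],
            "red_weight": {p: 1 if p in red else 0 for p in order},
            "blue_weight": {p: 0 if p in red else lam[p] for p in order},
            # Clamping at zero is an assumption made for this sketch.
            "red_requirement": max(0, j - k_last),
            "blue_requirement": max(0, m - sum(lam[p] for p in red)),
        })
    return instances


lam = {"a": 5, "b": 3, "c": 3, "d": 1}
insts = colorful_instances(lam, k_last=1, m=9)
print(insts[1])
```

The intuition behind this reading is that at most `k_last` red points may be left uncovered by the larger balls, because each of them can be picked up afterwards by a radius-0 ball placed on it.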

We now relate the instance to the instances , for .

Lemma 4.

(a) If the instance is feasible, then there exists an such that instance is feasible.
(b) Let be a generated instance of Colorful -NuKc, and suppose is a solution to this instance such that contains at most balls of radius for , and covers red weight at least and blue weight at least . Then, we can efficiently obtain a solution to the instance that uses at most balls of radius for , and at most balls of radius .

Proof.

We first show part (b). In instance , the red weight for each , so the solution covers at least red points. So the number of red points that are not covered is at most . Construct by adding a ball of radius at each uncovered point in . Thus, .

Let for each . Now, we argue that the solution covers weight at least in instance . Note that this solution covers all points in , and a subset such that . Thus the coverage for is at least

We now turn to part (a). Fix a feasible solution to . Let denote the subset consisting of each point covered by a ball in , for . Let . Each point in is either an outlier or is covered by a ball in . Note that in the sequence , the points of and may appear in an interleaved fashion. Let be the subsequence restricted to the points in . Let , and let . A key observation is that is at least as large as the total weight of the points in covered by balls in . This is because each ball in has radius and can cover only one point in ; and the maximum coverage using such balls is obtained by placing them at the points in with the highest weights, i.e, . Without loss of generality, we assume that consists of balls of radius placed at each point in .

Now, let the index . We now argue that the instance of colorful -NUC is feasible. In particular, we argue that is a solution. Consider the set of red points in . Each point in is either in or in , and is therefore covered by . It follows that covers at least points of . In other words, the red weight in covered by is at least .

Now consider the set of blue points in . Let denote the blue points covered by solution . As covers points with weight at least in instance , we have ; thus, . However, the balls in do not cover any point in . We conclude that the balls in cover all points in . For any , we have . It follows that the blue weight in covered by is at least . This concludes the proof of part (a).

Combining Lemmas 2 and 4 from Phases 1 and 2, we obtain the following reduction from Robust t-NUkC to Colorful (t−1)-NUkC.

Theorem 2.

There is a polynomial-time algorithm that, given an instance I of Robust t-NUkC, outputs a collection of instances of Colorful (t−1)-NUkC with the following properties: (a) If I is feasible, then at least one of the instances of Colorful (t−1)-NUkC is feasible; (b) given an α-approximate solution to one of these instances, we can efficiently construct a solution to I that uses at most k_i balls of radius at most O(α)·r_i for each 1 ≤ i ≤ t.

Remark 1.

In part (a), the feasible solution for that is constructed from the feasible solution for has the following useful property: for any of radius in the feasible solution for , the center of is also the center of some ball of radius in the feasible solution for .

4 Ensuring Self-Coverage in Colorful NUkC

We assume that we are given as input a Colorful -NUC instance . Recall that (resp. ) is the red (resp. blue) weight function. The task in Colorful -NUC is to find a solution such that (1) for , and (2) the point set covered by the solution satisfies and , (i.e., the solution covers points with total red weight at least , and blue weight at least .) In this section, we show that can be reduced to an instance of Colorful -NUC with . The fact that each ball of radius can only cover its center in the target instance is what we mean by the term self-coverage. This reduction actually generalizes to Colorful -NUC, but we address the case to keep the notation simpler.

Our reduction proceeds in two phases. In Phase 1, we construct an intermediate instance where we can ensure blue self-coverage. Then in Phase 2, we modify the intermediate instance so as to obtain red self-coverage as well.

Phase 1.

In this step, we call the greedy clustering algorithm using the blue weight function . In particular, we call GreedyClustering() (See Algorithm 1). This algorithm returns a set of points , where every has a cluster and weight such that (1) is a partition of ; (2) for any , , the blue weight of the cluster, and (3) for any . Furthermore, the greedy algorithm naturally defines an ordering of – this is the order in which the points were added to .

We define a new weight function as follows: if and if . Note that for , we have . So the new weight function is obtained from by moving weight from each cluster to the cluster center .

Phase 1 outputs the intermediate instance of Colorful -NUC, where and . A solution for is said to be structured if it has the following properties.

  1. It is a solution to viewed as an instance of Colorful -NUC.

  2. Let