$k$-median is one of the most fundamental and well-studied clustering problems in the computer science community. In the standard $k$-median problem, we are given a set of clients $C$, a set of candidate facility locations $F$, a metric $d$ on $C \cup F$ and a parameter $k \in \mathbb{Z}_{>0}$. The goal is to select a subset $S \subseteq F$ of size $k$ that minimizes the sum of distances from every client to its closest facility in $S$. In other words, denote by $c \in \mathbb{R}_{\geq 0}^{C}$ the cost vector induced by $S$, where $c_j = d(j, S) = \min_{i \in S} d(j, i)$; the objective is to minimize $\sum_{j \in C} c_j$. $k$-median is known to be APX-hard, and admits constant-factor approximations [arya2004local, byrka2015improved, charikar2012dependent, jain1999primal, Li:2013], just to name a few. In the more general ordered $k$-median problem, we are additionally given a non-increasing, non-negative weight vector $w \in \mathbb{R}_{\geq 0}^{|C|}$, and the goal is to minimize $w^\top c^\downarrow$, where $c^\downarrow$ is the sorted version of $c$ in non-increasing order. This problem generalizes $k$-center, $k$-median and $k$-facility $l$-centrum, among many others, and several constant-factor approximations have been developed recently [aouad2019ordered, byrka2018constant, chakrabarty2018interpolating, chakrabarty2019approximation].
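To make the objective concrete, here is a minimal sketch (the names `dists`, `w`, `clients`, `S`, `d` are ours, not the paper's) of the plain and ordered $k$-median costs; with weight vector $(1, 0, \dots, 0)$ the ordered objective recovers $k$-center, and with the all-ones vector it recovers $k$-median.

```python
def ordered_cost(dists, w):
    # Sort client costs in non-increasing order and take the weighted sum
    # with the non-increasing, non-negative weight vector w.
    c_sorted = sorted(dists, reverse=True)
    return sum(wi * ci for wi, ci in zip(w, c_sorted))

def kmedian_cost(clients, S, d):
    # Plain k-median: every client pays the distance to its closest open facility.
    return sum(min(d(j, i) for i in S) for j in clients)
```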
In this paper, we study two natural generalizations of ordered $k$-median. The first is Fault-Tolerant Ordered $k$-Median, which, in addition to the standard ordered $k$-median input, associates each client $j$ with a requirement $r_j \in \mathbb{Z}_{>0}$, and we need to connect $j$ to $r_j$ distinct open facilities in $S$. The cost vector is redefined as $c$, where $c_j$ is the sum of distances from $j$ to its $r_j$ closest open facilities. With the study of 5G networks, software-defined networks and sensor networks, the motivation for multi-connection/coverage problems is becoming stronger; in light of the corresponding fault-tolerant concept, we formally define the problem as follows.
Given a metric space $(X, d)$, an instance of Fault-Tolerant Ordered $k$-Median is specified by clients $C \subseteq X$, facility locations $F \subseteq X$, corresponding requirements $\{r_j\}_{j \in C}$, and a non-increasing, non-negative vector $w$. We are required to open $k$ facility locations $S \subseteq F$. Let $c_j$ be the service cost of $j$, i.e., the sum of distances from $j$ to its $r_j$ closest facilities in $S$. The goal is to minimize the ordered median-type objective $w^\top c^\downarrow$.
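The fault-tolerant service cost can be sketched as follows (our illustrative names: `r` maps each client to its requirement, `d` is the metric):

```python
def ft_service_cost(j, S, r, d):
    # Sum of distances from client j to its r[j] closest open facilities in S.
    return sum(sorted(d(j, i) for i in S)[:r[j]])

def ft_ordered_cost(clients, S, r, w, d):
    # Ordered objective applied to the fault-tolerant cost vector.
    c = sorted((ft_service_cost(j, S, r, d) for j in clients), reverse=True)
    return sum(wi * ci for wi, ci in zip(w, c))
```

For example, on the line with open facilities at 1, 2 and 12, a client at 0 with requirement 2 pays $1 + 2 = 3$.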
For the second result, we consider Robust Ordered $k$-Median. As a generalization of ordered $k$-median, the input contains an additional parameter $m \leq |C|$, and the weight vector $w$ now has dimension $m$. The goal is to choose a subset $S \subseteq F$ of at most $k$ facilities and $m$ clients $C' \subseteq C$, so that if $c$ is the induced cost vector of assigning one facility to each client in $C'$, the ordered cost $w^\top c^\downarrow$ is minimized. We give the formal definition below.
Given a metric space $(X, d)$, an instance of Robust Ordered $k$-Median is specified by $C, F \subseteq X$, parameters $k$ and $m$, and a non-increasing, non-negative vector $w \in \mathbb{R}_{\geq 0}^{m}$. We are required to open at most $k$ facility locations $S \subseteq F$ and serve $m$ clients $C' \subseteq C$, such that the ordered objective $w^\top c^\downarrow$ is minimized, where $c$ is the vector of connection costs of the clients in $C'$.
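For a fixed facility set $S$, serving the $m$ clients with the smallest connection costs is optimal, since that choice minimizes every order statistic of the served cost vector simultaneously and $w$ is non-negative. A minimal sketch (names are ours):

```python
def robust_ordered_cost(clients, S, m, w, d):
    # For fixed S, serve the m clients with the smallest connection costs;
    # this minimizes every order statistic simultaneously, which is optimal
    # because the weights are non-negative.
    all_costs = sorted(min(d(j, i) for i in S) for j in clients)
    served = sorted(all_costs[:m], reverse=True)  # m cheapest, re-sorted descending
    return sum(wi * ci for wi, ci in zip(w, served))
```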
1.1 Our Results
Our first result is a constant-factor approximation algorithm for Fault-Tolerant Ordered $k$-Median. At the core of our algorithm, we use an explicitly sparsified LP objective to model the highly non-linear ordered objective, solve it using the ellipsoid algorithm, and use an auxiliary LP [hajiaghayi2016constant] for the stochastic rounding. Compared with previous attempts at LP sparsification [byrka2018constant, chakrabarty2018interpolating], our LP relaxation for ordered costs contains an additional set of constraints that regularizes the solution's “distribution” over different lengths of possible connections (see the first set of constraints in our LP relaxation), and forces the solution to favour shorter connections. The additional constraints are inspired by [chakrabarty2019approximation]. Our analysis of the stochastic cost of each client is quite involved, and reveals a strengthened version of Lemma 4.5 in [byrka2018constant]. The method to sparsify the instance is from [chakrabarty2019approximation] and is based on the indices of entries in the weight vector $w$.
In the second part, we present a constant-factor approximation algorithm for Robust Ordered $k$-Median. Our algorithm follows the sparsification and iterative rounding procedures of [krishnaswamy2018constant], but exploits a unique observation about the LP relaxation (see the proof of Theorem 2), where we use a pseudo cost function and a parameter to scale the objective value. Using this observation, although the objective is non-linearly scaled, we are still able to obtain an affine relation between its solutions and the optimum. Our extension is also used to give constant approximations for ordered matroid median and ordered knapsack median, for which the proofs can be found in the appendix. We note that the sparsification technique in this part [aouad2019ordered, byrka2018constant] is different from that of the first part, and is used to construct the pseudo cost function based on the input weight vector.
1.2 Other Related Work
The notion of fault-tolerance has been studied in many problems, including variants of facility location [byrka2010fault, guha2003constant, swamy2008fault], $k$-center [chaudhuri1998p, khuller2000fault] and $k$-median [hajiaghayi2016constant, inamdar2019fault]. Clustering problems with outliers, a.k.a. robust clustering problems, have also been studied extensively for many facility location and clustering problems; see, e.g., [chakrabarty2016non, charikar2001algorithms, chen2016matroid, harris2017lottery, krishnaswamy2018constant]. As a result closely relevant to ours, Hajiaghayi et al. [hajiaghayi2016constant] present the first constant-factor approximation for fault-tolerant $k$-median with non-uniform requirements $r_j$. Another closely related result is the iterative rounding framework by Krishnaswamy et al. [krishnaswamy2018constant] for solving robust $k$-median.
2 Fault-Tolerant Ordered $k$-Median
2.1 Instance Sparsification and Surrogate LP
In the pre-processing step, we use the sparsification methods in [chakrabarty2019approximation]. Let $n = |C|$ and define $\mathrm{POS}_\delta = \{\lceil (1+\delta)^s \rceil : s \in \mathbb{Z}_{\geq 0}\} \cap [n]$. Clearly, $1 \in \mathrm{POS}_\delta$ and $|\mathrm{POS}_\delta| = O(\log n / \delta)$. For $l \in \mathrm{POS}_\delta$, denote by $\mathrm{next}(l)$ the smallest element in $\mathrm{POS}_\delta \cup \{n+1\}$ that is larger than $l$. We abbreviate $\mathrm{POS}_\delta$ to $\mathrm{POS}$ when $\delta$ is clear from the context.
Given $w$ and $\delta$, define $w^{(\delta)}_l = w_l$ for $l \in \mathrm{POS}_\delta$. For $l \notin \mathrm{POS}_\delta$, if $l' \in \mathrm{POS}_\delta$ is such that $l' < l < \mathrm{next}(l')$, define $w^{(\delta)}_l = w_{l'}$, so that $w^{(\delta)}$ is constant on each interval $[l', \mathrm{next}(l'))$. To use $w^{(\delta)}$ for later approximations, we have the following simple lemma.
([chakrabarty2019approximation]) For non-negative non-increasing $w$ and any non-negative vector $c$, $w^\top c^\downarrow \leq (w^{(\delta)})^\top c^\downarrow \leq (1+\delta) \cdot w^\top c^\downarrow$.
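One natural reading of this bucketing construction, and a numerical check of the sandwich property in the lemma, can be sketched as follows (the helper names `bucket_starts` and `sparsify_weights` are ours, and this is a sketch under our reading of the construction, not a verbatim implementation):

```python
import math

def bucket_starts(n, delta):
    # Indices of the form ceil((1 + delta)^s), 1-indexed and capped at n.
    starts, s = set(), 0
    while True:
        a = math.ceil((1 + delta) ** s)
        if a > n:
            break
        starts.add(a)
        s += 1
    return sorted(starts)

def sparsify_weights(w, delta):
    # w_delta is constant on each bucket: position l inherits the weight
    # of the largest bucket start that is <= l (positions are 1-indexed).
    n = len(w)
    starts = bucket_starts(n, delta)
    return [w[max(s for s in starts if s <= l) - 1] for l in range(1, n + 1)]
```

Since $w$ is non-increasing, the sparsified vector dominates $w$ pointwise, giving the lower bound; the upper bound follows from concavity of the top-$l$ cost.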
Next, we consider the optimal solution and its individual connection (service) distances in non-increasing order. For every $l \in \mathrm{POS}$, define $T_l$ as the $l$-th largest connection distance between clients and facilities. Since each $T_l$ takes a polynomial number of possible values, we assume that we guess it correctly. That is, we want to guess a non-increasing sequence $(T_l)_{l \in \mathrm{POS}}$ with entries from the set of pairwise distances. Since each entry takes at most a polynomial number of values and the sequence is non-increasing with length $|\mathrm{POS}| = O(\log n / \delta)$, a basic counting argument shows that the number of all possible non-increasing sequences is at most $n^{O(\log n / \delta)}$.
Assuming further that we have guessed the correct sequence with the desired properties, we then define the corresponding thresholds for every $l \in \mathrm{POS}$, in a way that guarantees the desired bound regardless.
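The counting above can be checked on small cases: non-increasing sequences of a given length over a finite value set are in bijection with multisets of that size (stars and bars). A small sketch, with an assumed helper name:

```python
from itertools import combinations_with_replacement

def nonincreasing_sequences(values, length):
    # Enumerate all non-increasing sequences of the given length with entries
    # from `values`; there are C(|values| + length - 1, length) of them.
    # Feeding the values in descending order makes each emitted tuple
    # non-increasing position by position.
    for combo in combinations_with_replacement(sorted(values, reverse=True), length):
        yield combo
```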
We are ready to formulate our LP relaxation. Let $y_i$ be the extent to which we open facility location $i$, and $x_{ij}$ the extent of connecting client $j$ to facility $i$. Given a threshold $T$, define $d_T(i,j) = d(i,j)$ if $d(i,j) > T$ and 0 otherwise. With the guessed values and sparsified weights $w^{(\delta)}$, our LP relaxation is given as follows,
Although there are exponentially many constraints, it is easy to construct a polynomial-time separation oracle. Indeed, for any fractional solution, to verify the second set of constraints, we only need to compute each client's fractional cost and check whether the sum of the largest values exceeds the corresponding bound. Therefore, the LP relaxation can be efficiently solved using the ellipsoid method. We prove the following lemma in the appendix.
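The separation step can be sketched as follows; `most_violated_top_ell` is our illustrative name, and `values` stands for the per-client fractional costs. Sorting once identifies the most violated constraint among all subsets of a given size, so exponentially many constraints are checked in $O(n \log n)$ time:

```python
def most_violated_top_ell(values, ell, bound):
    # Check the constraint family "sum of any ell entries <= bound":
    # the worst case is the ell largest entries. Return the violating
    # index set, or None if all such constraints are satisfied.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    top = order[:ell]
    if sum(values[i] for i in top) > bound:
        return top
    return None
```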
2.2 Constructing Bundles and the Laminar Family
We use the rounding procedure in [hajiaghayi2016constant]. Fix an optimal fractional solution $(x, y)$ to the LP; we create a family of non-intersecting sets of facility locations called bundles via the splitting of facilities (see, e.g., [guha2003constant, hajiaghayi2016constant]).
For a set $B$ of facility locations, let $\mathrm{vol}(B) = \sum_{i \in B} y_i$ denote the volume of $B$. There exists a partition of the facilities such that the $t$-th part contains the $t$-th closest unit of volume to the client. We also define similar measures for the truncated distance function. For the sake of being self-contained, we provide the algorithm [yan2015lp] as follows.
We call a client dangerous if its fractional connection cost exceeds a threshold, and denote the set of dangerous clients by $D$. We call two dangerous clients in conflict if they are close to each other. This definition of conflict is somewhat different from that in [hajiaghayi2016constant], since we need to find a subset of dangerous clients that are far enough apart from each other. Using the filtering process in [hajiaghayi2016constant], we create a subset $D' \subseteq D$ such that no two clients in $D'$ are in conflict.
We also define a closed ball $B_j$ for each dangerous client $j \in D'$, centered at $j$. It is easy to see that the corresponding bundle is fully contained in $B_j$. Now we proceed to present a simple algorithm [hajiaghayi2016constant] that constructs the laminar family.
([hajiaghayi2016constant]) The constructed family is laminar.
2.3 Rounding the LP Solution
Recall that in the last section, we split some facilities into co-located copies. We abuse notation and write $F$ for the facility set after the split, and $F_0$ for the facility location set before the split. We also define the map $\pi: F \to F_0$ taking each copy to its original location in $F_0$. The following auxiliary LP [hajiaghayi2016constant] is defined by two laminar families, so its constraint matrix is totally unimodular and the LP is integral.
Since the fractional solution is feasible for the auxiliary LP, it is a convex combination of a polynomial number of integral solutions by Carathéodory's theorem. We can efficiently sample a random integral solution whose expectation is exactly the fractional solution. This means that
every constraint of the auxiliary LP is satisfied by the sampled integral solution with probability 1; and for any facility, the probability of its opening equals its fractional opening value.
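The sampling step above can be sketched as follows; `sample_integral` is a hypothetical helper, and the decomposition itself would come from Carathéodory's theorem:

```python
import random

def sample_integral(solutions, lambdas, rng=random.Random(0)):
    # Given x* = sum_i lambdas[i] * solutions[i] (a convex combination of
    # integral solutions), sample solutions[i] with probability lambdas[i];
    # the expectation of the sampled solution is then exactly x*, so every
    # coordinate's marginal matches its fractional value.
    r, acc = rng.random(), 0.0
    for sol, lam in zip(solutions, lambdas):
        acc += lam
        if r <= acc:
            return sol
    return solutions[-1]  # guard against floating-point round-off
```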
From the rounded integral solution, define $S$ as the set of open facility locations. Notice that $S$ is stochastic, so the service cost is also stochastic. We have the following core lemma on the nice properties of $S$ and, consequently, the expected cost of $k$-facility $l$-centrum for every $l \in \mathrm{POS}$. See the appendix for the proofs.
Fix any $l \in \mathrm{POS}$. There exists a random variable $D_l$ such that the top-$l$ service cost of the rounded solution is at most $D_l$ with probability 1, and the expectation of $D_l$ is bounded by a constant times the corresponding optimal $l$-centrum cost.
For any $l \in \mathrm{POS}$, denote by $\mathrm{cost}_l$ the sum of the $l$ largest service costs; then
Consider the weight vector $w^{(\delta)}$. For any $l \notin \mathrm{POS}$, the weight values of $w^{(\delta)}$ are identical between consecutive indices in $\mathrm{POS}$, so the ordered cost with respect to $w^{(\delta)}$ can be rewritten as a weighted sum of $k$-facility $l$-centrum costs (see, e.g., [byrka2018constant]). Then using Lemma 1, the cost of $S$ with respect to $w$ is also bounded, and we obtain the following main theorem.
(Main Theorem) Our algorithm is a constant-factor approximation for Fault-Tolerant Ordered $k$-Median, with running time dominated by the guessing procedures.
Using Lemma 4, with the output solution $S$, the expected ordered cost with respect to $w^{(\delta)}$ is
Since $T_l$ is the $l$-th longest connection in the optimal solution, we further obtain
Finally, the service cost vector satisfies
and the overall approximation factor is constant. The running time is easily obtained from the guessing procedures. ∎
3 Robust Ordered $k$-Median
3.1 Instance Sparsification and Surrogate LP
We first use the sparsification techniques in [aouad2019ordered, byrka2018constant, chakrabarty2018interpolating] to simplify the instance. We also assume $d(i, j) > 0$ for any $i, j$ that are not co-located.
Let $c^\star$ be the cost vector of the optimal solution. We first guess the exact value of $M$, the largest cost value in $c^\star$, since it has only a polynomial number of possibilities. Next, we discretize the costs by integer powers of $(1+\epsilon)$ for some fixed $\epsilon > 0$. Let $T$ be the smallest integer such that $M/(1+\epsilon)^T$ falls below the smallest positive distance. We define intervals $I_t = \left( M/(1+\epsilon)^{t+1}, \, M/(1+\epsilon)^t \right]$ for $t = 0, \dots, T-1$, together with a last interval covering the remaining small values. Since the intervals are disjoint, every positive cost falls into exactly one interval. In order to avoid complications caused by weights that are zero or too small, we define a replacement weight vector whose entries are bounded away from zero. We have the following simple lemma, by observing that the two weight vectors differ only slightly entry-wise.
For any non-negative cost vector, the ordered costs under the original and replacement weight vectors are within a $(1+\epsilon)$ factor of each other.
We sort the optimal cost vector into non-increasing order, align it with the weight vector, and consider the entries that fall into each interval $I_t$. Define the average weight for optimal costs in the $t$-th interval as
and we guess a non-increasing sequence of estimated weights such that, for every $t$, the estimate is within a $(1+\epsilon)$ factor of the corresponding average weight and is an integer power of $(1+\epsilon)$. Since the entries of the replacement weight vector are bounded from below and above, each estimate takes a bounded number of possible values. Because the averages are non-increasing, routine calculations show that the number of possible non-increasing sequences is polynomially bounded. Now we define the piece-wise linear pseudo cost function $h$,
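One natural reading of this construction can be sketched as follows. The names `M`, `eps`, `eta` (the guessed per-interval weights) and `interval_index` are ours, and the handling of the final interval is an assumption of the sketch:

```python
def interval_index(x, M, eps, T):
    # Index t of the discretized interval containing cost x, where interval t
    # is (M/(1+eps)^(t+1), M/(1+eps)^t] for t = 0..T-1; costs below the last
    # threshold all map to the final interval T-1.
    if x > M:
        raise ValueError("costs are assumed to be at most M")
    for t in range(T):
        if x > M / (1 + eps) ** (t + 1):
            return t
    return T - 1

def pseudo_cost(x, M, eps, eta):
    # Piecewise-linear pseudo cost: scale x by the guessed average weight
    # eta[t] of its interval (eta is non-increasing in t, since larger costs
    # occupy earlier positions of the ordered cost vector).
    return eta[interval_index(x, M, eps, len(eta))] * x
```

For instance, with $M = 100$, $\epsilon = 1$ and guessed weights $(3, 2, 1)$, a cost of 80 lands in the first interval and is scaled to 240.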
We write the LP relaxation for Robust Ordered $k$-Median with some scaling parameter, which will be determined later.
Although the objective of the LP is non-linearly scaled down because of the discontinuity of $h$, we show that there is a delicate connection between its objective value and the optimum of the original problem. To this end, we have the following crucial lemma, and a theorem on their relation. The proofs can be found in the appendix.
For any , a set of size and , one has .
3.2 Pre-processing and Stronger LP
In this section, we proceed with the reduced instance with no weight vectors, a scaling parameter and a pseudo cost function $h$. The weights are implicit in the definition of $h$, and we try to minimize the total pseudo cost instead.
We adopt techniques used in [krishnaswamy2018constant] and add constraints with respect to the so-called star costs and backup costs. We first guess an upper bound on the minimum objective value of integral solutions to the LP; it is not hard to see that we can guess this bound as an integer power of $(1+\epsilon)$, and the number of possible values is logarithmic. Fixing the optimal solution, we define the corresponding connection cost of every client. The following theorems are direct applications of [krishnaswamy2018constant]. We present a proof sketch of Theorem 4 in the appendix. We write closed balls around a subset of points in the usual way.
([krishnaswamy2018constant]) Given the instance and the guessed upper bound, there exists an efficient algorithm that finds an extended instance satisfying the following.
For every , denote , we have ,
For every , we have ,
Given the instance found in Theorem 3, we can efficiently compute connection bounds such that
(Validity of ) Let be the cost of for . There exists a solution for , such that if is connected to , then for every and
(Small star cost) Moreover, let $\sigma(j)$ be the facility that $j$ connects to in this solution; then for every $j$, we have
(Small backup cost) For every , we have
If we define , then Property 2 is equivalent to
For , we define the following extended LP and provide two lemmas on its objective value and rounding results. The proofs are in the appendix.
3.3 Iterative Rounding
In this section, we proceed to find a solution for the extended LP that is almost integral, starting from a solution found using Lemma 8. We use the discretization methods in [krishnaswamy2018constant]: we define a sequence of geometrically increasing radius levels and round each client's fractional connection radius up to the nearest level.
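The rounding to levels can be sketched generically as follows, assuming geometric radius levels $R_0 \sigma^t$; the names `R0`, `sigma`, `L` and the convention of level $-1$ for zero radius are illustrative, not the paper's:

```python
def radius_level(dist, R0, sigma, L):
    # Smallest level t with dist <= R0 * sigma**t, over the geometric
    # levels R0, R0*sigma, ..., R0*sigma**L; clients with zero radius
    # get the sentinel level -1.
    if dist == 0:
        return -1
    t = 0
    while R0 * sigma ** t < dist and t < L:
        t += 1
    return t
```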
From the solution , for every we define outer ball and radius level such that . We further define inner ball .
We maintain three sets of clients, , and , such that we always have . Every client in is to be assigned an open facility that is relatively close to it, and is ultimately used to place these facilities around. Initially, we set , and (here we view every as a virtual client). For , define the auxiliary LP as