1 Introduction
$k$-median is one of the most fundamental and well-studied clustering problems in the computer science community. In the standard $k$-median problem, we are given a set of clients $C$, a set of candidate facility locations $F$, a metric $d$ on $C \cup F$ and a parameter $k$. The goal is to select a subset $S \subseteq F$ with size $k$, and minimize the sum of distances from every client to its closest facility in $S$. In other words, denote by $c(S)$ the cost vector induced by $S$, where $c_j = d(j, S)$ for each client $j$, and the objective is to minimize $\|c(S)\|_1$. $k$-median is famously known to be APX-hard, and admits constant-factor approximations [arya2004local, byrka2015improved, charikar2012dependent, jain1999primal, Li:2013], just to name a few. In the more general ordered $k$-median problem, we are additionally given a non-increasing non-negative vector $w$, and the goal is to minimize $w^\top c^\downarrow(S)$, where $c^\downarrow(S)$ is the sorted version of $c(S)$ in non-increasing order. This problem generalizes $k$-center, $k$-median and facility $\ell$-centrum, among many others, and several constant-factor approximations have been developed recently [aouad2019ordered, byrka2018constant, chakrabarty2018interpolating, chakrabarty2019approximation].

In this paper, we study two natural generalizations of ordered $k$-median. The first one is Fault-Tolerant Ordered $k$-Median, which, in addition to standard ordered $k$-median, associates each client $j$ with a value $f_j$, and requires $j$ to be connected to $f_j$ distinct open facilities in $S$. The cost vector is redefined so that $c_j$ is the sum of distances from $j$ to its $f_j$ closest open facilities. With the study of 5G networks, software-defined networks and sensor networks, the motivation for multi-connection/coverage problems is becoming stronger; therefore, in light of the corresponding fault-tolerant concept, we formally define the problem as follows.
Definition 1.
Given a metric space $(C \cup F, d)$, an instance of Fault-Tolerant Ordered $k$-Median is specified by $C$, $F$, $k$, corresponding requirements $\{f_j\}_{j \in C}$, and a non-increasing non-negative vector $w$. We are required to open $k$ facility locations $S \subseteq F$. Let $c_j(S)$ be the service cost of $j$, defined as the sum of distances from $j$ to its $f_j$ closest facilities in $S$. The goal is to minimize the ordered-median-type objective
(1) $w^\top c^\downarrow(S)$
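To make the fault-tolerant service cost concrete, here is a minimal sketch (the function name and plain-list inputs are ours, not from the paper): a client's cost is the sum of its distances to its $f_j$ closest open facilities.

```python
def ft_service_cost(dists_to_open, f_j):
    # dists_to_open: distances from client j to every open facility in S.
    # The fault-tolerant service cost sums the f_j smallest of them,
    # i.e., the distances to the f_j closest open facilities.
    assert 1 <= f_j <= len(dists_to_open)
    return sum(sorted(dists_to_open)[:f_j])
```

With $f_j = 1$ this reduces to the ordinary $k$-median service cost.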
For the second result, we consider Robust Ordered $k$-Median. As a generalization of ordered $k$-median, the input contains a parameter $m$, and the weight vector now has dimension $m$. The goal is to choose a subset of facilities $S \subseteq F$ with size at most $k$ and $m$ clients $T \subseteq C$, so that if $c(S, T)$ is the induced cost vector of assigning one facility to each client in $T$, the ordered cost $w^\top c^\downarrow(S, T)$ is minimized. We give the formal definition below.
Definition 2.
Given a metric space $(C \cup F, d)$, an instance of Robust Ordered $k$-Median is specified by $C$, $F$, $k$, $m$, and a non-increasing non-negative vector $w \in \mathbb{R}_{\geq 0}^m$. We are required to open $k$ facility locations $S \subseteq F$ and serve $m$ clients $T \subseteq C$, such that the following ordered objective is minimized,
(2) $w^\top c^\downarrow(S, T)$
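As an illustrative sketch of this objective (names and list inputs are ours): for a fixed facility set, serving the $m$ clients closest to it is never worse, since their sorted cost vector is dominated coordinate-wise by that of any other choice of $m$ clients.

```python
def robust_ordered_cost(dist_to_S, w, m):
    # dist_to_S: each client's distance to its nearest open facility.
    # Serve the m clients with smallest service cost, then pay the
    # ordered cost: sort served costs non-increasingly, pair them with
    # the non-increasing weight vector w (length m), and sum products.
    served = sorted(dist_to_S)[:m]
    served.sort(reverse=True)
    return sum(wj * cj for wj, cj in zip(w, served))
```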
1.1 Our Results
Our first result is a constant-factor approximation algorithm for Fault-Tolerant Ordered $k$-Median. At the core of our algorithm, we use an explicitly sparsified LP objective to model the highly nonlinear ordered objective, solve it using the ellipsoid algorithm, and use an auxiliary LP [hajiaghayi2016constant] for the stochastic rounding. Compared to previous attempts at LP sparsification [byrka2018constant, chakrabarty2018interpolating], our LP relaxation for ordered costs contains an additional set of constraints that regularizes the solution’s “distribution” over different lengths of possible connections (see the first line of constraints in our LP relaxation), and forces the solution to favour shorter connections. The additional constraints are inspired by [chakrabarty2019approximation]. Our analysis of the stochastic cost of each client is quite involved, and reveals a strengthened version of Lemma 4.5 in [byrka2018constant]. The method used to sparsify the instance is from [chakrabarty2019approximation] and is based on the indices of entries in the weight vector $w$.
In the second part, we present a constant-factor approximation algorithm for Robust Ordered $k$-Median. Our algorithm follows the sparsification and iterative rounding procedures of [krishnaswamy2018constant], but exploits a unique observation on the LP relaxation (see the proof of Theorem 2), where we use a pseudo cost function and a parameter to scale the objective value. Using this observation, although the objective is nonlinearly scaled, we are still able to obtain an affine relation between its solutions and the optimum. Our extension also yields constant approximations for ordered matroid median and ordered knapsack median, for which the proofs can be found in the appendix. We note that the sparsification technique in this part [aouad2019ordered, byrka2018constant] is different from the one in the first part, and is used to construct the pseudo cost function based on the input weight values.
1.2 Other Related Work
The notion of fault-tolerance has been studied for many problems, including variants of facility location [byrka2010fault, guha2003constant, swamy2008fault], $k$-center [chaudhuri1998p, khuller2000fault] and $k$-median [hajiaghayi2016constant, inamdar2019fault]. Clustering problems with outliers, a.k.a. robust clustering problems, have also been studied extensively in many facility location and clustering settings, see, e.g., [chakrabarty2016non, charikar2001algorithms, chen2016matroid, harris2017lottery, krishnaswamy2018constant]. As a result relevant to ours, Hajiaghayi et al. [hajiaghayi2016constant] present the first constant-factor approximation for fault-tolerant $k$-median with non-uniform requirements. Another closely related result is the iterative rounding framework by Krishnaswamy et al. [krishnaswamy2018constant] for solving robust $k$-median.

2 Fault-Tolerant Ordered $k$-Median
2.1 Instance Sparsification and Surrogate LP
In the preprocessing step, we use the sparsification methods in [chakrabarty2019approximation]. Let and define . Clearly, and . For , denote the smallest element in that is larger than . Let and . We abbreviate to when and are clear from the context.
Given and , define for . For , if such that , define , and . To use for later approximations, we have the following simple lemma.
Lemma 1.
([chakrabarty2019approximation]) For non-negative , .
Next, we consider the optimal solution and its individual connection (service) distances in non-increasing order. For every , define as the th largest connection distance between clients and facilities. Since takes a polynomial number of possible values, we assume that we guess it correctly. For , let . We want to guess a non-increasing sequence with entries from , so that if and 0 otherwise. Notice that and is non-increasing with length ; using a basic counting argument, the number of all possible non-increasing sequences is at most .
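The counting step above can be made concrete with a small sketch (function name is ours): non-increasing sequences with entries from a set of $s$ distinct values correspond to multisets, so there are only $\binom{\text{length} + s - 1}{s - 1}$ of them, which is polynomial when $s$ is small.

```python
from itertools import combinations_with_replacement

def nonincreasing_sequences(values, length):
    # Listing the s distinct candidate values in decreasing order makes
    # every combination-with-replacement a non-increasing sequence; the
    # count C(length + s - 1, s - 1) is polynomial for constant s.
    vals = sorted(set(values), reverse=True)
    return list(combinations_with_replacement(vals, length))
```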
Assuming further that we have guessed the correct sequence with the desired properties, we then define for any , so that we guarantee regardless.
We are ready to formulate our LP relaxation. Let be the extent we open facility location and the extent of connecting client to facility . Given , define if and 0 otherwise. With the guessed values and sparsified weights , our LP relaxation is given as follows,
min  ()  
s.t.  
Although there are exponentially many constraints, it is easy to construct a polynomial-time separation oracle. Indeed, for any solution , to verify the second set of constraints, we only need to define and check whether the top values sum to a value larger than . Therefore, the LP relaxation can be efficiently solved using the ellipsoid method. We prove the following lemma in the appendix.
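The top-$\ell$ check described above can be sketched as follows (a simplified stand-in: the real oracle operates on the guessed thresholds and the fractional assignment, which the extraction elides):

```python
def violates_top_ell(values, ell, bound):
    # Separation check for a top-ell constraint: the constraint
    # "sum of the ell largest entries <= bound" is violated iff
    # the ell largest entries of `values` sum to more than `bound`.
    return sum(sorted(values, reverse=True)[:ell]) > bound
```

Running such a check for every relevant $\ell$ takes polynomial time, which is all the ellipsoid method needs.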
2.2 Constructing Bundles and the Laminar Family
We use the rounding procedure in [hajiaghayi2016constant]. Fix an optimal fractional solution to , we create a family of nonintersecting sets of facility locations called bundles via the split of facilities (see, e.g., [guha2003constant, hajiaghayi2016constant]).
Let and denote the volume of , so . There exists a partition of such that contains the th closest unit volume to in . Define and . Also define similar measures for the truncated distance function . For the sake of being self-contained, we provide the algorithm [yan2015lp] as follows.
We call a client dangerous if , and denote the set of dangerous clients . We call two dangerous clients in conflict if and . This definition of conflict is somewhat different from [hajiaghayi2016constant], since we need to find a subset of dangerous clients that are far enough apart from each other. Using the filtering process in [hajiaghayi2016constant], we create such that no two clients in are in conflict.
We also define a closed ball for each dangerous client , , where we abbreviate to . It is easy to see that and the set is fully contained in . Now we proceed to present a simple algorithm [hajiaghayi2016constant] that constructs the laminar family .
Lemma 3.
([hajiaghayi2016constant]) is laminar.
2.3 Rounding the LP Solution
Recall that in the last section, we split some facilities into co-located copies. We abuse notation and denote by the set after the split, and the facility location set before the split. We also define by taking a copy to its original location in . The following auxiliary LP [hajiaghayi2016constant] is defined by two laminar families, and , and is therefore integral.
Since is a feasible solution, it is a convex combination of a polynomial number of integral solutions by Carathéodory’s Theorem. We can efficiently sample a random integral solution such that . This means that
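The sampling step can be sketched as follows (a minimal stand-in; the Carathéodory decomposition itself is assumed given): picking integral solution $i$ with probability equal to its convex coefficient preserves the fractional solution in expectation, coordinate by coordinate.

```python
import random

def sample_integral(solutions, coeffs, rng=None):
    # coeffs: convex-combination weights (non-negative, summing to 1).
    # Sampling solutions[i] with probability coeffs[i] makes every
    # coordinate's expectation equal the fractional solution's value.
    rng = rng or random.Random()
    r, acc = rng.random(), 0.0
    for sol, lam in zip(solutions, coeffs):
        acc += lam
        if r < acc:
            return sol
    return solutions[-1]  # guard against floating-point rounding
```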


For any , ;
2.4 Analysis
From the rounded integral solution , define as the open facility locations. Notice that is stochastic, so the service cost is also stochastic. We have the following core lemma on the nice property of , and consequently, the expected cost of facility centrum for every . See the appendix for their proofs.
Lemma 4.
Corollary 1.
For any , denote by the sum of the largest service costs; then
Consider the weight vector . For any , the weight values are identical between and , so the ordered cost with respect to can be rewritten as another weighted sum of facility centrum costs (see, e.g., [byrka2018constant]). Then using Lemma 1, the cost of w.r.t. is also bounded, and we obtain the following main theorem.
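The rewriting above is a standard summation-by-parts identity, which a small sketch (our own notation) checks numerically: with $w_{n+1} := 0$, $w^\top c^\downarrow = \sum_{\ell} (w_\ell - w_{\ell+1}) \cdot \mathrm{top}_\ell(c)$, where $\mathrm{top}_\ell(c)$ is the sum of the $\ell$ largest entries of $c$.

```python
def ordered_cost(w, c):
    # Direct ordered cost: pair the non-increasing weight vector with
    # the costs sorted in non-increasing order.
    return sum(wj * cj for wj, cj in zip(w, sorted(c, reverse=True)))

def centrum_decomposition(w, c):
    # Same value via summation by parts: a weighted sum of top-ell
    # (facility ell-centrum) costs with weights w_ell - w_{ell+1}.
    cs = sorted(c, reverse=True)
    total = prefix = 0.0
    for ell, (wj, cj) in enumerate(zip(w, cs), start=1):
        prefix += cj                                   # top-ell cost
        w_next = w[ell] if ell < len(w) else 0.0
        total += (wj - w_next) * prefix
    return total
```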
Theorem 1.
(Main Theorem) Our algorithm is a constant-factor approximation for Fault-Tolerant Ordered $k$-Median that runs in time .
Proof.
Using Lemma 4, with the output solution , the expected ordered cost with respect to is
Since we have , and is the th longest connection in the optimal solution and obviously , we further obtain
Finally, the service cost vector satisfies
and the overall approximation factor is . The running time follows directly from the guessing procedures. ∎
3 Robust Ordered Median
3.1 Instance Sparsification and Surrogate LP
We first use the sparsification techniques in [aouad2019ordered, byrka2018constant, chakrabarty2018interpolating] to simplify the instance. We also assume for any that are not co-located.
Let be the corresponding cost vector in the optimal solution . We first guess the exact value of , i.e., the largest cost value, since it only has a polynomial number of possibilities. Next, we discretize by integer powers of for some fixed . Let be the smallest integer such that . We define intervals where , for and . Since and they are disjoint, every falls into exactly one interval. In order to avoid complications caused by weights that are zero or too small, we define the replacement weight vector as . We have the following simple lemma, by observing and the difference .
Lemma 5.
For any , .
We sort into , align it with and consider the entries of that fall into every interval . Define the average weight for optimal costs and the th interval as
and we guess a nonincreasing sequence of estimated weights
such that for every , , and is some integer power of . Since the entries of are at least and at most , the number of possible values is . By definition of , we also have , so by routine calculations, the number of possible non-increasing sequences is at most . Now we define the piecewise-linear pseudo cost function ,
(3)
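A minimal sketch of such a piecewise-linear pseudo cost, under the assumption that it charges a distance at the estimated weight of the interval it falls in (`breakpoints` and `est_weights` are hypothetical stand-ins for the guessed quantities in (3)):

```python
def make_pseudo_cost(breakpoints, est_weights):
    # breakpoints: increasing right endpoints of the intervals;
    # est_weights: the guessed average weight of each interval.
    # The pseudo cost of a distance x is x times the estimated
    # weight of the interval containing x.
    def h(x):
        for b, w in zip(breakpoints, est_weights):
            if x <= b:
                return w * x
        return est_weights[-1] * x
    return h
```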
We write the LP relaxation for Robust Ordered Median for some parameter , which will be determined later.
()  
s.t.  
Although the objective of is nonlinearly scaled down due to the discontinuity of , we show that there is actually a delicate connection between its objective value and the optimum of the original problem. To this end, we have the following crucial lemma, and a strong theorem on their relation. The proofs can be found in the appendix.
Lemma 6.
For any , a set of size and , one has .
3.2 Preprocessing and Stronger LP
In this section, we proceed with the reduced instance with no weight vectors, a parameter and a pseudo cost function . The weights are implicit in the definition of , and we minimize instead.
We adopt techniques used in [krishnaswamy2018constant] and add constraints with respect to the so-called star costs and backup costs. We first guess , where is the minimum objective value of integral solutions to when ; it is not hard to see that we can guess as an integer power of , and the number of possible values is logarithmic. Let be the optimal solution and define for every . The following theorems are direct applications of [krishnaswamy2018constant]. We present a proof sketch of Theorem 4 in the appendix. Denote by the closed ball over a certain subset .
Theorem 3.
([krishnaswamy2018constant]) Given and the upper bound , there exists an time algorithm that finds an extended instance that satisfies


For every , denote , we have ,

For every , we have ,

Theorem 4.
Given the instance found in Theorem 3, we can efficiently compute connection bounds in time, such that,

(Validity of ) Let be the cost of for . There exists a solution for , such that if is connected to , then for every and
(4)
(Small star cost) Moreover, let be the facility that connects to in this solution; then for every , we have
(5) 
(Small backup cost) For every , we have
(6)
If we define , then Property 2 is equivalent to
(7) 
For , we define the following extended LP and provide two lemmas on its objective value and rounding results. The proofs are in the appendix.
3.3 Iterative Rounding
In this section, we proceed to find a solution for that is almost integral, and we start with a solution that is found using Lemma 8. We use discretization methods in [krishnaswamy2018constant]: define and for where and , and let .
From the solution , for every we define outer ball and radius level such that . We further define inner ball .
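The radius levels can be sketched as a geometric discretization (parameter names `r0` and `tau` are ours; the paper's exact constants are elided in the extraction): each client's ball radius is rounded up to the nearest class $r_0 \tau^\ell$.

```python
def radius_level(r, r0=1.0, tau=2.0):
    # Smallest level ell >= 0 with r0 * tau**ell >= r; rounding a
    # radius up to its class loses at most a factor of tau.
    ell, cur = 0, r0
    while cur < r:
        cur *= tau
        ell += 1
    return ell
```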
We maintain three sets of clients, , and , such that we always have . Every client in is to be assigned an open facility that is relatively close to it, and is ultimately used to place these facilities around. Initially, we set , and (here we view every as a virtual client). For , define the auxiliary LP as
s.t.  (13)  
(14)  