Ordered k-Median with Fault-Tolerance and Robustness

11/09/2020
by   Shichuan Deng, et al.
Tsinghua University

We study fault-tolerant ordered k-median and robust ordered k-median, both as generalizations of the ordered k-median problem. In these problems, we are often given a metric space, and asked to open a set of at most k facilities and provide a certain assignment of these facilities to a set of clients, in order to minimize the ordered weighted sum of the induced service costs from all facility-client assignments. In the fault-tolerant problem, every client j has a requirement r_j and needs to be assigned r_j distinct facilities. The cost of client j is the sum of distances to its assigned facilities. In the robust problem, a parameter m is given and we need to assign some facility to at least m clients. We give polynomial-time constant-factor approximation algorithms for both problems, which use standard sparsification as well as iterative rounding techniques for LP relaxations. We also consider ordered knapsack median and ordered matroid median, and use the iterative rounding framework to obtain constant-factor approximations as well.


1 Introduction

k-median is one of the most fundamental and well-studied clustering problems in the computer science community. In the standard k-median problem, we are given a set of clients, a set of candidate facility locations, a metric on their union and a parameter k. The goal is to select a subset of k facility locations that minimizes the sum of distances from every client to its closest selected facility. In other words, if the cost vector induced by the selected subset records, for each client, the distance to its closest selected facility, the objective is to minimize the sum of its entries. k-median is known to be APX-hard and admits constant-factor approximations [arya2004local, byrka2015improved, charikar2012dependent, jain1999primal, Li:2013], to name a few. In the more general ordered k-median problem, we are additionally given a non-increasing non-negative weight vector, and the goal is to minimize the weighted sum of the cost vector after its entries are sorted in non-increasing order. This problem generalizes k-center, k-median and k-facility ℓ-centrum, among many others, and several constant-factor approximations have been developed recently [aouad2019ordered, byrka2018constant, chakrabarty2018interpolating, chakrabarty2019approximation].
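To make the ordered objective concrete, the following Python sketch evaluates it on a small cost vector and recovers the special cases mentioned above. The function name `ordered_cost` and the sample numbers are illustrative assumptions, not notation from the paper.

```python
from typing import Sequence


def ordered_cost(costs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the costs after sorting them in non-increasing order."""
    if len(costs) != len(weights):
        raise ValueError("costs and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    sorted_costs = sorted(costs, reverse=True)
    return sum(w * c for w, c in zip(weights, sorted_costs))


# Hypothetical service costs of four clients.
costs = [3.0, 1.0, 4.0, 2.0]

# All-ones weights give the k-median objective (sum of all costs).
median_value = ordered_cost(costs, [1.0, 1.0, 1.0, 1.0])

# A single leading one gives the k-center objective (largest cost).
center_value = ordered_cost(costs, [1.0, 0.0, 0.0, 0.0])

# Two leading ones give the centrum objective with two terms (two largest costs).
centrum_value = ordered_cost(costs, [1.0, 1.0, 0.0, 0.0])
```

Only the weight vector changes between the three calls, which is the sense in which the ordered objective interpolates between these problems.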

In this paper, we study two natural generalizations of ordered k-median. The first one is Fault-Tolerant Ordered k-Median, which, in addition to the standard ordered k-median input, associates each client j with a requirement r_j, and we need to connect j to r_j distinct open facilities. The cost vector is redefined so that the cost of client j is the sum of distances from j to its r_j closest open facilities. With the study of 5G networks, software-defined networks and sensor networks, the motivation for multi-connection/coverage problems is becoming stronger; in light of the corresponding fault-tolerant concept, we formally define the problem as follows.

Definition 1.

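Given a metric space over the clients and the candidate facility locations, an instance of Fault-Tolerant Ordered k-Median is specified by the parameter k, the requirement r_j of every client j, and a non-increasing non-negative weight vector. We are required to open at most k facility locations. Let the service cost of client j be the sum of distances from j to its r_j closest open facilities. The goal is to minimize the ordered median-type objective, i.e., the weighted sum of the service costs sorted in non-increasing order.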

(1)
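As a minimal sketch of how the fault-tolerant objective is evaluated for a fixed set of open facilities, the Python snippet below computes each client's service cost as the sum of distances to its closest open facilities and then applies the ordered weighting. The names `fault_tolerant_costs` and `ordered_objective`, the distance-matrix layout `dist[client][facility]` and the numbers are assumptions made for illustration only.

```python
from typing import List, Sequence, Set


def fault_tolerant_costs(
    dist: Sequence[Sequence[float]],
    open_facilities: Set[int],
    requirements: Sequence[int],
) -> List[float]:
    """Service cost of client j = sum of distances to its r_j closest open facilities."""
    costs = []
    for j, r_j in enumerate(requirements):
        nearest = sorted(dist[j][i] for i in open_facilities)
        if r_j > len(nearest):
            raise ValueError(f"client {j} needs {r_j} facilities, only {len(nearest)} open")
        costs.append(sum(nearest[:r_j]))
    return costs


def ordered_objective(costs: Sequence[float], weights: Sequence[float]) -> float:
    """Pair non-increasing weights with the costs sorted in non-increasing order."""
    return sum(w * c for w, c in zip(weights, sorted(costs, reverse=True)))


# Two clients, three candidate facilities (hypothetical distances).
dist = [
    [1.0, 2.0, 5.0],  # client 0
    [4.0, 1.0, 3.0],  # client 1
]
open_facilities = {0, 2}
requirements = [2, 1]  # client 0 needs two facilities, client 1 needs one

costs = fault_tolerant_costs(dist, open_facilities, requirements)
objective = ordered_objective(costs, [1.0, 0.5])
```

Here client 0 pays for both open facilities while client 1 pays only for the nearer one, so the larger cost receives the larger weight.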

For the second result, we consider Robust Ordered k-Median. As a generalization of ordered k-median, the input contains a parameter m and the weight vector now has dimension m. The goal is to choose a subset of facilities of size at most k and a set of m clients, so that if the induced cost vector assigns one open facility to each chosen client, the ordered cost is minimized. We give the formal definition below.

Definition 2.

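Given a metric space over the clients and the candidate facility locations, an instance of Robust Ordered k-Median is specified by the parameters k and m, and a non-increasing non-negative weight vector of dimension m. We are required to open at most k facility locations and serve m clients, such that the ordered objective, i.e., the weighted sum of the served clients' costs sorted in non-increasing order, is minimized.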

(2)
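The robust objective can likewise be evaluated for a fixed set of open facilities. The sketch below relies on a simple observation that is an assumption of this example rather than a statement from the paper: with non-negative weights, serving the clients with the smallest connection distances is never worse than serving any other set of the same size. The function name `robust_ordered_cost` and the data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def robust_ordered_cost(
    dist: Sequence[Sequence[float]],
    open_facilities: Set[int],
    weights: Sequence[float],
) -> Tuple[float, List[int]]:
    """Serve the m = len(weights) cheapest clients and return (ordered cost, served clients)."""
    m = len(weights)
    if m > len(dist):
        raise ValueError("cannot serve more clients than exist")
    # Connection distance of every client to its closest open facility.
    nearest = sorted(
        (min(dist[j][i] for i in open_facilities), j) for j in range(len(dist))
    )
    chosen = nearest[:m]  # the m clients with the smallest connection distance
    served = sorted(j for _, j in chosen)
    costs_desc = sorted((c for c, _ in chosen), reverse=True)
    value = sum(w * c for w, c in zip(weights, costs_desc))
    return value, served


# Four clients, two candidate facilities (hypothetical distances).
dist = [
    [1.0, 4.0],
    [2.0, 2.0],
    [9.0, 7.0],
    [3.0, 6.0],
]
value, served = robust_ordered_cost(dist, {0}, [1.0, 0.5])
```

Client 2 is far from the open facility and is left unserved, which is how the robust variant discards outliers.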

1.1 Our Results

Our first result is a constant-factor approximation algorithm for Fault-Tolerant Ordered k-Median. At the core of our algorithm, we use an explicitly sparsified LP objective to model the highly non-linear ordered objective, solve it using the ellipsoid algorithm, and use an auxiliary LP [hajiaghayi2016constant] for the stochastic rounding. Compared to previous attempts at LP sparsification [byrka2018constant, chakrabarty2018interpolating], our LP relaxation for ordered costs contains an additional set of constraints that regularizes the solution’s “distribution” over the different lengths of possible connections (see the first line of constraints in the LP relaxation of Section 2.1) and forces the solution to favour shorter connections. The additional constraints are inspired by [chakrabarty2019approximation]. Our analysis of the stochastic cost of each client is quite involved, and reveals a strengthened version of Lemma 4.5 in [byrka2018constant]. The method to sparsify the instance is from [chakrabarty2019approximation] and is based on the indices of entries in the weight vector.

In the second part, we present a constant-factor approximation algorithm for Robust Ordered k-Median. Our algorithm follows the sparsification and iterative rounding procedures of [krishnaswamy2018constant], but exploits a unique observation in the LP relaxation (see the proof of Theorem 2), where we use a pseudo cost function and a parameter to scale the objective value. Using this observation, although the objective is non-linearly scaled, we are still able to obtain an affine relation between its solutions and the optimum. Our extension is also used to give constant approximations for ordered matroid median and ordered knapsack median, for which the proofs can be found in the appendix. We note that the sparsification technique in this part [aouad2019ordered, byrka2018constant] is different from that of the first part, and is used to construct the pseudo cost function based on the input value.

1.2 Other Related Work

The notion of fault-tolerance has been studied in many problems, including variants of facility location [byrka2010fault, guha2003constant, swamy2008fault], k-center [chaudhuri1998p, khuller2000fault] and k-median [hajiaghayi2016constant, inamdar2019fault]. Clustering problems with outliers, a.k.a. robust clustering problems, have also been studied extensively in many facility location and k-clustering problems, see, e.g., [chakrabarty2016non, charikar2001algorithms, chen2016matroid, harris2017lottery, krishnaswamy2018constant]. As a result relevant to ours, Hajiaghayi et al. [hajiaghayi2016constant] present the first constant-factor approximation for fault-tolerant k-median with non-uniform requirements r_j. Another closely related result is the iterative rounding framework by Krishnaswamy et al. [krishnaswamy2018constant] for solving robust k-median.

2 Fault-Tolerant Ordered k-Median

2.1 Instance Sparsification and Surrogate LP

In the pre-processing step, we use the sparsification methods in [chakrabarty2019approximation]. Let and define . Clearly, and . For , denote the smallest element in that is larger than . Let and . We abbreviate to when and are clear from the context.

Given and , define for . For , if such that , define , and . To use for later approximations, we have the following simple lemma.

Lemma 1.

([chakrabarty2019approximation]) For non-negative , .

Next, we consider the optimal solution and its individual connection (service) distances in non-increasing order. For every , define as the -th largest connection distance between clients and facilities. Since takes a polynomial number of possible values, we assume that we guess it correctly. For , let . We want to guess a non-increasing sequence with entries from , so that if and 0 otherwise. Notice that and is non-increasing with length , using a basic counting method, the number of all possible non-increasing sequences is at most .

Further assume we have guessed the correct sequence with the desired properties, we then define for any , in which way we guarantee regardless.

We are ready to formulate our LP relaxation. Let be the extent we open facility location and the extent of connecting client to facility . Given , define if and 0 otherwise. With the guessed values and sparsified weights , our LP relaxation is given as follows,

min ()
s.t.

Although there are exponentially many constraints, it is easy to construct a polynomial-time separation oracle. Indeed, for any solution , to verify the second set of constraints, we only need to define and check whether the top values in sum up to a value larger than . Therefore, the LP relaxation can be efficiently solved using the ellipsoid method. We prove the following lemma in the appendix.
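The sorting argument behind such a separation oracle can be sketched as follows. The exact quantities being sorted and the right-hand sides are not reproduced here, so `values` and `bounds` are hypothetical placeholders; the sketch only illustrates that a family of constraints ranging over all subsets of each size can be checked by inspecting the largest entries.

```python
from typing import List, Optional, Sequence, Tuple


def find_violated_top_sum(
    values: Sequence[float],
    bounds: Sequence[float],
    tol: float = 1e-9,
) -> Optional[Tuple[int, List[int]]]:
    """Check the constraints  sum_{a in A} values[a] <= bounds[|A| - 1]  for all subsets A.

    For each size ell, the worst subset consists of the ell largest values, so it
    suffices to compare prefix sums of the sorted values against the bounds.
    Returns (ell, indices of a violating subset) or None if all constraints hold.
    """
    if len(values) != len(bounds):
        raise ValueError("need one bound per subset size")
    order = sorted(range(len(values)), key=lambda a: values[a], reverse=True)
    running = 0.0
    for ell, idx in enumerate(order, start=1):
        running += values[idx]
        if running > bounds[ell - 1] + tol:
            return ell, order[:ell]
    return None


# Hypothetical fractional values and right-hand sides.
values = [0.5, 2.0, 1.0]
feasible = find_violated_top_sum(values, [2.0, 3.5, 3.5])
violated = find_violated_top_sum(values, [2.0, 2.5, 4.0])
```

A violating subset returned by the oracle is exactly the kind of cutting plane the ellipsoid method needs, and the check costs one sort per call even though the constraint family is exponentially large.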

Lemma 2.

Denote the optimum with respect to the original weight vector . The relaxation  has an optimal value at most .

2.2 Constructing Bundles and the Laminar Family

We use the rounding procedure in [hajiaghayi2016constant]. Fix an optimal fractional solution to , we create a family of non-intersecting sets of facility locations called bundles via the split of facilities (see, e.g., [guha2003constant, hajiaghayi2016constant]).

Let and denote the volume of , so . There exists a partition of such that contains the -th closest unit volume to in . Define and . Also define similar measures for the truncated distance function . For the sake of being self-contained, we provide the algorithm [yan2015lp] as follows.

1:  Initialize queues and . Initialize
2:  while there exists such that  do
3:     Choose , s.t. if is the closest unit volume in , is minimized
4:     if there exists and  then
5:        add to the end of , and remove from
6:     else
7:        add to the end of , and add to . Remove from
8:     end if
9:  end while
10:  return  
Algorithm 1

We call a client dangerous if , and denote the set of dangerous clients . We call two dangerous clients in conflict if and . This definition of conflict is somewhat different from [hajiaghayi2016constant], since we need to find a subset of dangerous clients that are far enough apart from each other. Using the filtering process in [hajiaghayi2016constant], we create such that no two clients in are in conflict.

We also define a closed ball for each dangerous client , , where we abbreviate to . It is easy to see that and the set is fully contained in . Now we proceed to present a simple algorithm [hajiaghayi2016constant] that constructs the laminar family .

Lemma 3.

([hajiaghayi2016constant]) is laminar.

1:  for  in non-decreasing order of  do
2:     Let be the set of clients that satisfies and
3:     
4:  end for
5:  return  
Algorithm 2
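For reference, laminarity of a set family can be checked directly from its definition: any two members are either disjoint or nested. The helper below is a generic sketch with a hypothetical name, `is_laminar`, and is not part of the construction in [hajiaghayi2016constant].

```python
from itertools import combinations
from typing import Iterable, Set


def is_laminar(family: Iterable[Set[int]]) -> bool:
    """Return True if every pair of sets is either disjoint or nested."""
    sets = [frozenset(s) for s in family]
    for a, b in combinations(sets, 2):
        intersects = bool(a & b)
        nested = a <= b or b <= a
        if intersects and not nested:
            return False
    return True


# Nested and disjoint sets form a laminar family.
laminar_example = is_laminar([{1, 2, 3}, {1, 2}, {3}, {4, 5}])

# Two sets that overlap without containment do not.
crossing_example = is_laminar([{1, 2}, {2, 3}])
```

The pairwise check is quadratic in the number of sets, which is adequate for sanity-checking small instances.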

2.3 Rounding the LP Solution

Recall that in the last section, we split some facilities into co-located copies. We abuse the notation and denote for the set after split, and the facility location set before the split. We also define by taking a copy to its original location in . The following auxiliary LP [hajiaghayi2016constant] is defined by two laminar families: and , thus it is integral.

Since is a feasible solution, it is the convex combination of a polynomial number of integral solutions by Carathéodory’s Theorem. We can efficiently and randomly sample an integral solution such that . It means that

  1. For any , ;

  2. For any , the probability of its opening is .

2.4 Analysis

From the rounded integral solution , define as the open facility locations. Notice that is stochastic, so the service cost is also stochastic. We have the following core lemma on the nice property of , and consequently, the expected cost of k-facility ℓ-centrum for every . See the appendix for their proofs.

Lemma 4.

Fix any

, there exists a random variable

such that

holds with probability 1, and

Corollary 1.

For any , denote the sum of the largest service costs, then

Consider the weight vector . For any , the weight values are identical between and , so the ordered cost with respect to can be rewritten as another weighted sum of k-facility ℓ-centrum costs (see, e.g., [byrka2018constant]). Then using Lemma 1, the cost of w.r.t. is also bounded, and we provide the following main theorem.
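The rewriting of an ordered cost as a weighted sum of centrum costs is the standard telescoping identity: the coefficient of the sum of the ℓ largest costs is the drop between the ℓ-th weight and the next one, with the weight after the last entry taken to be zero. The Python sketch below checks this numerically; the function names and the sample data are assumptions for illustration.

```python
from typing import Sequence


def ordered_cost(costs: Sequence[float], weights: Sequence[float]) -> float:
    """Direct evaluation: weights paired with costs sorted in non-increasing order."""
    return sum(w * c for w, c in zip(weights, sorted(costs, reverse=True)))


def ordered_cost_via_top_sums(costs: Sequence[float], weights: Sequence[float]) -> float:
    """Same value, written as sum over ell of (w_ell - w_{ell+1}) * (sum of ell largest costs)."""
    sorted_costs = sorted(costs, reverse=True)
    total = 0.0
    top_sum = 0.0
    for pos, c in enumerate(sorted_costs):
        top_sum += c  # sum of the (pos + 1) largest costs
        next_weight = weights[pos + 1] if pos + 1 < len(weights) else 0.0
        total += (weights[pos] - next_weight) * top_sum
    return total


costs = [3.0, 1.0, 4.0, 2.0]
weights = [3.0, 2.0, 2.0, 0.0]  # piecewise constant, as after sparsification

direct = ordered_cost(costs, weights)
telescoped = ordered_cost_via_top_sums(costs, weights)
```

When the weights are constant on long stretches, most coefficients vanish, so only the centrum costs at the breakpoints of the weight vector contribute.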

Theorem 1.

(Main Theorem) Our algorithm is a -approximation for Fault-Tolerant Ordered k-Median that runs in time .

Proof.

Using Lemma 4, with the output solution , the expected ordered cost with respect to is

Since we have , and is the -th longest connection in the optimal solution and obviously , we further obtain

Finally, the service cost vector satisfies

and the overall approximation factor . The running time is easily obtained from the guessing procedures. ∎

3 Robust Ordered k-Median

3.1 Instance Sparsification and Surrogate LP

We first use the sparsification techniques in [aouad2019ordered, byrka2018constant, chakrabarty2018interpolating] to simplify the instance. We also assume for any that are not co-located.

Let be the corresponding cost vector in the optimal solution . We first guess the exact value of , i.e., the largest cost value, since it only has a polynomial number of possibilities. Next, we discretize by integer powers of for some fixed . Let be the smallest integer such that . We define intervals where , for and . Since and they are disjoint, every falls into exactly one interval. In order to avoid complications caused by weights that are zeros or too small, we define the replacement weight vector as . We have the following simple lemma, by observing and the difference .

Lemma 5.

For any , .

We sort into , align it with and consider the entries of that fall into every interval . Define the average weight for optimal costs and the -th interval as

and we guess a non-increasing sequence of estimated weights

such that for every , , and is some integer power of . Since the entries of are at least and at most , the number of possible values is . By definition of , we also have , so using routine calculations, the number of possible non-increasing sequences is at most . Now we define the piece-wise linear pseudo cost function ,

(3)

We write the LP relaxation for Robust Ordered k-Median for some parameter , which will be determined later.

()
s.t.

Although the objective of  is non-linearly scaled-down because of the discontinuity of , we show that there is actually a delicate connection between its objective value and the optimum for the original problem. To this end, we have the following crucial lemma, and a strong theorem for their relations. The proofs can be found in the appendix.

Lemma 6.

For any , a set of size and , one has .

Theorem 2.

For any integral solution of with objective value , the induced solution is a feasible solution to Robust Ordered k-Median with cost at most . Furthermore, the minimal objective value of integral solutions of  is at most .

3.2 Pre-processing and Stronger LP

In this section, we proceed with the reduced instance with no weight vectors, a parameter and a pseudo cost function . The weights are implicit in the definition of , and we try to minimize instead.

We adopt techniques used in [krishnaswamy2018constant] and add constraints with respect to the so-called star costs and backup costs. We first guess , where is the minimum objective value of integral solutions to  when , and it is not hard to see that we can guess as an integer power of , and the number of possible values is logarithmic. Let be the optimal solution and we define for every . The following theorems are direct applications of [krishnaswamy2018constant]. We present a proof sketch of Theorem 4 in the appendix. Denote the closed ball over a certain subset as .

Theorem 3.

([krishnaswamy2018constant]) Given and the upper bound , there exists an -time algorithm that finds an extended instance that satisfies

  1. For every , denote , we have ,

  2. For every , we have ,

Theorem 4.

Given the instance found in Theorem 3, we can efficiently compute connection bounds in time, such that,

  1. (Validity of ) Let be the cost of for . There exists a solution for , such that if is connected to , then for every and

    (4)

    (Small star cost) Moreover, let be the facility connects to in this solution, then for every , we have

    (5)
  2. (Small backup cost) For every , we have

    (6)

If we define , then Property 2 is equivalent to

(7)

For , we define the following extended LP and provide two lemmas on its objective value and rounding results. The proofs are in the appendix.

()
s.t. constraints in (8)
(9)
(10)
(11)
(12)

Lemma 7.

The optimal objective value of  is at most .

Lemma 8.

([krishnaswamy2018constant]) We can efficiently find a solution such that either or via making co-located copies of facilities, it satisfies the constraints (8), (9), (10), (11) and has objective value of  at most . Moreover, for every not co-located with , its star cost is at most

(13)

3.3 Iterative Rounding

In this section, we proceed to find a solution for  that is almost integral, and we start with a solution that is found using Lemma 8. We use discretization methods in [krishnaswamy2018constant]: define and for where and , and let .

From the solution , for every we define outer ball and radius level such that . We further define inner ball .

We maintain three sets of clients, , and , such that we always have . Every client in is to be assigned an open facility that is relatively close to it, and is ultimately used to place these facilities around. Initially, we set , and (here we view every as a virtual client). For , define the auxiliary LP as

s.t. (13)
(14)