 # Improved Local Search Based Approximation Algorithm for Hard Uniform Capacitated k-Median Problem

In this paper, we study the hard uniform capacitated k-median problem using a local search heuristic. Obtaining a constant-factor approximation for the problem is open. All existing solutions that give a constant-factor approximation violate at least one of the cardinality and capacity constraints, and all except that of Korupolu et al. are based on LP relaxation. We give a (3+ϵ)-factor approximation algorithm for the problem that violates the cardinality by a factor of 8/3 ≈ 2.67. Our work offers a trade-off between the approximation factor and the cardinality violation relative to existing work. Korupolu et al. gave a (1 + α) approximation factor with a (5 + 5/α) factor loss in cardinality using the local search paradigm. Though their approximation factor can be made arbitrarily close to 1, the cardinality loss is at least 5. On the other hand, we improve upon the results in [capkmGijswijtL2013], [capkmshili2014], [Lisoda2016] in terms of factor loss, though the cardinality loss is larger in our case. Also, these results are obtained using LP rounding, some with strengthened LPs, whereas local search techniques are simple to apply and have been shown to perform well in practice in empirical studies. We extend the result to the hard uniform capacitated k-median problem with penalties. To the best of our knowledge, ours is the first result for this problem.

## Authors


## 1 Introduction

The k-median problem is one of the most extensively studied problems in the literature [1, 3, 4, 5, 9, 11, 12, 20, 21, 27]. The problem is known to be NP-hard. The input instance consists of a set of facilities, a set of clients, a non-negative integer k and a non-negative cost function defining the cost of connecting clients to facilities. The metric version of the problem assumes that the cost function is symmetric and satisfies the triangle inequality. The goal is to select a subset of at most k facilities as centers (the cardinality constraint) and to assign clients to them such that the total cost of serving the clients from the centers is minimum. In the capacitated version of the problem, we are also given a bound on the maximum number of clients that each facility can serve. The soft capacity version allows a facility to be opened any number of times, whereas the hard capacity version restricts each facility to be opened at most once. In k-median with penalties, each client has an associated penalty, and we are allowed to leave some clients unserved at the cost of paying their penalties. In this paper, we address the hard capacitated k-median (CkM) problem and its penalty variant (CkMP) when the capacities are uniform across all facilities. Our results are stated in Theorems 1 and 2. For these problems, we define an (a, b)-approximation algorithm as a polynomial-time algorithm that computes a solution using at most bk facilities with cost at most a times the cost of an optimal solution using at most k facilities.

###### Theorem 1.

There is a polynomial-time local search heuristic that approximates the hard uniform capacitated k-median problem within a factor of (3+ϵ) of the optimal, violating the cardinality by a factor of 8/3.

In contrast to LP-based algorithms, the local search technique is known to be straightforward and simple to apply, and it has been shown to perform well in practice in empirical studies [22, 23]. The power of local search over LP-based algorithms is well exhibited by the fact that constant-factor approximations are known for the capacitated facility location problem with both uniform and non-uniform capacities [2, 6], whereas the natural LP is known to have an unbounded integrality gap. On the other hand, local search heuristics are notoriously hard to analyze. This is evident from the fact that the only known work on CkM based on local search heuristics is the much earlier one due to Korupolu et al.

Our work provides a trade-off between the approximation factor and the cardinality violation relative to existing work. Korupolu et al. gave a (1 + α) approximation factor with a (5 + 5/α) factor loss in cardinality using the local search paradigm. Though their approximation factor can be made arbitrarily close to 1, the cardinality loss is at least 5, and a small approximation factor is obtained at a big loss in cardinality because the term 5/α grows as α shrinks. Since their cardinality violation always exceeds 5 whereas ours is 8/3, we improve upon their result in terms of cardinality. On the other hand, we improve upon the results in [1, 25, 26] in terms of factor loss, though the cardinality loss is a little more in our case. Aardal et al. gave a constant-factor approximation that violates the cardinality by a constant factor using LP rounding, and a constant-factor approximation with a smaller cardinality violation is given in [25, 26] using sophisticated strengthened LPs.
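As a worked illustration of this trade-off, using only the (1 + α) and (5 + 5/α) bounds of Korupolu et al. quoted in the abstract and ignoring the additive ϵ for simplicity: matching an approximation factor of 3 with their heuristic requires α = 2, so

$$
1 + \alpha = 3 \;\Rightarrow\; \alpha = 2, \qquad 5 + \frac{5}{\alpha} = 5 + \frac{5}{2} = 7.5 \;>\; \frac{8}{3} \approx 2.67 .
$$

More generally, 5 + 5/α > 5 for every α > 0, so their cardinality violation exceeds 8/3 at every approximation factor.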

###### Theorem 2.

There is a polynomial time local search heuristic that approximates hard uniform capacitated median problem with penalties within factor of the optimal violating the cardinality by a factor of .

To the best of our knowledge, no result is known for CkMP.

### 1.1 Related Work

Both LP-based algorithms and local search heuristics have been used to obtain good approximation algorithms for the (uncapacitated) k-median problem [1, 3, 4, 5, 9, 11, 12, 20, 21, 27]. The best known factor was given by Byrka et al. Obtaining a constant approximation factor for CkM is an open problem. The natural LP is known to have an unbounded integrality gap when one of the constraints (cardinality or capacity) is allowed to be violated by a factor of less than 2 without violating the other constraint, even for uniform capacities.

Several constant factor approximations are known [8, 11, 12, 24, 16] for the problem that violate the capacities by a factor of or more. A algorithm was given by Aardal et al.   violating the cardinality constraint by a factor of . Koruplou et al gave approximation factor with factor loss in cardinality. Very recently, Byrka et al.   broke the barrier of in capacities and gave an approximation violating capacities by a factor of factor for uniform capacities. For non-uniform capacities, a similar result has been obtained by Demirci et al.  in  . Li [25, 26] strengthened the LP to break the barrier of in cardinality and gave an approximation using at most facilities. Though the algorithm violates the cardinality only by , it introduces a softness bounded by a factor of . The running time of the algorithm is .

The other commonly used technique for the problem is local search [4, 11, 22] with the best factor of given by Arya et al. Local search technique has been particularly useful to deal with capacities for the facility location problem [13, 29, 33, 17, 30, 2, 6].

Some results are known for the penalty variant of (uncapacitated) facility location problems, TSP and steiner network problems [28, 19, 31, 32, 15, 7]. For the capacitated variant of facility location problem with penalties, factor approximation for uniform and factor for non-uniform capacities were given by Gupta and Gupta in . This is the only result known for the problems with extension on capacities as well as penalties.

### 1.2 High Level Idea

Starting from any feasible solution, the algorithm performs one of the following operations if it reduces the cost, and halts otherwise. The local search operations are swap and double-swap. Given a set of open facilities, a min-cost flow problem is solved to obtain the optimal assignment of clients to the open facilities.

To define the swaps and the reassignments for the purpose of analysis, we extend the ideas of Arya et al. Swaps are defined so that every facility in optimal solution is swapped in at least once and at most thrice whereas facilities in our locally optimal solution is swapped out at most thrice. When a facility in our locally optimal solution is swapped out, some of its clients are reassigned to other facilities in our solution via a mapping similar to the one defined in . However, for the capacitated case, mapping needs to be done a little carefully. An almost fully utilized facility may not be able to accommodate all the clients mapped to it and conversely a partially utilized facility may not be able to accommodate the load of an almost fully utilized facility. To address this concern, we partition the facilities of our locally optimal solution into heavy (denoted by ) and light (denoted by ). A facility is said to be heavy if it serves more than () clients in our solution and is called light otherwise. Heavy facilities neither participate in swaps nor the mapping. Thus, mapping is defined between the clients of light facilities only. We allow to open () facilities in our solution so that we have at least light facilities.

There are two situations in which we may not be able to define a feasible mapping between the clients of two light facilities. First situation is explained as follows: Let denote some optimal solution, let be the number of clients, a facility shares with the light facilities of our solution. All the clients of a facility cannot be mapped to clients of other facilities of our solution if shares more than clients with . Second situation arises when shares more than clients with . In this case, mapping may be possible but it may not be feasible as the other facility , to which its clients are mapped, may not have sufficient available capacity to accommodate the clients of . We say that dominates in the first case and that covers in the second case.

Although a facility may dominate several facilities in , it can cover at most one facility in . Whereas, a facility can be dominated by at most one facility in , it can be covered by at most two facilities in . The scenario in which a facility is covered by exactly facilities, say and needs to be handled carefully. In this case, we say that as well as specially covers . We denote the set of such facilities in as . Since mapping of clients of and cannot be done in , we would like to swap and with and, assign their clients to , i.e. we would like to perform . However since we do not have this operation, we look for one more facility in so that we can perform double-swap of with . First we look for such that together either dominate or cover . Clearly neither nor , being light, can cover any facility other than . Thus we look for that is dominated by them. If do not dominate any facility other than , we form a triplet and keep it aside. We call such triplets are nice triplets. They will be used to swap in some facilities of which are not swapped in otherwise. If they dominate exactly one facility , then we perform double-swap of with . If they dominate at least two facilities other than , then we cannot swap them out at all. We call such a pair of facilities as a bad pair.

The remaining light facilities of our solution are classified as good, bad and nice. A facility that does not dominate any facility of the optimal solution is termed nice. A nice facility can be swapped with any facility of the optimal solution. A facility that dominates exactly one facility of the optimal solution is termed good; we perform a (single) swap in this case. A facility that dominates more than one facility of the optimal solution is termed bad. Bad facilities cannot participate in swaps. Consider the facilities of the optimal solution that are dominated either by bad facilities or by bad pairs of our solution. These facilities are swapped in using the triplets (using double-swap) or the nice facilities (using swap). We show that the total number of triplets and nice facilities is at least one third of the number of such facilities, so that each facility of our solution is swapped out at most three times and each facility of the optimal solution is swapped in at least once and at most three times (note that in the process, the facilities of the optimal solution that appear in the triplets also get swapped thrice). The contribution of swapping in a facility of the optimal solution thrice and the contribution of swapping out a facility of our solution thrice together make up a total approximation factor of 9.

Extending swap and double-swap to multi-swap, in which several facilities can be swapped simultaneously, we are able to ensure that every facility of our solution is swapped out fewer times and every facility of the optimal solution is swapped in fewer times, thereby reducing the factor to (3+ϵ).

For CkMP, we start with an initial feasible solution with facilities from . The clients are assigned by solving min cost flow problem over the facilities , where and . Clients assigned to pay penalty in the solution . We bound the cost of the locally optimal solution, in the same manner as done for CkM.

Given a problem , local search algorithm starts with a candidate feasible solution . A set of operations are defined such that performing an operation results in a new solution , called the neighbourhood solution of . A solution may have more than one neighbourhood solutions. An operation is performed if it results in improvement in the cost. We formally describe the steps of the algorithm for a minimization problem.

1. Compute an arbitrary feasible solution S to the problem.

2. While there is a neighbourhood solution S′ of S such that cost(S′) < cost(S), do S ← S′.

The algorithm terminates at a locally optimal solution S, i.e. cost(S) ≤ cost(S′) for every neighbourhood solution S′ of S.

In the algorithm presented above, we move to a new solution if it gives some improvement in the cost, however small that improvement may be. This may lead to the algorithm taking a long time. To ensure that the algorithm terminates in polynomial time, a local search step is performed only when it reduces the cost of the current solution by at least a threshold that depends on the size of the problem instance, through an appropriate polynomial in that size and a fixed ϵ. This modification incurs an additive ϵ in the approximation factor.
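The thresholded loop described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the names `local_search`, `neighbors`, `cost` and `delta` are hypothetical, and the relative threshold `delta` stands in for the instance-dependent quantity whose exact form is not recoverable from the text.

```python
def local_search(initial, neighbors, cost, delta):
    """Generic local search for a minimization problem.

    A move is accepted only if it lowers the cost by at least a `delta`
    fraction of the current cost (assumed form of the threshold), which
    bounds the number of iterations.
    """
    assert 0 < delta < 1, "delta is a relative improvement threshold"
    current, current_cost = initial, cost(initial)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):
            candidate_cost = cost(candidate)
            # Require a strict and sufficiently large improvement.
            if (candidate_cost < current_cost
                    and candidate_cost <= (1.0 - delta) * current_cost):
                current, current_cost = candidate, candidate_cost
                improved = True
                break
    return current
```

With a small `delta` the loop behaves like the plain algorithm; with a large `delta` it can stop before reaching a true local optimum, which is the price paid for the polynomial running time.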

### 1.4 Organization of the paper

For ease of exposition, we first present a weaker result for CkM in Section 2. The algorithm uses two operations, a (single) swap and a double-swap, and provides a (9+ϵ, 8/3) solution. The factor is subsequently improved to (3+ϵ) in Section 3 using the multi-swap operation. The results are then extended to CkMP in Section 4.

## 2 (9+ϵ,8/3) algorithm for Capacitated k-Median Problem

In this section, we present a local search algorithm that computes a solution with cost at most (9+ϵ) times the cost of an optimal solution. We start with an initial feasible solution selected as an arbitrary set of 8k/3 facilities. Given a set of open facilities, the optimal assignment of the clients is obtained by solving a min-cost flow problem.
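The assignment step can be illustrated with a small successive-shortest-path min-cost flow. This is a sketch under stated assumptions: the function name `assign_clients`, the `cost[j][i]` matrix layout and the network construction (source to clients, clients to open facilities, facilities to sink with uniform capacity) are illustrative choices, not taken from the paper.

```python
from collections import deque


def assign_clients(cost, open_facilities, capacity):
    """Min-cost assignment of every client to an open facility.

    cost[j][i] is the cost of serving client j from facility i; each open
    facility serves at most `capacity` clients (uniform hard capacity).
    Returns (total_cost, {client: facility}), or None if infeasible.
    """
    facs = list(open_facilities)
    n_clients = len(cost)
    src, snk = 0, 1 + n_clients + len(facs)
    # graph[u] holds edges [head, residual capacity, cost, reverse index]
    graph = [[] for _ in range(snk + 1)]

    def add_edge(u, v, cap, w):
        graph[u].append([v, cap, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    for j in range(n_clients):
        add_edge(src, 1 + j, 1, 0)
        for a, i in enumerate(facs):
            add_edge(1 + j, 1 + n_clients + a, 1, cost[j][i])
    for a in range(len(facs)):
        add_edge(1 + n_clients + a, snk, capacity, 0)

    total = 0
    for _ in range(n_clients):  # each augmentation routes one client
        dist = [float("inf")] * (snk + 1)
        prev = [None] * (snk + 1)
        queued = [False] * (snk + 1)
        dist[src] = 0
        queue = deque([src])
        while queue:  # queue-based Bellman-Ford on the residual graph
            u = queue.popleft()
            queued[u] = False
            for k, (v, cap, w, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + w < dist[v]:
                    dist[v], prev[v] = dist[u] + w, (u, k)
                    if not queued[v]:
                        queue.append(v)
                        queued[v] = True
        if prev[snk] is None:
            return None  # not enough capacity for all clients
        v = snk
        while v != src:  # push one unit along the shortest path
            u, k = prev[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u
        total += dist[snk]

    assignment = {}
    for j in range(n_clients):
        for v, cap, _, _ in graph[1 + j]:
            if v > n_clients and cap == 0:  # saturated client-facility edge
                assignment[j] = facs[v - 1 - n_clients]
    return total, assignment
```

Integer costs are assumed here so that distance comparisons are exact; a production implementation would more likely call an existing min-cost flow solver.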

For any feasible solution , algorithm performs one of the following operations, if it reduces the cost and terminates when it is no longer possible to improve the cost using these operations.

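1. swap: one facility of the current solution is exchanged for one other facility.

2. double-swap: two facilities of the current solution are exchanged for two other facilities.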
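The two neighbourhoods can be enumerated as in the sketch below. The formal definitions of the operations did not survive in the text, so this assumes that the incoming facilities are drawn from outside the current solution and that both operations preserve the number of open facilities; the function names are hypothetical.

```python
from itertools import combinations


def swap_neighbors(solution, facilities):
    """Yield every solution obtained by exchanging one open facility
    for one currently closed facility (assumed form of swap)."""
    current = frozenset(solution)
    closed = sorted(set(facilities) - current)
    for s in sorted(current):
        for o in closed:
            yield (current - {s}) | {o}


def double_swap_neighbors(solution, facilities):
    """Yield every solution obtained by exchanging two open facilities
    for two currently closed facilities (assumed form of double-swap)."""
    current = frozenset(solution)
    closed = sorted(set(facilities) - current)
    for out_pair in combinations(sorted(current), 2):
        for in_pair in combinations(closed, 2):
            yield (current - set(out_pair)) | set(in_pair)
```

Each candidate would then be evaluated by re-solving the client assignment and kept only if the cost drops.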

###### Claim 1.

For the locally optimal solution , and optimal solution we have,

###### Proof.

The claims follow trivially when by the local optimality of . Next, suppose . Let . Then if and it is otherwise. Clearly the cost of assignment to facilities in can not be smaller than the cost of assignment to facilities in . If , then (by the argument of single swap) and . All the other cases can be argued similarly. ∎

### 2.1 Notations

Let denote the locally optimal solution and denote an optimal solution to the problem. Let be the set of clients served by and be the set of clients served by . Let denote the set of clients served by and i.e.. For a client , let and denote the facilities serving in and respectively. Let and denote the service costs paid by in and respectively.

Facilities in are partitioned into heavy () and light (). A facility is said to be heavy if and light otherwise. When a facility in our locally optimal solution is swapped out, some of its clients are reassigned to other facilities in our solution via a mapping similar to the one defined in . We may not be able to define a feasible mapping for the heavy facilities. Thus heavy facilities are never swapped out and no client is mapped onto them for reassignment. Consider a facility , let . Let .

We introduce two concepts important to define the swaps and the mapping.

• A facility is said to dominate , if . Note that a facility can be dominated by at most one facility in where as a facility can dominate any number of facilities. Extending the definition to set , we say that a set dominates if . Let denote the set of facilities dominated by . When , slightly abusing the notation we use instead of .

• A facility is said to cover , if . Note that if then it can cover at most one facility in . Also a facility can be covered by at most facilities in . Extending the definition to set covers if . Let denote the set of facilities covered by . Also we will use instead of when .

### 2.2 Analysis: The Swaps

Consider a set of facilities in such that each of them is covered by exactly two light facilities. Let denote the set of such facilities. For , a one-to-one and onto mapping can be defined such that the following claim holds,

For and

1. .

2. If then .

###### Proof.

can be defined as follows: Order the clients in as such that for every with a nonempty , the clients in are consecutive; that is, there exists , such that . Define , where .
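The ordering-and-shift construction can be sketched as below. The exact offset is not recoverable from the text, so this assumes the half-length rotation used in the construction of Arya et al., which the paper says it extends; `build_mapping` and its `groups` argument are hypothetical names.

```python
def build_mapping(groups):
    """Build a one-to-one and onto mapping on the clients of a facility.

    `groups` lists the facility's clients grouped by the optimal facility
    that serves them, so each group is consecutive in the ordering.
    Client i is mapped to client (i + l // 2) mod l, where l is the total
    number of clients (assumed offset).
    """
    ordered = [client for group in groups for client in group]
    length = len(ordered)
    shift = length // 2
    return {ordered[i]: ordered[(i + shift) % length] for i in range(length)}
```

Because the mapping is a rotation it is a bijection, and any group holding at most half of the clients has all of its clients mapped outside the group.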

We show that satisfies the claim. We prove (1) using contradiction. Suppose if possible that both , for some , where . If , then . If , then . In either case, we have a contradiction, and hence mapping satisfies the claim.

For (2), as , then at most one facility can cover . If then for all . And if then . In either case the claim holds true.

Mapping is used to reassign the clients of a facility that is swapped out to other facilities . Claim (2.1) ensures that if does not dominate , then the client is mapped to some , whereas claim (2.2) ensures that if , then no more than clients are mapped to . But if such that , then more than clients may get mapped to . This scenario poses a major challenge; thus facilities in are considered separately while defining the swaps.

For , such that and . Consider a facility , then let denote the set such that . Let . Let and . Let . Figure 1(a) shows the relationship between and . The following claims hold.

we have .

###### Proof.

Suppose if possible . Let . This implies and which is a contradiction as . ∎

we have .

###### Proof.

Suppose if possible let . This implies and . This is a contradiction as from Claim 3 and cannot be dominated by two disjoint sets of facilities. ∎

.

###### Proof.

Suppose if possible let . As , we have . By the definition we have thus for some . This implies which is a contradiction as using claim 3. ∎

We consider at-most swaps, satisfying the following properties.

1. Each is considered in at least one swap and at most three swaps.

2. If , is not considered in any swap operation.

3. Each is considered in at most three swaps.

4. If is considered then ; and ; .

5. If - is considered then ; and ; .

Let and denote the set of facilities that have participated in the swaps at any point of time. Initially . While considering the facilities, we also maintain the sets and ; initially . The facilities in will never participate in any swap. Facilities in correspond to the facilities in in some way which will become clear when we define the swaps. We also maintain a set of triplets denoted by (will be defined shortly) and two sets and corresponding to . All the three sets are empty initially. Throughout we maintain that are pairwise disjoint and are pairwise disjoint.

For with .

1. If then . In this case we call a nice pair. Set , , .

2. If , let In this case we call a good pair and consider - which is nothing but -. Set , .

3. If then we call a bad pair. Set , . That is, put the bad pairs in and the facilities dominated by them in . Note that the cardinality of increased by while cardinality of increased by at least .

Figure 2(a) shows the partitions and at this time. Let and . Note that and . Also, clearly , as for every facility added to , two facilities are added to . and . The last claim follows as for every two facilities added to , at least three facilities are added to . Next, we consider the facilities in and . We say that a facility is good if , bad if , else nice (i.e. ). Let , and denote the set of good, bad and nice facilities respectively and, is partitioned into and . Let denote the set of facilities in captured by good facilities. Let denote the set of facilities in captured by bad facilities, and let denote the set of facilities in not captured by any facility in . Figure 1(b) shows the relationship between and .

Figure 1: (a) Relationship between the partitions of $S_{sp}$ and $D_{sp} \cup O_{sp}$. (b) Relationship between the partitions of $S_g$ and of $O_g$, $S_b$.

Figure 2: Partitions of $S_L$ and $O$.

1. For every (): Do nothing.

2. For every (): Perform . Update and as , in this order.

3. For every (): Set , in this order. That is, put the bad facility in and the facilities dominated by it in . The cardinality of increased by while the cardinality of increased by at least .
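The good, bad and nice labels used in these steps depend only on how many optimal facilities a light facility dominates, as described in Section 1.2. A minimal sketch, assuming the dominated sets have already been computed (the `dominated` dictionary and the function name are hypothetical):

```python
def classify_light_facilities(dominated):
    """Label each light facility by the number of optimal facilities it
    dominates: none -> nice, exactly one -> good, more than one -> bad.

    `dominated` maps a light facility to the set of optimal facilities
    it dominates (assumed to be precomputed).
    """
    labels = {}
    for facility, dominated_set in dominated.items():
        if len(dominated_set) == 0:
            labels[facility] = "nice"
        elif len(dominated_set) == 1:
            labels[facility] = "good"
        else:
            labels[facility] = "bad"
    return labels
```

Good facilities are swapped with the single facility they dominate, bad facilities are set aside, and nice facilities remain available for the later swaps.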

New partitions are shown in Figure 2(b). Let be the set of facilities in that have not participated in any swap. Then such a facility is either a nice facility, a bad facility, is in a nice pair or in a bad pair. Similarly, let be the set of facilities in that have not participated in the above swaps. Then, is the set of facilities in that are in a triplet. Let . Then facilities in are either dominated by a bad facility, by a bad pair, or are not dominated by any facility or a pair., . Let be the number of such facilities , . Next claim shows that there are at least nice facilities and nice pairs taken together.

###### Proof.

While handling , for every facility added in at least facilities are added in and while handling , at least facilities are added in for every facilities added in . Thus we have

Also . and

Thus

Similarly and

Thus

Also , , and

Thus we get

Next, consider the following swaps in which the facilities in are swapped with nice facilities or nice pairs in in a way that each nice facility or a facility in a nice pair is considered in at most swaps and each facility in is also considered in at most swaps.

1. Repeat until . Pick .

1. If . Pick a facility ; perform
, , . Set .

2. Else, pick a triplet , and perform
-, -, -.

3. Set

2. If , either there must be a facility or a triplet ; accordingly perform swap or double-swap with the facilities in in the same manner as described in step 1.

The swaps are summarized in Figure 4 and Figure 3.

Figure 3: (a) Partitions of $S_L$ and $O$ in terms of $S_W, S_B, \hat{S}, S_n$ and $O_W, O_W, \hat{O}, O_n$ respectively. (b) Swaps.

### 2.3 Analysis: Bounding the Cost

Now we bound the cost of these swaps. Whenever we consider a swap of form -, the mapping as defined in claim