
Approximation Algorithms for Continuous Clustering and Facility Location Problems

We consider the approximability of center-based clustering problems where the points to be clustered lie in a metric space, and no candidate centers are specified. We call such problems "continuous", to distinguish from "discrete" clustering where candidate centers are specified. For many objectives, one can reduce the continuous case to the discrete case, and use an α-approximation algorithm for the discrete case to get a βα-approximation for the continuous case, where β depends on the objective: e.g. for k-median, β = 2, and for k-means, β = 4. Our motivating question is whether this gap of β is inherent, or are there better algorithms for continuous clustering than simply reducing to the discrete case? In a recent SODA 2021 paper, Cohen-Addad, Karthik, and Lee prove a factor-2 and a factor-4 hardness, respectively, for continuous k-median and k-means, even when the number of centers k is a constant. The discrete case for a constant k is exactly solvable in polytime, so the β loss seems unavoidable in some regimes. In this paper, we approach continuous clustering via the round-or-cut framework. For four continuous clustering problems, we outperform the reduction to the discrete case. Notably, for the problem λ-UFL, where β = 2 and the discrete case has a hardness of 1.27, we obtain an approximation ratio of 2.32 < 2 × 1.27 for the continuous case. Also, for continuous k-means, where the best known approximation ratio for the discrete case is 9, we obtain an approximation ratio of 32 < 4 × 9. The key challenge is that most algorithms for discrete clustering, including the state of the art, depend on linear programs that become infinite-sized in the continuous case. To overcome this, we design new linear programs for the continuous case which are amenable to the round-or-cut framework.


1 Introduction

Clustering is a ubiquitous problem arising in various areas ranging from data analysis to operations research. One popular class of clustering problems is the so-called center-based clustering problems, where the quality of the clustering is determined by a function of the distances of every point in the client set C to the “centers” of the clusters they reside in. Two extensively studied measures are the sum of these distances, with the resulting problem called the k-median problem, and the sum of squares of these distances, with the resulting problem called the k-means problem.

In most settings, these center-based clustering problems are NP-hard, and one therefore studies approximation algorithms for them. Traditionally, however, approximation algorithms for these problems have been studied in finite/discrete metric spaces and, in fact, usually under the constraint that the set of centers, aka facilities, can be selected only from a prescribed subset of the metric space. Indeed, this model makes perfect sense when considering applications in operations research where the possible depot-locations may be constrained. These discrete problems have been extensively studied [HochbS1985, shmoys-tardos-aardal-1997, cgst-primal-rounding, guha-khuller-hardness, ChariKMN2001, CharG1999, JainV2001, JainMMSV2003, AryaGKMMP2001, KanunMNPSW2004, Li2013, LiS2016, ByrkaGRS2013] over the last three decades. For instance, for the k-median problem, the best known approximation algorithm is a 2.675-approximation [k-median-approx], while the best known hardness is 1 + 2/e [JainMMSV2003]. For the k-means problem, the best known approximation algorithm is a 9-approximation algorithm [Ahmadian2017BetterGF, GuptaT2008, KanunMNPSW2004], while the best hardness is 1 + 8/e [JainMMSV2003].

Restricting to a finite metric space, however, makes the problem easier, and indeed many of the algorithms in the papers mentioned above would be infeasible to implement if the underlying space X were extremely large, for instance if X were ℝ^m for some large dimension m and the distance function were the ℓ_p-metric for some p. On the other hand, it is reasonably easy to show using the triangle inequality that if one considers opening centers from the client set C itself, and thus reduces the problem to its discrete version, then one incurs a hit of a factor β in the approximation factor, where β is a constant depending on the objective function. In particular, for sum-of-distances objectives such as k-median, β = 2, while for sum-of-squared-distances objectives such as k-means, β = 4. Therefore, one immediately gets a 5.35-approximation for the continuous k-median problem and a 36-approximation for the continuous k-means problem. The question we investigate is

Is this factor β hit necessary between the continuous and discrete versions of center-based clustering problems, or can one design better approximation algorithms for the continuous case?
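As a small numerical illustration of the factor-β loss discussed above (a sketch, not part of the original analysis), the snippet below moves every center of an arbitrary continuous solution to its nearest client and compares costs. The Euclidean plane, the random instance, and the helper names `clustering_cost` and `snap_to_clients` are assumptions made purely for illustration; the bound itself relies only on the triangle inequality.

```python
import math
import random

def clustering_cost(clients, centers, power=1):
    """Sum over clients of (distance to nearest center) ** power.
    power=1 is the k-median objective, power=2 the k-means objective."""
    return sum(min(math.dist(j, f) for f in centers) ** power for j in clients)

def snap_to_clients(clients, centers):
    """Replace each center by its nearest client (the reduction to the discrete case)."""
    return [min(clients, key=lambda j: math.dist(j, f)) for f in centers]

random.seed(0)
# Hypothetical instance: 30 clients and 3 arbitrary "continuous" centers in the unit square.
clients = [(random.random(), random.random()) for _ in range(30)]
centers = [(random.random(), random.random()) for _ in range(3)]
snapped = snap_to_clients(clients, centers)

# If f is the center serving j and c(f) is the client nearest to f, then
# d(j, c(f)) <= d(j, f) + d(f, c(f)) <= 2 d(j, f), hence the factors 2 and 4.
median_ratio = clustering_cost(clients, snapped, 1) / clustering_cost(clients, centers, 1)
means_ratio = clustering_cost(clients, snapped, 2) / clustering_cost(clients, centers, 2)
print(median_ratio, means_ratio)
```

On any instance the first ratio is at most 2 and the second at most 4; typical instances lose much less, which is one way to see why committing to the worst-case loss upfront may be wasteful.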

It is crucial to note that when designing algorithms, we do not wish to make any assumptions on the underlying metric space (X, d). For instance, we do not wish to assume X = ℝ^m for some m. This is important, for we really want to compare ourselves with the βα bound, which is obtained using only the triangle inequality and symmetry property of d. On the other hand, to exhibit that a certain algorithm does not work, any candidate metric space suffices.

Recently, in a thought-provoking paper [cohen-addad-et-al], Cohen-Addad, Karthik, and Lee show that, unless P = NP, the continuous k-median and k-means problems cannot have an approximation ratio better than 2 and 4, respectively, even when k is a constant! Since the discrete problems have trivial exact algorithms via enumeration when k is a constant, this seems to indicate that in certain cases the above factor β hit is unavoidable. Is it possible that the inapproximability of the continuous problem is indeed β times the inapproximability of the discrete version?

1.1 Our Results

Our main contribution is a direct approach towards the continuous versions of clustering problems. We apply this to the following clustering problems, where we obtain a factor better than βα, where α is the best known factor for the discrete version of the problem.

  • In the continuous λ-UFL problem, a “soft” version of the continuous k-median problem, one is allowed to pick any number of centers but has to pay a parameter λ for each picked center. The objective is to minimize the sum of distances of points in C to picked centers plus the cost for opening these centers. Again, note that the centers can be opened anywhere in X. For the discrete version, where the only possible center locations are in C, there is a 1.488-approximation due to Li [Li2013], and a hardness of approximation within a factor of 1.27 is known due to Guha and Khuller [guha-khuller-hardness]. We describe a 2.32-approximation algorithm. Note that 2.32 < 2 × 1.27 and thus, for this problem, the inapproximability is not β times that of the discrete case. We also show how the reduction of [cohen-addad-et-al] carries over, to prove a hardness of 2 for this problem.

  • In the continuous k-means problem, we wish to minimize the sum of squares of distances of clients to the closest open center. Recall that for this problem we have β = 4, and thus one gets a 36-factor algorithm for continuous k-means using the best known 9-factor [Ahmadian2017BetterGF, GuptaT2008, KanunMNPSW2004] algorithm for the discrete problem. We describe an improved 32-approximation for the continuous k-means problem.

  • For the continuous k-median problem, our techniques fall short of improving the best known approximation factor for the discrete k-median problem. On the other hand, we obtain better algorithms for the individually fair or priority version of the continuous k-median problem. In this problem, every point j has a specified radius r(j) and desires a center opened within this distance. The objective is the same as the k-median problem: minimize the sum of all the distances. This problem arises as a possible model [JungKL2020, chakrabarty-negahbani-fairness, MahabV2020, VakilY2021, Ples87, BCCN21] in the study of fair clustering solutions, since the usual k-median algorithms may place certain clients inordinately far away. At a technical level, this problem is a meld of the k-median and the k-center problems; the latter is NP-hard, which forces one to look at bicriterion approximations. An (α, γ)-approximation would return a solution within α times the optimum but may connect a point j to a center as far as γ·r(j) away. Again, any (α, γ)-approximation for the discrete version where X = C would imply a (2α, 2γ)-approximation for the continuous version.

    The best discrete approximation is due to Vakilian and Yalçıner [VakilY2021] which would imply an -approximation for the continuous version. We describe an -approximation for the continuous version of the problem.

  • In the k-center with outliers (kCwO) problem, we are given a parameter m, and we need to serve only m of the clients. The objective is the maximum distance of a served client to its center. The k-center objective is one of the objectives for which most existing discrete algorithms can compare themselves directly with the continuous optimum. The 3-approximation algorithm in [ChariKMN2001] for the kCwO problem is one such example. However, the best known algorithm for kCwO for the discrete case (when X = C) is a 2-approximation by Chakrabarty, Goyal, and Krishnaswamy [ChakrGK2020], which proceeds via LP rounding and does not give a 2-approximation for continuous kCwO. This was explicitly noted in a work by Ding, Yu, and Wang (“… unclear of the resulting approximation ratio for the problem in Euclidean space.”) [DingYW19], which describes a 2-approximation for kCwO in Euclidean space, however, violating the number of clients served. We give a proper 2-approximation for the continuous kCwO problem (with no assumptions on the metric space) with no violations.

1.2 Our Technical Insight

Most state-of-the-art approximation algorithms for center-based clustering problems are based on LP relaxations where one typically has variables for every potential location of a center. When the set X is large, this approach becomes infeasible. Our main technical insight, underlying all our results, is to use a different style of linear program with polynomially many variables but exponentially many constraints. We then use the round-or-cut framework to obtain our approximation factor. More precisely, given a potential solution to our program, we either “round” it to get a desired solution within the asserted approximation factor, or we find a separating hyperplane proving that this potential solution is infeasible. Once this hyperplane is fed to the ellipsoid algorithm [ellipsoid], the latter generates another potential solution, and the process continues. Due to the ellipsoid method’s guarantees, we obtain our approximation factor in polynomial time.

For every client j, our LP relaxation has variables of the form y(j, r), indicating whether there is some point in an r-radius around j which is “open” as a center. Throughout the paper we use r as a quantity varying “continuously”, but it can easily be discretized, with a small loss in the guarantee, to arise from a set of polynomial size. Thus there are only polynomially many such variables. We add the natural “monotonicity” constraints: y(j, r) ≤ y(j, r′) whenever r ≤ r′. Interestingly, for one of the applications, we also need the monotonicity constraints for non-concentric balls: if B(j, r) ⊆ B(j′, r′), then we need y(j, r) ≤ y(j′, r′).

We have a variable cost_j indicating the cost the client j pays towards the optimal solution. Next, we connect the cost_j’s and the y(j, r)’s in the following ways (stated here for sum-of-distances objectives; something similar holds for sum-of-squared-distances objectives). One connection states that cost_j ≥ r·(1 − y(j, r)) for any r, and we add these to our LP. For the last two applications listed above, this suffices. However, one can also state the stronger condition cost_j ≥ ∫_0^∞ (1 − y(j, r)) dr. Indeed, the weaker constraint is the “Markov-style inequality” version of the stronger constraint.
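To spell out why the weaker connection follows from the stronger one, here is a short derivation. It is a sketch in assumed notation: cost_j denotes the cost variable of client j, y(j, r) denotes the ball variable, y is assumed to take values in [0, 1], and the concentric monotonicity constraint is assumed to hold.

```latex
\begin{align*}
\mathrm{cost}_j
  \;\ge\; \int_0^{\infty} \bigl(1 - y(j,r')\bigr)\,dr'
  \;\ge\; \int_0^{r} \bigl(1 - y(j,r')\bigr)\,dr'
  \;\ge\; r\,\bigl(1 - y(j,r)\bigr).
\end{align*}
```

The second inequality drops a nonnegative part of the integral (using y ≤ 1), and the third uses monotonicity: y(j, r′) ≤ y(j, r) for every r′ ≤ r, so the integrand is at least 1 − y(j, r) on [0, r]. This mirrors how Markov's inequality follows from the tail-integral formula for an expectation.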

Our second set of constraints restricts the y’s to be “not too large”. For instance, for the fair k-median or kCwO problems, where we are only allowed k points from X, we assert that for any set of pairwise disjoint balls B(j, r), the sum of the respective y(j, r)’s must be at most k. There are exponentially many constraints of this kind, and these are the constraints that need the round-or-cut machinery. For the λ-UFL problem, we require that the sum of the y’s scaled by λ, plus the sum of the cost_j’s, be at most opt_g, which is a running guess of the optimum opt.

Once we set up the framework above, we can port many existing rounding algorithms for the discrete clustering problems without much hassle. In particular, this is true for rounding algorithms which use the cost_j’s as the core driving force. For the continuous λ-UFL problem, we port the rounding algorithm from the paper [shmoys-tardos-aardal-1997] by Shmoys, Tardos, and Aardal. For the continuous k-means problem, we port the rounding algorithm from the paper [cgst-primal-rounding] by Charikar, Guha, Shmoys, and Tardos. For the continuous fair k-median problem, we port the rounding algorithm from the paper [chakrabarty-negahbani-fairness] by Chakrabarty and Negahbani, which itself builds on the algorithm present in the paper [AlamdS2017] by Alamdari and Shmoys. For the continuous kCwO problem, we port the rounding algorithm present in the paper [ChakrGK2020] by Chakrabarty, Goyal, and Krishnaswamy.

Our results fall short for the continuous k-median problem (without fairness), where we can port the rounding algorithm from the paper [cgst-primal-rounding] and get a constant-factor approximation. This, however, does not improve upon the 5.35-factor mentioned earlier.

1.3 Other Related Works and Discussion

The continuous -means and median problems have been investigated quite a bit in the specific setting when and when is the distance. The paper [Matousek2000] by Matoušek describes an -approximation (PTAS) that runs in time . This led to a flurry of results [HarPM2004, delaVFKKR03, KumarSS2004, Chen2006, FeldmanMS2007] on obtaining PTASes with better dependencies on and via the applications of coresets. There is a huge and growing literature on coresets, and we refer the interested reader to the paper [CohenSS2021] by Cohen-Addad, Saulpic, and Schwiegelshohn, and the references within, for more information. Another approach to the continuous -means problem has been local search. The paper [KanunMNPSW2004] which describes a -approximation was first stated for the geometric setting, however it also went via the discretization due to Matoušek [Matousek2000] and suffered a running time of exponential dependency on the dimension. More recent papers [FriggstadRS2019l, CohenKM2019l] described local-search based PTASes for metrics with doubling dimension , with running time exponentially depending on . These doubling metrics generalize -metrics. However, none of the above ideas seem to suggest better constant factor approximations for the continuous -median/means problem in the general case, and indeed even when but is part of the input.

The -means problem in the metric space , where and are not constants, has been studied extensively [Trevisan2000, AwasthiCKS2015, Cohen-AddadK2019, Cohen-AddadKL2022, Ahmadian2017BetterGF, Cohen-AddadEMN2022], and is called the Euclidean -means problem. The discrete version of this problem was proved APX-hard in 2000 [Trevisan2000], but the APX-hardness of the continuous version was proved much later, in 2015 [AwasthiCKS2015]. More recently, the hardness results for both versions have been improved: the discrete Euclidean -means problem is hard to approximate to factor , while the continuous problem is hard to approximate to factor  [Cohen-AddadK2019]. Moreover, under assumption of a complexity theoretic hypothesis called the Johnson coverage hypothesis, these numbers have been improved to and , respectively [Cohen-AddadKL2022]. On the algorithmic side, the discrete Euclidean -means problem admits a better approximation ratio than the general case: a approximation was described in 2017 [Ahmadian2017BetterGF], which was very recently improved to  [Cohen-AddadEMN2022].

We believe that our paper takes the first stab at getting approximation ratios better than β times the best discrete factor for the continuous clustering problems. Round-or-cut is a versatile framework for approximation algorithm design with many recent applications [CarrFKP2001, ChakrCKK2015, AnSS2017, chakrabarty-negahbani-f-center, AneggAZ21], and the results in our paper are yet another application of this paradigm. However, many questions remain. We believe that the most interesting question to tackle is the continuous k-median problem. The best known discrete k-median algorithms are, in fact, combinatorial in nature, and are obtained via applying the primal-dual/dual-fitting based methods [JainV2001, JainMMSV2003, LiS2016, k-median-approx] on the discrete LP. However, their application still needs an explicit description of the facility set, and it is interesting to see if they can be directly ported to the continuous setting.

All the algorithms in our paper, actually still open centers from . Even then, we are able to do better than simply reducing to the discrete case, because we do not commit to the loss upfront, and instead round from a fractional solution that can open centers anywhere in . This raises an interesting question for the -median problem (or any other center based clustering problem): consider the potentially infinite-sized LP which has variables for all , but restrict to the optimal solution which only is allowed to open centers from . How big is this “integrality gap”? It is not too hard to show that for the -median problem this is between and . The upper bound gives hope we can get a true -approximation for the continuous -median problem, but it seems one would need new ideas to obtain such a result.

Organization of this Paper

In the main body, we focus on the continuous λ-UFL and the continuous fair k-median results, since we believe that they showcase the technical ideas in this paper. Proofs of certain statements have been deferred to the appendix. The description of the results on continuous k-means and continuous k-center with outliers can also be found in the appendix.

2 Preliminaries

Given a metric space (X, d) on points X with pairwise distances d, we use the notation d(x, S) for x ∈ X and S ⊆ X to denote x’s distance to the set S, that is, min_{s ∈ S} d(x, s).

Definition 1 (Continuous k-median (Cont-k-Med)).

The input is a metric space (X, d), clients C ⊆ X, and k ∈ ℕ. The goal is to find F ⊆ X with |F| ≤ k minimizing Σ_{j ∈ C} d(j, F).

Definition 2 (Continuous Fair k-median (ContFair-k-Med)).

Given the Cont-k-Med input, plus fairness radii r(j) ≥ 0 for each j ∈ C, the goal is to find F ⊆ X with |F| ≤ k such that d(j, F) ≤ r(j) for all j ∈ C, minimizing Σ_{j ∈ C} d(j, F).

In the Uncapacitated Facility Location (UFL) problem, the restriction of opening only k facilities is replaced by having a cost associated with opening each facility. When these costs are equal to the same value λ for all facilities, the problem is called λ-UFL.

Definition 3 (Continuous λ-UFL (Cont-λ-UFL)).

Given a metric space (X, d), clients C ⊆ X, and λ ≥ 0, find F ⊆ X that minimizes the sum of “connection cost” Σ_{j ∈ C} d(j, F) and “facility opening cost” λ·|F|.
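The three objectives defined above can be written down directly once a distance oracle is available. The following is a minimal sketch; the function names, the callable `d`, and the toy instance on the real line are hypothetical and chosen only for illustration. Note that the chosen centers need not be clients.

```python
def dist_to_set(d, x, S):
    """d(x, S): distance from x to the nearest element of the non-empty set S."""
    return min(d(x, s) for s in S)

def cont_k_median_cost(d, clients, F):
    """Connection cost of Cont-k-Med: sum of client distances to F."""
    return sum(dist_to_set(d, j, F) for j in clients)

def cont_ufl_cost(d, clients, F, lam):
    """Cont-lambda-UFL objective: connection cost plus lam per open facility."""
    return cont_k_median_cost(d, clients, F) + lam * len(F)

def is_fair(d, clients, F, radius):
    """ContFair-k-Med feasibility: every client j has a center within radius[j]."""
    return all(dist_to_set(d, j, F) <= radius[j] for j in clients)

# Toy instance on the real line; the center 1 is not a client.
d = lambda a, b: abs(a - b)
clients = [0, 2, 10]
one_center = cont_k_median_cost(d, clients, [1])            # 1 + 1 + 9
two_centers = cont_ufl_cost(d, clients, [1, 10], lam=3)     # (1 + 1 + 0) + 3 * 2
fair = is_fair(d, clients, [1, 10], {0: 1, 2: 1, 10: 0})
print(one_center, two_centers, fair)
```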

Let Δ denote the diameter of the metric (X, d). For x ∈ X and r ≥ 0, the ball of radius r around x is B(x, r) := {x′ ∈ X : d(x, x′) ≤ r}. Throughout the paper, we use balls of the form B(j, r) where j is a client and r ∈ [0, Δ]. To circumvent the potentially infinite number of radii, the radii can be discretized into powers of (1 + ε) for a small constant ε > 0. Thereupon, we can appeal to the following lemma to bound the number of distinct radii by a polynomial.

Lemma 4 (Rewording of Lemma 4.1, [Ahmadian2017BetterGF]).

Losing a factor of , we can assume that for any , .

For simplicity of exposition, we present our techniques using radii in , and observe that discretizing to incurs an additive loss of at most in our guarantees. We also note that by the above, which enables us to efficiently binary-search over our guesses .

3 Continuous λ-UFL

We start this section with our 2.32-approximation for Cont-λ-UFL (Theorem 5). For this, we introduce a new linear programming formulation, and adapt the rounding algorithm of Shmoys-Tardos-Aardal to the new program. The resulting procedure exhibits our main ideas, and serves as a warm-up for the remaining sections. Also, in Section 3.2, we prove that it is NP-hard to approximate Cont-λ-UFL within a factor of 2 − o(1), using ideas due to Cohen-Addad, Karthik, and Lee [cohen-addad-et-al]. This shows that the continuous version cannot be approximated as well as the discrete version, which has a best-known approximation factor of 1.488 [Li2013].

3.1 Approximation algorithm

This subsection is dedicated to proving the following theorem:

Theorem 5.

There is a polynomial time algorithm that, for an instance of Cont-λ-UFL with optimum opt, yields a solution with cost at most 2.32 · opt.

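We design the following linear program for Cont-λ-UFL. We use variables cost_j for the connection cost of each client j, and y(B) for the number of facilities opened within each ball B of the form B(j, r). We also use a guess opt_g of the optimum opt, which we will soon discuss how to obtain. Throughout, we use y(j, r) as shorthand for y(B) where B = B(j, r).

λ · Σ_{B ∈ 𝓑} y(B) + Σ_{j ∈ C} cost_j ≤ opt_g    for every collection 𝓑 of pairwise disjoint balls B(j, r)    (UFL)
∫_0^∞ (1 − y(j, r)) dr ≤ cost_j    for every j ∈ C    (UFL-1)
y(j, r) ≤ y(j, r′)    for every j ∈ C and all r ≤ r′    (UFL-2)

Observe that, given a solution F ⊆ X of cost at most opt_g, we can obtain a feasible solution of UFL as follows. For client j, we set cost_j := d(j, F). For each radius r, we set y(j, r) := 0 for r < d(j, F) and y(j, r) := 1 for r ≥ d(j, F).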

Our approach is to round a solution of UFL. Observe that there are polynomially many constraints of the form (UFL-1) and (UFL-2); hence, we can efficiently obtain a solution that satisfies them. So for the remainder of this section, we assume that those constraints are satisfied. On the other hand, there are infinitely many constraints of type (UFL). This is why we employ a round-or-cut framework via the ellipsoid algorithm [ellipsoid]. We begin with an arbitrary (cost, y), and when ellipsoid asks us if a proposed solution (cost, y) is feasible, we run the following algorithm.

The algorithm inputs a parameter α ∈ (0, 1], and defines r_j as the minimum radius at which client j has at least α mass of open facilities around it. First, all clients are deemed uncovered (U = C). Iteratively, the algorithm picks the next representative j, i.e. the uncovered client with the smallest r_j. This j is put into the set R. Any uncovered client j′ within distance r_j + r_{j′} of j is considered a child of j and is now covered. When all clients are covered, i.e. U = ∅, the algorithm outputs R.

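1: Input: a proposed solution (cost, y) for UFL, parameter α ∈ (0, 1]
2: r_j ← min{r : y(j, r) ≥ α} for all j ∈ C
3: R ← ∅    ▷ “representative” clients
4: U ← C    ▷ “uncovered” clients
5: while U ≠ ∅ do
6:      Pick j ∈ U with minimum r_j
7:      R ← R ∪ {j}
8:      child(j) ← {j′ ∈ U : d(j, j′) ≤ r_j + r_{j′}}
9:      U ← U \ child(j)
10: end while
11: Output: R
Algorithm 1 Filtering for Cont-λ-UFL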
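The filtering step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the fractional solution is passed as a dictionary `y[(j, r)]` over a finite sorted list of candidate radii, some candidate radius is assumed to reach mass α for every client, and a client j′ is treated as a child of j when the balls B(j, r_j) and B(j′, r_{j′}) could intersect, i.e. d(j, j′) ≤ r_j + r_{j′}. All function and variable names are hypothetical.

```python
def indicator_y(clients, radii, facilities, d):
    """y(j, r) induced by an integral solution: 1 if some facility lies in B(j, r)."""
    return {(j, r): 1.0 if min(d(j, f) for f in facilities) <= r else 0.0
            for j in clients for r in radii}

def filter_representatives(clients, d, y, radii, alpha):
    """Greedy filtering: returns the representatives R and the radii r_j."""
    # r_j: smallest candidate radius whose ball around j holds at least alpha mass.
    r = {j: next(rad for rad in radii if y[(j, rad)] >= alpha) for j in clients}
    reps, uncovered = [], set(clients)
    while uncovered:
        # Pick the uncovered client with minimum r_j (ties broken by client id).
        j = min(uncovered, key=lambda c: (r[c], c))
        reps.append(j)
        # Children of j: uncovered clients whose ball could intersect B(j, r_j).
        children = {c for c in uncovered if d(j, c) <= r[j] + r[c]}
        uncovered -= children
    return reps, r

# Toy instance on the real line.
d = lambda a, b: abs(a - b)
clients = [0, 1, 10]
radii = [0, 1, 2, 4, 8, 16]
y = indicator_y(clients, radii, facilities=[0.5, 10], d=d)
reps, r = filter_representatives(clients, d, y, radii, alpha=0.5)
print(reps, r)
```

With this child rule, the balls B(j, r_j) around representatives are pairwise disjoint, and every client j ends up within distance 2·r_j of some representative, which are the two properties the analysis relies on.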

Notice that, by construction, the collection of balls {B(j, r_j) : j ∈ R} is pairwise disjoint. Hence, the following constraint, which we call (Sep), is of the form (UFL):

λ · Σ_{j ∈ R} y(j, r_j) + Σ_{j ∈ C} cost_j ≤ opt_g    (Sep)

We will show that

Lemma 6.

If (cost, y) satisfies (UFL-1), (UFL-2), and (Sep), then there exists a suitable α for which the output of Algorithm 1 has cost at most 2.32 · opt_g.

Thus, if we find that the desired approximation ratio is not attained, then it must be that (Sep) was not satisfied, and we can pass it to ellipsoid as a separating hyperplane. If ellipsoid finds that the feasible region of our linear program is empty, then we increase opt_g and try again. Otherwise, we obtain a solution that attains the desired guarantees.

We now analyze Algorithm 1 to prove Lemma 6.

Proof of Lemma 6.

For this proof, we will fix , and refer to as .

To prove a suitable exists, assume is picked uniformly at random from for some ; we will see later that is optimal. Take , the output of Algorithm 1 on . By definition of , . Thus , which implies

(1)

To bound the expected connection cost, take j ∈ C and observe that, since all the clients are ultimately covered in Algorithm 1, there has to exist j′ ∈ R for which j ∈ child(j′). By construction of child(j′), d(j, j′) ≤ r_j + r_{j′}, which is at most 2r_j by our choice of j′ in Line 6. Thus, for any client j, we get d(j, R) ≤ 2r_j. So we are left to bound E[r_j] for an arbitrary client j.

We have that . We notice that at . Also, . So given (UFL-2) for all balls with , we can apply a change of variable to the integral to get , where the last inequality is by (UFL-1). Thus we have . Summing over all we have

(2)

To balance from (1) and from (2), we set . The expected Cont--UFL cost of is, using ,

Since the bound holds in expectation over a random , there must exist an that satisfies it deterministically. ∎
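For readers who want the balancing step spelled out, the following is a reconstruction of the calculation, not the authors' text. It assumes α is drawn uniformly from [β, 1], writes cost_j, y(j, r), r_j(α), R, and opt_g for the quantities described above, and assumes that the separating constraint λ Σ_{j∈R} y(j, r_j) + Σ_j cost_j ≤ opt_g holds for the set R produced at each α.

```latex
\begin{align*}
\lambda\,|R| &\le \frac{\lambda}{\alpha}\sum_{j\in R} y(j, r_j),
  \qquad
  \mathbb{E}\!\left[\tfrac{1}{\alpha}\right]
    = \frac{1}{1-\beta}\int_{\beta}^{1}\frac{d\alpha}{\alpha}
    = \frac{\ln(1/\beta)}{1-\beta},\\[4pt]
\mathbb{E}[r_j] &= \frac{1}{1-\beta}\int_{\beta}^{1} r_j(\alpha)\,d\alpha
  \le \frac{1}{1-\beta}\int_{0}^{1} r_j(\alpha)\,d\alpha
  = \frac{1}{1-\beta}\int_{0}^{\infty}\bigl(1-y(j,r)\bigr)\,dr
  \le \frac{\mathrm{cost}_j}{1-\beta},\\[4pt]
\mathbb{E}\Bigl[\lambda|R| + \sum_{j} d(j,R)\Bigr]
  &\le \frac{\ln(1/\beta)}{1-\beta}\Bigl(\mathrm{opt}_g - \sum_j \mathrm{cost}_j\Bigr)
     + \frac{2}{1-\beta}\sum_j \mathrm{cost}_j .
\end{align*}
```

Choosing β = e^{-2} makes both coefficients equal to 2/(1 − e^{-2}) ≈ 2.313, so the expected cost is at most about 2.313 · opt_g, which is consistent with the 2.32 ratio stated in the abstract once a small discretization loss is allowed.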

To obtain a suitable , we can adapt the derandomization procedure from the discrete version [shmoys-tardos-aardal-1997]. The procedure relies on having polynomially many interesting radii; for this, we recall that while we have used for simplicity, our radii are actually , .

3.2 Hardness of approximation

Our hardness result for this problem is as follows:

Theorem 7.

Given an instance of Cont--UFL and , it is -hard to distinguish between the following:

  • There exists such that

  • For any ,

Thus we exhibit hardness of approximation up to a factor that tends to 2 as ε → 0. Our reduction closely follows the hardness proof for Cont-k-Med [cohen-addad-et-al]. We relegate the details to the appendix.

4 Continuous Fair k-Median

The main result of this section is the following theorem.

Theorem 8.

There exists a polynomial time algorithm for ContFair--Med that, for an instance with optimum cost , yields a solution with cost at most , in which, each client is provided an open facility within distance of itself. Here .

We create a round-or-cut framework, via the ellipsoid algorithm [ellipsoid], that adapts the Chakrabarty-Negahbani algorithm [chakrabarty-negahbani-fairness] to the continuous setting. For this, we will modify the UFL linear program to suit ContFair-k-Med. As before, opt_g is a guessed optimum, cost_j is the cost share of a client j, and y(j, r) represents the number of facilities opened in B(j, r). There are two key modifications. First, we expand the monotonicity constraints of the form (UFL-2) to include non-concentric balls, which are crucial for adapting the fairness guarantee of Chakrabarty and Negahbani [chakrabarty-negahbani-fairness]. Second, we enforce the fairness constraints by requiring y(j, r(j)) ≥ 1 for each client j, where r(j) is the fairness radius of j.

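Σ_{j ∈ C} cost_j ≤ opt_g    (LP)
Σ_{B ∈ 𝓑} y(B) ≤ k    for every collection 𝓑 of pairwise disjoint balls B(j, r)    (LP-1)
∫_0^∞ (1 − y(j, r)) dr ≤ cost_j    for every j ∈ C    (LP-2)
y(j, r) ≤ y(j′, r′)    whenever B(j, r) ⊆ B(j′, r′)    (LP-3)
y(j, r(j)) ≥ 1    for every j ∈ C    (LP-4)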

We will frequently use the following property of LP. See LABEL:appsec:lp for the proof.

Lemma 9.

Consider a solution (cost, y) of LP. If, for a client j, (cost, y) satisfies all constraints of the form (LP-2) and (LP-3) involving j, then for any radius r, cost_j ≥ r·(1 − y(j, r)).

As before, we will only worry about the constraints that are exponentially many. These are (LP-1). For this, we use ellipsoid [ellipsoid]. Given a proposed solution of LP, we construct , as follows.

We first perform a filtering step. For each , we define . In the beginning, all clients are “uncovered” (i.e. ). In each iteration, let be the uncovered client with the minimum ; and add to our set of “representatives” . Any within distance of (including itself) will be added to the set , and will be removed from . After all clients are covered, i.e. , the algorithm outputs . For a formal description of this algorithm, see LABEL:appsec:reps.

For a , let be the closest client to in . Let . So the collection of balls is pairwise disjoint, and the following constraint, which we call , is of the form (LP-1).

()

We have that

Lemma 10.

If satisfies (LP-2)-(LP-4) and , then

  1. [ref = 10.0]

Proof.

Fix . By Line LABEL:alg:fair-reps:ln:child of LABEL:alg:fair-reps,

(3)

So if , then by (LP-3) and (LP-4), . Else . By Lemma 9, .

If , then this implies . Otherwise, substituting by from (3) and setting gives , i.e. . Now, by , we have . ∎

So if we find that , then must be violated, and we can pass it to ellipsoid as a separating hyperplane. Hence in polynomial time, we either find that our feasible region is empty, or we get and such that satisfies (LP), (LP-2)-(LP-4), and . In the first case, we increase and try again. In the latter case, we round further to attain our desired approximation ratios, via a rounding algorithm that we will now describe. This algorithm focuses on and ignores other clients, as justified by the following lemma.

Lemma 11.

be a solution to ContFair--Med. Consider a proposed solution of LP that satisfies (LP). Then .

The proof closely follows from a standard technique for the discrete version [cgst-primal-rounding, chakrabarty-negahbani-fairness]. We provide the proof in LABEL:appsec:fairness:reps.

Our algorithm will also ignore facilities outside , so our solution will be a subset of . For the remainder of this section, we fix , and refer to as . We write the following polynomial-sized linear program, DLP, where are the only clients and the only facilities. The objective function of DLP is a lower bound on , so hereafter we compare our output with DLP. We do not include fairness constraints in this program, and we will see later that it is not necessary to do so.

In DLP, the variables for each denote whether is open as a facility. The variables for denote whether the client uses the facility .

(DLP)
(DLP-1)
(DLP-2)
(DLP-3)

We will now round to an integral solution of DLP. Our first step is to convert to a fractional solution of DLP. To do this, for each , we consolidate the -mass in onto , i.e. we set . By Lemma 1, each is then at least . This allows to use only itself and as its fractional facilities.

1:A proposed solution for LP, and from LABEL:alg:fair-reps
2:for  do
3:     
4:     
5:     
6:     
7:     
8:end for
9:
Algorithm 2 Consolidation for ContFair--Med
Lemma 12.

, , and is a feasible solution of DLP with cost at most .

Proof.

For a , if then . Otherwise, by Lemma 1, .

Hence , which implies feasibility by construction and . It also implies that . If , then the RHS above is . Otherwise where the last inequality follows from Lemma 9. Multiplying by and summing over all , we have by Line LABEL:alg:fair-reps:ln:rep in LABEL:alg:fair-reps,

which is at most by (LP). ∎

Now, to round to an integral solution, we appeal to an existing technique [cgst-primal-rounding, chakrabarty-negahbani-fairness]. We state the relevant result here, and provide the proof in LABEL:appsec:fairness:rounding.

Lemma 13 ([cgst-primal-rounding, chakrabarty-negahbani-fairness]).

Let be a feasible solution of DLP with cost at most , such that , . Then there exists a polynomial time algorithm that produces such that ; If , then ; , at least one of is in ; and .