Efficient approximation schemes for uniform-cost clustering problems in planar graphs

by Vincent Cohen-Addad et al.

We consider the k-Median problem on planar graphs: given an edge-weighted planar graph G, a set of clients C ⊆ V(G), a set of facilities F ⊆ V(G), and an integer parameter k, the task is to find a set of at most k facilities whose opening minimizes the total connection cost of clients, where each client contributes to the cost with the distance to the closest open facility. We give two new approximation schemes for this problem: -- FPT Approximation Scheme: for any ϵ>0, in time 2^{O(kϵ^{-3} log(kϵ^{-1}))}·n^{O(1)} we can compute a solution that has connection cost at most (1+ϵ) times the optimum, with high probability. -- Efficient Bicriteria Approximation Scheme: for any ϵ>0, in time 2^{O(ϵ^{-5} log(ϵ^{-1}))}·n^{O(1)} we can compute a set of at most (1+ϵ)k facilities whose opening yields connection cost at most (1+ϵ) times the optimum connection cost for opening at most k facilities, with high probability. As a direct corollary of the second result we obtain an EPTAS for Uniform Facility Location on planar graphs, with the same running time. Our main technical tool is a new construction of a "coreset for facilities" for k-Median in planar graphs: we show that in polynomial time one can compute a subset of facilities F_0 ⊆ F of size k·(log n/ϵ)^{O(ϵ^{-3})} with a guarantee that there is a (1+ϵ)-approximate solution contained in F_0.




1 Introduction

We study approximation schemes for classic clustering objectives, formalized as follows. Given an edge-weighted graph $G$ together with a set $C\subseteq V(G)$ of vertices called clients, a set $F\subseteq V(G)$ of vertices called candidate facilities, and an opening cost $\mathfrak{o}\in\mathbb{R}_{\geq 0}$, the Uniform Facility Location problem asks for a subset of facilities $D\subseteq F$ (also called centers) that minimizes the cost defined as $\mathfrak{o}\cdot|D|+\sum_{c\in C}\mathrm{dist}(c,D)$. In the Non-uniform Facility Location variant, the opening costs may vary between facilities.

We also consider the related $k$-Median problem, where the tuple $(G,C,F)$ comes with a hard budget $k$ for the number of open facilities (as opposed to the opening cost $\mathfrak{o}$). That is, the problem asks for a set $D\subseteq F$ of size at most $k$ that minimizes the connection cost $\mathrm{conn}(D)=\sum_{c\in C}\mathrm{dist}(c,D)$. Note that Uniform Facility Location can be reduced to $k$-Median by guessing the number of open facilities in an optimal solution.

Facility Location and $k$-Median model, in an abstract way, various clustering objectives that appear in applications. Therefore, designing approximation algorithms for them and their variants is a vibrant topic in the field of approximation algorithms. For Non-uniform Facility Location, a long line of work [1, 14, 21, 15] culminated in the $1.488$-approximation algorithm of Li [17]. On the other hand, Guha and Khuller [12] showed that the problem cannot be approximated in polynomial time within a factor better than $1.463$ unless $\mathrm{NP}\subseteq\mathrm{DTIME}(n^{O(\log\log n)})$, which gives almost tight bounds on the best approximation factor achievable in polynomial time. For $k$-Median, the best known approximation ratio achievable in polynomial time is $2.675+\varepsilon$, due to Byrka et al. [3], while the lower bound of $1.463$ due to Guha and Khuller [12] holds here as well.

Given the approximation hardness status presented above, it is natural to consider restricted metrics. In this work we consider planar metrics: we assume that the underlying edge-weighted graph is planar.

It was a long-standing open problem whether Facility Location admits a polynomial-time approximation scheme (PTAS) in planar metrics. For the uniform case, this question was resolved in the affirmative by Cohen-Addad et al. [7] in an elegant way: they showed that local search of radius $\varepsilon^{-O(1)}$ actually yields a $(1+\varepsilon)$-approximation, giving a PTAS with running time $n^{\varepsilon^{-O(1)}}$. This approach also gives a PTAS for $k$-Median with a similar running time, and works even in metrics induced by graphs from any fixed proper minor-closed class.

Very recently, Cohen-Addad et al. [8] also gave a PTAS for Non-uniform Facility Location in planar metrics using a different approach. Roughly, the idea is to first apply Baker's layering scheme to reduce the problem to the case where, in every cluster of the solution (a set of clients connected to the same facility), the distances from the clients to the center all lie within a range whose largest and smallest values differ by at most a constant factor depending only on $\varepsilon$. This case is then resolved by another application of Baker's layering scheme, followed by dynamic programming over a hierarchical decomposition of the graph that uses shortest paths as balanced separators.

Both the schemes of [7] and of [8] are PTASes: they run in time $n^{f(\varepsilon)}$ for some function $f$. It is therefore natural to ask for an efficient PTAS (EPTAS): an approximation scheme with running time $f(\varepsilon)\cdot n^{O(1)}$ for some function $f$. Recently, such an EPTAS was given by Cohen-Addad [5] for $k$-Means in low-dimensional Euclidean spaces; this is a variant of $k$-Median where every client contributes to the connection cost with the square of its distance to the closest open facility. Here, the idea is to apply local search as in [7], but to use the properties of the metric to explore the local neighborhood faster. Unfortunately, this technique relies mainly on the Euclidean structure (or on the bounded doubling dimension of the input) and seems hard to lift to the general planar case. The techniques of [8] are also far from yielding an EPTAS: essentially, one needs to use a logarithmic number of portals at every step of the final dynamic programming in order to tame the accumulation of error through the levels of the decomposition.

The goal of this work is to circumvent these difficulties and give an EPTAS for Uniform Facility Location in planar metrics.

Our results.

Our main technical contribution is the following theorem. In essence, it states that when solving $k$-Median on a planar graph one can restrict the facility set to a subset of size $k\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}$, at the cost of losing a multiplicative factor of $1+\varepsilon$ on the optimum connection cost. This can be seen as the planar version of the classic result of Matoušek [19], who showed that for Euclidean metrics of dimension $d$ it is possible to reduce the candidate centers to a small set at the cost of losing a multiplicative factor of $1+\varepsilon$ on the optimum connection cost (through the use of coresets as well). For general metrics, obtaining such a result seems challenging, since it would imply a $(1+\varepsilon)$-approximation algorithm with running time $f(k,\varepsilon)\cdot n^{O(1)}$, which would contradict Gap-ETH [6].

From now on, by "with constant probability" we mean with probability bounded from below by a positive absolute constant; this can be boosted by independent repetition.

Theorem 1.

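Given a $k$-Median instance $(G,C,F,k)$, where $G$ is a planar graph, and an accuracy parameter $\varepsilon>0$, one can in randomized polynomial time compute a set $F_0\subseteq F$ of size $k\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}$ satisfying the following condition with constant probability: there exists a set $D_0\subseteq F_0$ of size at most $k$ such that for every set $D\subseteq F$ of size at most $k$ it holds that $\mathrm{conn}(D_0)\leq(1+\varepsilon)\cdot\mathrm{conn}(D)$.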

A direct corollary of Theorem 1 is a fixed-parameter approximation scheme for the $k$-Median problem in planar graphs. This continues the line of work on fixed-parameter approximation schemes for $k$-Median and $k$-Means in Euclidean spaces [10, 16], where the goal is to design an algorithm running in time $f(k,\varepsilon)\cdot n^{O(1)}$ for a computable function $f$.

Theorem 2.

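Given a $k$-Median instance $(G,C,F,k)$, where $G$ is a planar graph, and an accuracy parameter $\varepsilon>0$, one can in randomized time $2^{O(k\varepsilon^{-3}\log(k\varepsilon^{-1}))}\cdot n^{O(1)}$ compute a solution that has connection cost at most $(1+\varepsilon)$ times the minimum possible connection cost with constant probability.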


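Proof. Apply the algorithm of Theorem 1 and let $F_0$ be the obtained subset of facilities. Then run a brute-force search through all subsets of $F_0$ of size at most $k$ and output one with the smallest connection cost. Thus, the running time is
$$(|F_0|+1)^{k}\cdot n^{O(1)}\;\leq\;2^{k}k^{k}\cdot(\log n/\varepsilon)^{O(k\varepsilon^{-3})}\cdot n^{O(1)}\;\leq\;2^{O(k\varepsilon^{-3}\log(k\varepsilon^{-1}))}\cdot n^{O(1)},$$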

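where the last inequality follows from the bound $(\log n)^{a}\leq 2^{2a\log a}\cdot n^{2}$, valid for all $a\geq 1$ and $n\geq 2$ (logarithms are binary) and applied with $a=O(k\varepsilon^{-3})$. It can be proved as follows: if $\log n\leq a^{2}$ then $(\log n)^{a}\leq a^{2a}=2^{2a\log a}$, and if $\log n>a^{2}$ then $(\log n)^{a}<(\log n)^{\sqrt{\log n}}=2^{\sqrt{\log n}\cdot\log\log n}\leq 2^{2\log n}=n^{2}$, where we used $\log y\leq 2\sqrt{y}$ for $y\geq 1$.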
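
The enumeration step of this proof can be illustrated by a minimal sketch. It assumes, for illustration only, that all client-facility distances have already been computed and stored in a dictionary keyed by (client, facility) pairs; the function name `brute_force_k_median` is ours. If $m$ denotes the size of the facility set returned by Theorem 1, the loop visits at most $(m+1)^k$ candidate sets, which is where the $m^{O(k)}$ factor in the running time comes from.

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple

Distances = Dict[Tuple[Hashable, Hashable], float]


def brute_force_k_median(
    dist: Distances,
    clients: Sequence[Hashable],
    candidates: Sequence[Hashable],
    k: int,
) -> Tuple[float, Tuple[Hashable, ...]]:
    """Exhaustively search all nonempty subsets of `candidates` of size at most k.

    Returns (cost, D) where D minimizes the sum over clients c of min_{f in D} dist[(c, f)].
    """
    best_cost: float = float("inf")
    best_set: Tuple[Hashable, ...] = ()
    for size in range(1, min(k, len(candidates)) + 1):
        for D in combinations(candidates, size):
            cost = sum(min(dist[(c, f)] for f in D) for c in clients)
            if cost < best_cost:
                best_cost, best_set = cost, D
    return best_cost, best_set


# Usage: four points on a line, forming two clusters around 0 and 10.
points = [0, 1, 10, 11]
line_dist = {(a, b): abs(a - b) for a in points for b in points}
print(brute_force_k_median(line_dist, points, points, k=2))  # prints (2, (0, 10))
```

With $k=2$ the search opens one center per cluster, as expected.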

Using Theorem 1 we can also give an efficient bicriteria PTAS for $k$-Median in planar graphs. This time, the proof is more involved and uses the local search techniques of [5].

Theorem 3.

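Given a $k$-Median instance $(G,C,F,k)$, where $G$ is a planar graph, and an accuracy parameter $\varepsilon>0$, one can in randomized time $2^{O(\varepsilon^{-5}\log(\varepsilon^{-1}))}\cdot n^{O(1)}$ compute a set $D\subseteq F$ of size at most $(1+\varepsilon)k$ such that its connection cost is at most $(1+\varepsilon)$ times the minimum possible connection cost for solutions of size $k$, with constant probability.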

A direct corollary of Theorem 3 is an efficient PTAS for Uniform Facility Location in planar graphs.

Theorem 4.

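Given a Uniform Facility Location instance $(G,C,F,\mathfrak{o})$, where $G$ is a planar graph, and an accuracy parameter $\varepsilon>0$, one can in randomized time $2^{O(\varepsilon^{-5}\log(\varepsilon^{-1}))}\cdot n^{O(1)}$ compute a solution that has total cost at most $(1+\varepsilon)$ times the optimum cost with constant probability.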


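Proof. Iterate over all possible choices of $k\in\{1,\ldots,|F|\}$ being the number of facilities opened by the optimum solution, and for every $k$ invoke the algorithm of Theorem 3 for the $k$-Median instance $(G,C,F,k)$. From the obtained solutions output one with the smallest cost. For the correct guess $k^{\star}$, the returned set has at most $(1+\varepsilon)k^{\star}$ facilities and connection cost at most $(1+\varepsilon)$ times that of the optimum solution, so both its opening cost and its connection cost are within a factor $1+\varepsilon$ of the corresponding parts of the optimum cost.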
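
The outer loop of this proof can be sketched as follows. The sketch is illustrative only: in place of the bicriteria algorithm of Theorem 3 it calls a hypothetical exact brute-force subroutine `k_median_exact`, feasible only on tiny inputs; only the guess-$k$ wrapper mirrors the argument above.

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple

Distances = Dict[Tuple[Hashable, Hashable], float]


def k_median_exact(dist: Distances, clients: Sequence[Hashable],
                   facilities: Sequence[Hashable], k: int) -> Tuple[float, Tuple[Hashable, ...]]:
    """Hypothetical stand-in for Theorem 3: exact k-Median by brute force (tiny inputs only)."""
    best: Tuple[float, Tuple[Hashable, ...]] = (float("inf"), ())
    for D in combinations(facilities, k):
        cost = sum(min(dist[(c, f)] for f in D) for c in clients)
        if cost < best[0]:
            best = (cost, D)
    return best


def uniform_facility_location(dist: Distances, clients: Sequence[Hashable],
                              facilities: Sequence[Hashable], opening_cost: float):
    """Guess the number k of open facilities, solve k-Median for each guess, keep the cheapest."""
    best_total, best_set = float("inf"), ()
    for k in range(1, len(facilities) + 1):
        connection, D = k_median_exact(dist, clients, facilities, k)
        total = opening_cost * len(D) + connection
        if total < best_total:
            best_total, best_set = total, D
    return best_total, best_set


points = [0, 1, 10, 11]
line_dist = {(a, b): abs(a - b) for a in points for b in points}
print(uniform_facility_location(line_dist, points, points, opening_cost=5))  # prints (12, (0, 10))
```

Raising the opening cost makes fewer facilities worthwhile; with `opening_cost=100` the loop settles on a single facility.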

Note that the approach presented above fails for the non-uniform case, where each facility has its own, distinct opening cost.

In this extended abstract we focus on proving the main result, Theorem 1. The proof of Theorem 3, on which Theorem 4 also relies, is deferred to Section 4.

Our techniques.

The first step in the proof of Theorem 1 is to reduce the number of relevant clients using the coreset construction of Feldman and Langberg [9]. By applying this technique, we may assume that there are at most $O(k\varepsilon^{-2}\log n)$ clients in the instance; however, they are weighted: every client $c$ is assigned a nonnegative weight $w(c)$, and it contributes to the connection cost of any solution with $w(c)$ times the distance to the closest open facility in the solution.

We now examine the Voronoi diagram induced in the input graph $G$ by the clients: vertices of $G$ are classified into $|C|$ cells according to the closest client. This Voronoi diagram has one cell per client, thus it can be regarded as a planar graph with $|C|$ faces, where each face accommodates one cell. To formally define the Voronoi diagram, and in particular the boundaries between neighboring cells, we use the framework introduced by Marx and Pilipczuk [18] and its extension used in [20].

Consider now all the spokes in the diagram, where a spoke is the shortest path connecting the center of a cell (i.e., a client) with a branching node of the diagram incident to the cell (which is a face of $G$). Removing all the spokes and all the branching nodes from the plane divides it into diamonds, where each diamond is delimited by four spokes, called further the perimeter of the diamond. See Figure 1 for an example. Since the diagram is a planar graph with $|C|$ faces, there are $O(|C|)$ diamonds altogether. Moreover, since no diamond contains a client in its interior, whenever $P$ is a path connecting a client $c$ with a facility $f$ belonging to some diamond $\Delta$, $P$ has to cross the perimeter of $\Delta$.

Now comes the key and most technical part of the proof. We very carefully place portals on the perimeter of each diamond. The idea of the placement is similar to that of the resolution metric used in the QPTAS for Facility Location. Namely, on a spoke starting at client $c$ we put portals at geometrically increasing distances from $c$, so that the farther we are on the spoke from $c$, the sparser the portals are. As a diamond is delimited by four spokes, we may thus use only a small number of portals per diamond (logarithmic in $n$ for fixed $\varepsilon$), while the cost of snapping a path crossing the perimeter of a diamond to the portal closest to the crossing point can be bounded by $\varepsilon$ times the distance from the crossing point to $c$.
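
The geometric placement of portals described above can be illustrated by a small sketch. The concrete rule used here (one portal at $c$ itself and one at each distance $(1+\varepsilon)^i\leq\ell$ from $c$ on a spoke of length $\ell$) is an assumption made for illustration, not necessarily the exact placement used in the proof. Under this rule, a crossing point at distance $d\geq 1$ from $c$ lies within $\varepsilon d$ of the nearest portal between it and $c$, points closer than $1$ snap to $c$ itself, and a spoke carries $O(\varepsilon^{-1}\log\ell)$ portals.

```python
import math
from typing import List


def spoke_portals(length: float, eps: float) -> List[float]:
    """Portal positions, as distances from the client endpoint c of a spoke of the given length.

    Illustrative rule (an assumption): a portal at c and one at every (1 + eps)**i <= length.
    """
    portals = [0.0]
    i = 0
    while (1 + eps) ** i <= length:
        portals.append((1 + eps) ** i)
        i += 1
    return portals


def snap_towards_client(d: float, portals: List[float]) -> float:
    """Closest portal to a crossing point at distance d, among the portals between c and it."""
    return max(p for p in portals if p <= d)


eps, length = 0.1, 1000.0
portals = spoke_portals(length, eps)
print(len(portals))  # prints 74
print(math.log(length) / math.log(1 + eps))  # about 72.5, the leading term of the portal count
for d in [0.5, 3.7, 250.0, 999.0]:
    print(d, d - snap_towards_client(d, portals))  # snapping error, at most eps * d when d >= 1
```

The geometric spacing is what keeps the portal count logarithmic while the snapping error stays proportional to the distance from $c$.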

For a facility $f$ in a diamond $\Delta$, we define the profile of $f$ as follows. For every spoke $Q$ in the perimeter of $\Delta$, we look at the portal $p$ on $Q$ closest to $f$. We record approximate (up to a small multiplicative error) distances from $f$ to $p$ and to the neighboring portals, as well as the distance to the client endpoint of the spoke $Q$. The crux lies in the following fact: for every two facilities $f,f'$ in $\Delta$ with the same profile, replacing $f$ with $f'$ increases the connection cost of any client connected to $f$ only by a multiplicative factor of $1+O(\varepsilon)$. Hence, for every profile in every diamond it suffices to keep just one facility with that profile. Since there are $O(|C|)$ diamonds and $(\log n/\varepsilon)^{O(\varepsilon^{-3})}$ possible profiles in each of them, we keep at most $|C|\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}=k\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}$ facilities in total. This proves Theorem 1.

For the proof of Theorem 3, we first apply Theorem 1 to reduce the number of facilities to $k\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}$. Then we again inspect the Voronoi diagram, but now induced by the facilities. Having contracted every cell to a single vertex, we compute an $r$-division of the obtained planar graph, with $r=(\log n/\varepsilon)^{O(\varepsilon^{-3})}$, so that only $\varepsilon k$ facilities lie on the boundaries of the regions. We open all the facilities on all the boundaries (thus exceeding the quota for open facilities by $\varepsilon k$), run the PTAS of Cohen-Addad et al. [7] in each region independently, and at the end assemble the regional solutions using a knapsack dynamic programming. Since within each region there are only polylogarithmically many facilities, each application of the PTAS actually works in time $(\log n/\varepsilon)^{\varepsilon^{-O(1)}}\cdot n^{O(1)}$.
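
The final assembly step can be illustrated with a minimal knapsack-style dynamic program. We assume, for illustration, that the run in region number $r$ has been summarized as a list `region_tables[r]` whose $j$-th entry is the connection cost achieved in that region when $j$ non-boundary facilities are opened there, and that these regional costs simply add up; the exact bookkeeping in the proof may differ.

```python
from typing import List, Sequence

INF = float("inf")


def assemble_regions(region_tables: Sequence[Sequence[float]], budget: int) -> float:
    """Minimum of sum_r region_tables[r][j_r] over all choices with sum_r j_r <= budget.

    region_tables[r][j] is the cost of the (approximate) solution computed for region r
    when it opens j facilities in addition to the boundary facilities.
    """
    # best[b] = cheapest total cost of the regions processed so far using exactly b facilities
    best: List[float] = [0.0] + [INF] * budget
    for table in region_tables:
        merged: List[float] = [INF] * (budget + 1)
        for used, cost_so_far in enumerate(best):
            if cost_so_far == INF:
                continue
            for j, cost in enumerate(table):
                if used + j > budget:
                    break
                merged[used + j] = min(merged[used + j], cost_so_far + cost)
        best = merged
    return min(best)


print(assemble_regions([[10, 4, 1], [8, 2, 0]], budget=2))  # prints 6.0
```

Here the best use of two facilities is to open one in each region ($4+2$); the table has size $O(k)$ per region, so the dynamic program is polynomial.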

2 Preliminaries on Voronoi diagrams and coresets

In this section we recall some tools about Voronoi diagrams in planar graphs and coresets that will be used in the proof of Theorem 1. We consider undirected graphs $G$ with positive edge lengths embedded in a sphere $\Sigma$, with the standard shortest-path metric $\mathrm{dist}(\cdot,\cdot)$ for $G$. Contrary to the previous section, the metric is defined on the vertex set of $G$ only, i.e., we do not consider $G$ as a metric space with points in the interiors of edges. For $v\in V(G)$ and $X\subseteq V(G)$, we denote $\mathrm{dist}(v,X)=\min_{x\in X}\mathrm{dist}(v,x)$, and we similarly define $\mathrm{dist}(X,Y)=\min_{x\in X,\,y\in Y}\mathrm{dist}(x,y)$ for $X,Y\subseteq V(G)$.

Recall that for a set of open facilities $D\subseteq F$ and a set of clients $C$, we define the connection cost as
$$\mathrm{conn}(D)=\sum_{c\in C}\mathrm{dist}(c,D).$$

If the input is additionally equipped with opening costs $\mathfrak{o}\colon F\to\mathbb{R}_{\geq 0}$, then the opening cost of $D$ is defined as $\sum_{f\in D}\mathfrak{o}(f)$.

2.1 Voronoi diagrams in planar graphs

We now recall the construction of Voronoi diagrams and related notions in planar graphs used by Marx and Pilipczuk [18]. The setting is as follows. Suppose $G$ is an $n$-vertex simple graph embedded in a sphere $\Sigma$ whose edges are assigned nonnegative real lengths. We consider the shortest-path metric in $G$: for two vertices $u,v$, their distance $\mathrm{dist}(u,v)$ is equal to the smallest possible total length of a path from $u$ to $v$. We assume that $G$ is triangulated (i.e., every face of $G$ is a cycle of length $3$), as this can always be achieved by triangulating the graph using edges of infinite weight.

Further, we assume that shortest paths are unique in $G$ and that finite distances between distinct vertices in $G$ are pairwise different: for all vertices $u,v,u',v'$ with $u\neq v$, $u'\neq v'$, and $\{u,v\}\neq\{u',v'\}$, we have $\mathrm{dist}(u,v)\neq\mathrm{dist}(u',v')$ or $\mathrm{dist}(u,v)=\mathrm{dist}(u',v')=+\infty$. This can be achieved by adding small perturbations to the edge lengths. Since we never specify the degrees of polynomials in the running times of our algorithms, we may ignore the additional cost incurred by handling the perturbations in arithmetic operations.

Voronoi diagrams and their properties.

Suppose that $S$ is a subset of vertices of $G$. (In [18] a more general setting is considered, where the objects inducing the diagram are connected subgraphs of $G$ instead of single vertices; we will not need this generality here.) First, define the Voronoi partition: for a vertex $p\in S$, the Voronoi cell $\mathrm{Cell}(p)$ is the set of all those vertices $u$ whose distance from $p$ is smaller than the distance from any other vertex $p'\in S$; note that ties do not occur due to the distinctness of distances in $G$. Note that $\{\mathrm{Cell}(p)\}_{p\in S}$ is a partition of the vertex set of $G$. For each $p\in S$, let $T_p$ be the union of the shortest paths from vertices of $\mathrm{Cell}(p)$ to $p$; recall here that shortest paths in $G$ are unique. Note that, due to the distinctness of distances in $G$, $T_p$ is a spanning tree of the subgraph of $G$ induced by $\mathrm{Cell}(p)$.
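
For intuition, the Voronoi cells and the shortest-path trees spanning them can be computed with a single multi-source run of Dijkstra's algorithm. The sketch below assumes the graph is given as an adjacency dictionary with edge lengths and, as in the text, that shortest paths are unique and distances distinct (otherwise ties are broken arbitrarily). It only illustrates the definitions and is not the machinery of [18].

```python
import heapq
import itertools
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # vertex -> [(neighbour, edge length)]


def voronoi_partition(graph: Graph, sites: Iterable[Hashable]):
    """Return (dist, owner, parent) for every vertex reachable from `sites`.

    dist[v]   : distance from v to its closest site
    owner[v]  : that closest site, i.e. v lies in the Voronoi cell of owner[v]
    parent[v] : predecessor of v on its shortest path to owner[v] (None for sites);
                the parent pointers inside one cell form the spanning tree of that cell.
    """
    dist: Dict[Hashable, float] = {}
    owner: Dict[Hashable, Hashable] = {}
    parent: Dict[Hashable, Optional[Hashable]] = {}
    tie = itertools.count()  # tie-breaker so the heap never compares vertices directly
    heap = [(0.0, next(tie), s, s, None) for s in sites]
    heapq.heapify(heap)
    while heap:
        d, _, v, site, par = heapq.heappop(heap)
        if v in dist:  # v is already settled
            continue
        dist[v], owner[v], parent[v] = d, site, par
        for u, length in graph[v]:
            if u not in dist:
                heapq.heappush(heap, (d + length, next(tie), u, site, v))
    return dist, owner, parent


edges = [("a", "b", 1.0), ("b", "c", 1.1), ("c", "d", 1.3), ("d", "e", 1.0)]
graph: Graph = {v: [] for v in "abcde"}
for x, y, w in edges:
    graph[x].append((y, w))
    graph[y].append((x, w))
_, owner, parent = voronoi_partition(graph, ["a", "e"])
print(owner)  # {'a': 'a', 'e': 'e', 'b': 'a', 'd': 'e', 'c': 'a'}
```

Vertex `c` is at distance $2.1$ from `a` but $2.3$ from `e`, so it joins the cell of `a`, and its parent pointer leads back to `a` through `b`.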

The diagram induced by $S$ is a multigraph constructed as follows. First, take the dual $G^{*}$ of $G$ and remove all edges dual to the edges of all the trees $T_p$, for $p\in S$. Then, exhaustively remove vertices of degree $1$. Finally, for every maximal $2$-path (i.e., a path whose internal vertices have degree $2$), say with endpoints $u$ and $v$, we replace this path by the edge $uv$; note that this creates a loop at $u$ in case $u=v$. The resulting multigraph $\mathrm{Vor}(S)$ is the Voronoi diagram induced by $S$. Note that the vertices of $\mathrm{Vor}(S)$ are faces of $G$; for clarity we shall call them branching nodes. Furthermore, $\mathrm{Vor}(S)$ inherits an embedding in $\Sigma$ from the dual $G^{*}$, where an edge $uv$ that replaced a maximal $2$-path $P$ is embedded precisely as $P$, i.e., as the concatenation of (the embeddings of) the edges comprising $P$. From now on we will assume this embedding of $\mathrm{Vor}(S)$.

We recall several properties of $\mathrm{Vor}(S)$, observed in [18].

Lemma 5 (Lemmas 4.4 and 4.5 of [18]).

The diagram $\mathrm{Vor}(S)$ is a connected and $3$-regular multigraph embedded in $\Sigma$, which has exactly $|S|$ faces, $2|S|-4$ branching nodes, and $3|S|-6$ edges. The faces of $\mathrm{Vor}(S)$ are in one-to-one correspondence with vertices of $S$: each $p\in S$ corresponds to a face of $\mathrm{Vor}(S)$ that contains all vertices of $\mathrm{Cell}(p)$ and no other vertex of $G$.
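
The vertex and edge counts can be double-checked from connectivity, 3-regularity and the face count via Euler's formula. The short calculation below is our own check, not taken from [18]; it writes $V$ and $E$ for the numbers of branching nodes and edges of the diagram and uses that a connected plane multigraph satisfies Euler's formula even with loops and parallel edges.

```latex
\begin{align*}
  V - E + |S| &= 2  && \text{(Euler's formula; the diagram is connected and has $|S|$ faces)}\\
  2E &= 3V          && \text{(handshake lemma; the diagram is $3$-regular)}\\
  V - \tfrac{3}{2}V + |S| &= 2
    \;\Longrightarrow\; V = 2|S| - 4, \qquad E = \tfrac{3}{2}V = 3|S| - 6.
\end{align*}
```

In particular the number of edges, and hence of diamonds later on, is linear in $|S|$.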

Spokes and diamonds.

We now introduce further structural elements that can be distinguished in the Voronoi diagram, see Figure 1 for reference. The definitions and basic observations presented below are taken from Pilipczuk et al. [20], and were inspired by the Euclidean analogues due to Har-Peled [13].

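An incidence is a triple $\pi=(p,b,u)$ where $p\in S$, $b$ is a branching node of the diagram $\mathrm{Vor}(S)$, and $u$ is a vertex of $G$ that lies on $b$ (recall that $b$ is a triangular face of $G$) and belongs to $\mathrm{Cell}(p)$. The spoke of the incidence $\pi$, denoted $\mathrm{spoke}(\pi)$, is the shortest path in $G$ between $u$ and $p$. Note that all the vertices of $\mathrm{spoke}(\pi)$ belong to $\mathrm{Cell}(p)$.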

Figure 1: A part of the Voronoi diagram with various features distinguished. Branching nodes of the diagram are grayed triangular faces, edges of the diagram are dashed. Solid paths of respective colors are spokes. (The interior of) one diamond is grayed in order to highlight it.

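Let $e=b_1b_2$ be an edge of the diagram $\mathrm{Vor}(S)$, where $b_1,b_2$ are branching nodes (possibly $b_1=b_2$ if $e$ is a loop in $\mathrm{Vor}(S)$). Further, let $p_1$ and $p_2$ be the vertices from $S$ that correspond to the faces of $\mathrm{Vor}(S)$ incident to $e$ (possibly $p_1=p_2$ if $e$ is a bridge in $\mathrm{Vor}(S)$). Suppose for a moment that $b_1\neq b_2$. Then, out of the three edges of $b_1$ (these are edges in $G$) there is exactly one that crosses the edge $e$ of $\mathrm{Vor}(S)$; say it is the edge $u_{11}u_{12}$, where $u_{11}\in\mathrm{Cell}(p_1)$ and $u_{12}\in\mathrm{Cell}(p_2)$. Symmetrically, there is one edge of $b_2$ that crosses $e$, say it is $u_{21}u_{22}$, where $u_{21}\in\mathrm{Cell}(p_1)$ and $u_{22}\in\mathrm{Cell}(p_2)$. In case $b_1=b_2$, the edge $e$ crosses two different edges of $b_1$ and we define $u_{11},u_{12},u_{21},u_{22}$ analogously for these two crossings; note that then, provided $p_1$ corresponds to the face enclosed by the loop $e$, we have $u_{11}=u_{21}$. For all $i,j\in\{1,2\}$, consider the incidence $\pi_{ij}=(p_j,b_i,u_{ij})$.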

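Consider removing the following subsets from the sphere $\Sigma$: the interiors of the faces $b_1$ and $b_2$, and the spokes $\mathrm{spoke}(\pi_{ij})$ for all $i,j\in\{1,2\}$. After this removal the sphere breaks into two regions, out of which exactly one, say $R$, intersects (the embedding of) $e$. Let the diamond of $e$, denoted $\mathrm{Dmd}(e)$, be the subgraph of $G$ consisting of all features (vertices and edges) embedded in the closure of $R$. The region $R$ as above is the interior of the diamond $\mathrm{Dmd}(e)$. Note that, in particular, the spokes $\mathrm{spoke}(\pi_{ij})$ for $i,j\in\{1,2\}$ and the edges $u_{11}u_{12}$ and $u_{21}u_{22}$ belong to $\mathrm{Dmd}(e)$. The perimeter of the diamond of $e$ is the closed walk obtained by concatenating the spokes $\mathrm{spoke}(\pi_{ij})$ for $i,j\in\{1,2\}$ and the edges $u_{11}u_{12}$ and $u_{21}u_{22}$ in the natural order around $R$. The following observation is immediate: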

Proposition 6.

Consider removing all the spokes (considered as curves on $\Sigma$) and all the branching nodes (considered as interiors of faces on $\Sigma$) of the diagram $\mathrm{Vor}(S)$ from the sphere $\Sigma$. Then $\Sigma$ breaks into regions that are in one-to-one correspondence with the edges of $\mathrm{Vor}(S)$: the region corresponding to the edge $e$ is the interior of the diamond $\mathrm{Dmd}(e)$. Consequently, the intersection of the diamonds of two different edges of $\mathrm{Vor}(S)$ is contained in the intersection of their perimeters.

Finally, we note that the perimeter of a diamond separates it from the rest of the graph. Since vertices of $S$ are never contained in the interior of a diamond, this yields the following.

Lemma 7.

Let $p\in S$ and let $u$ be a vertex of $G$ belonging to the diamond $\mathrm{Dmd}(e)$ for some edge $e$ of $\mathrm{Vor}(S)$. Then every path in $G$ connecting $p$ and $u$ intersects the perimeter of $\mathrm{Dmd}(e)$.

2.2 Coresets

In most of our algorithms, the starting point is the notion of a coreset and a corresponding result of Feldman and Langberg [9]. To this end, we need to slightly generalize the notion of a client set in a $k$-Median instance. A client weight function is a function $w\colon C\to\mathbb{R}_{\geq 0}$. Given a set $D\subseteq F$ of open facilities, the (weighted) connection cost is defined as
$$\mathrm{conn}_w(D)=\sum_{c\in C}w(c)\cdot\mathrm{dist}(c,D).$$

That is, every client $c$ is assigned a weight $w(c)$ with which it contributes to the objective function. The support of a weight function $w$ is defined as $\mathrm{supp}(w)=\{c\in C: w(c)>0\}$. From now on, whenever we speak about a $k$-Median instance without a specified client weight function, we assume the standard function assigning each client weight $1$.
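
Evaluating the weighted connection cost of a fixed set of open facilities needs only one multi-source shortest-path computation started from that set. A minimal sketch, assuming an adjacency-dictionary graph representation and a dictionary of client weights (both our choices for illustration):

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # vertex -> [(neighbour, edge length)]


def distances_to_set(graph: Graph, sources: Iterable[Hashable]) -> Dict[Hashable, float]:
    """dist(v, D) for every vertex v reachable from D = sources (multi-source Dijkstra)."""
    dist: Dict[Hashable, float] = {}
    tie = itertools.count()  # avoids comparing vertices when distances are equal
    heap = [(0.0, next(tie), s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, _, v = heapq.heappop(heap)
        if v in dist:
            continue
        dist[v] = d
        for u, length in graph[v]:
            if u not in dist:
                heapq.heappush(heap, (d + length, next(tie), u))
    return dist


def weighted_connection_cost(graph: Graph, weight: Dict[Hashable, float],
                             open_facilities: Iterable[Hashable]) -> float:
    """conn_w(D) = sum over clients c of w(c) * dist(c, D); unreachable clients cost +inf."""
    d = distances_to_set(graph, open_facilities)
    # Clients outside supp(w) contribute nothing, so only positive weights are summed.
    return sum(w * d.get(c, math.inf) for c, w in weight.items() if w > 0)


graph: Graph = {"a": [("b", 2.0)], "b": [("a", 2.0), ("c", 3.0)], "c": [("b", 3.0)]}
weight = {"a": 1.0, "b": 0.0, "c": 2.0}
print(weighted_connection_cost(graph, weight, ["b"]))  # prints 8.0, i.e. 1*2 + 2*3
```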

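The essence of coresets is that one can find weight functions with small support that approximate the original instance well. Given a $k$-Median instance $(G,C,F,k)$ (without weights) and an accuracy parameter $\varepsilon>0$, a coreset is a weight function $w\colon C\to\mathbb{R}_{\geq 0}$ such that for every set $D\subseteq F$ of size at most $k$, it holds that
$$(1-\varepsilon)\cdot\mathrm{conn}(D)\leq\mathrm{conn}_w(D)\leq(1+\varepsilon)\cdot\mathrm{conn}(D).$$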
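
On tiny instances the coreset condition can be checked exhaustively, which makes the quantifier over all sets $D$ concrete. The helper below is a hypothetical brute-force checker of our own (not part of [9]); it skips $D=\emptyset$, whose connection cost is infinite, and assumes precomputed distances in a dictionary keyed by (client, facility) pairs.

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple

Distances = Dict[Tuple[Hashable, Hashable], float]


def is_coreset(dist: Distances, clients: Sequence[Hashable], facilities: Sequence[Hashable],
               weight: Dict[Hashable, float], k: int, eps: float) -> bool:
    """Check (1 - eps) * conn(D) <= conn_w(D) <= (1 + eps) * conn(D) for all D with 1 <= |D| <= k."""
    for size in range(1, k + 1):
        for D in combinations(facilities, size):
            conn = sum(min(dist[(c, f)] for f in D) for c in clients)
            conn_w = sum(w * min(dist[(c, f)] for f in D) for c, w in weight.items())
            if not ((1 - eps) * conn <= conn_w <= (1 + eps) * conn):
                return False
    return True


points = [0, 1, 100, 101]
line_dist = {(a, b): abs(a - b) for a in points for b in points}
two_reps = {0: 2.0, 100: 2.0}  # one representative per cluster, each with weight 2
print(is_coreset(line_dist, points, points, two_reps, k=1, eps=0.02))  # prints True
print(is_coreset(line_dist, points, points, two_reps, k=2, eps=0.5))   # prints False
```

The same weight function is a coreset for $k=1$ but not for $k=2$: opening both representatives makes its weighted cost zero while the true cost is $2$, which is why coreset size must grow with $k$.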

We rely on the following result of Feldman and Langberg [9].

Theorem 8 (Theorem 15.4 of [9]).

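Given a $k$-Median instance $(G,C,F,k)$ with $|V(G)|=n$ and an accuracy parameter $\varepsilon>0$, one can in randomized polynomial time find a weight function $w\colon C\to\mathbb{R}_{\geq 0}$ with support of size $O(k\varepsilon^{-2}\log n)$ that is a coreset with constant probability.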

We note that Ke Chen [4] gave a construction of a strong coreset, with support size quadratic in $k$, that is much simpler than the later construction of Feldman and Langberg [9]. By using this construction instead, we would obtain a weaker version of Theorem 1, with a bound on $|F_0|$ that is quadratic in $k$ instead of linear. This would be perfectly sufficient to derive an FPT approximation scheme as in Theorem 2, but for Theorem 3 we will vitally use the stronger statement. A construction of coresets with similar size guarantees, but maintainable in the streaming model, has been proposed by Braverman et al. [2].


A division $\mathcal{R}$ of a graph $G$ is a family of subgraphs of $G$, called regions, such that every edge of $G$ is contained in exactly one region and every vertex of $G$ is contained in at least one region. For a region $R\in\mathcal{R}$, the boundary of $R$, denoted $\partial R$, is the set of those vertices of $R$ that are also contained in some other region $R'\neq R$. For a positive integer $r$, a division is called an $r$-division if every region contains at most $r$ vertices.
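
The definitions of regions, boundaries and $r$-divisions translate directly into code. In this sketch (our own illustration), a region is represented by its list of edges, so we assume that the graph has no isolated vertices; under that assumption, covering every edge also covers every vertex.

```python
from collections import Counter
from typing import FrozenSet, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def _edge_key(e: Edge) -> FrozenSet[Hashable]:
    return frozenset(e)  # undirected edge: (u, v) and (v, u) are the same edge


def region_vertices(region: Sequence[Edge]) -> Set[Hashable]:
    return {v for e in region for v in e}


def boundaries(regions: Sequence[Sequence[Edge]]) -> List[Set[Hashable]]:
    """For each region R, the boundary: vertices of R that also lie in another region."""
    vertex_sets = [region_vertices(R) for R in regions]
    multiplicity = Counter(v for vs in vertex_sets for v in vs)
    return [{v for v in vs if multiplicity[v] > 1} for vs in vertex_sets]


def is_r_division(regions: Sequence[Sequence[Edge]], edges: Sequence[Edge], r: int) -> bool:
    """Every edge lies in exactly one region and every region has at most r vertices."""
    counts = Counter(_edge_key(e) for R in regions for e in R)
    all_edges = {_edge_key(e) for e in edges}
    each_edge_once = set(counts) == all_edges and all(m == 1 for m in counts.values())
    return each_edge_once and all(len(region_vertices(R)) <= r for R in regions)


path = [(1, 2), (2, 3), (3, 4), (4, 5)]
regions = [[(1, 2), (2, 3)], [(3, 4), (4, 5)]]
print(boundaries(regions))                # prints [{3}, {3}]
print(is_r_division(regions, path, r=3))  # prints True
```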

The following lemma, which can be traced to the work of Frederickson [11], expresses the well-known property that planar graphs admit $r$-divisions with small boundary. We remark that Frederickson gave $r$-divisions with stronger properties, but the statement below is all the generality we need here.

Lemma 9 ([11]).

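There exists a constant $c$ such that for every positive integer $r$, every planar graph $G$ on $n$ vertices admits an $r$-division $\mathcal{R}$ such that
$$\sum_{R\in\mathcal{R}}|\partial R|\leq c\cdot\frac{n}{\sqrt{r}}.$$
Moreover, given $G$ and $r$, such an $r$-division can be computed in polynomial time.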

PTAS for $k$-Median of [7].

We now review the approximation scheme for $k$-Median of Cohen-Addad et al. [7], since we will use it as a black box. Formally, we shall need the following statement.

Theorem 10 ([7]).

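Given a $k$-Median instance $(G,C,F,k)$ where $G$ is planar, a subset of facilities $D_0\subseteq F$ with $|D_0|\leq k$, and an accuracy parameter $\varepsilon>0$, it is possible in time $|F|^{\varepsilon^{-O(1)}}\cdot n^{O(1)}$ to compute a solution $D$ with $D_0\subseteq D\subseteq F$ and $|D|\leq k$ whose connection cost is at most $(1+\varepsilon)$ times the minimum possible connection cost of a solution that contains $D_0$.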

The statement of Theorem 10 somewhat differs from the one presented in [7]; let us review the differences.

First, the result of [7] works in a larger generality, when the graph is drawn from any fixed proper minor-closed class; we do not need this generality and we focus on the class of planar graphs.

Second, for the running time, the algorithm proposed by Cohen-Addad et al. [7] is actually a simple local search of radius $\varepsilon^{-O(1)}$ that stops whenever it cannot find an improvement step that would decrease the cost by a sufficiently large multiplicative factor. Observe that since in an improvement step we can add to or remove from the current solution only $\varepsilon^{-O(1)}$ facilities, there are $|F|^{\varepsilon^{-O(1)}}$ possible improvement steps within the local search radius, and evaluating each of them takes polynomial time. Finally, as argued in [7], the algorithm terminates within a polynomial number of steps, so the claimed running time follows.

Third, in the statement of Theorem 10 we assume that there is a set $D_0$ of compulsory facilities that are required to be included in the solution. While this is not stated in [7], it is straightforward to add this feature to the result: the algorithm starts with a solution containing $D_0$ and considers only local search steps that keep $D_0$ intact. It is straightforward to see that the analysis of the approximation ratio still holds. In essence, the analysis relies on showing that if the current solution $L$ is more expensive than the optimum solution $O$ by at least a multiplicative factor of $1+\varepsilon$, then there is a mixed solution $M$ that is cheaper than $L$ and such that the symmetric difference of $L$ and $M$ has size $\varepsilon^{-O(1)}$. It then suffices to observe that if $L$ and $O$ both contain $D_0$, then so does the mixed solution $M$.
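
A toy version of local search with compulsory facilities may help to see why keeping the compulsory set intact is easy. The swap neighbourhood, the padding used to initialise the solution, and the acceptance threshold `min_gain` are placeholders of this sketch rather than the exact choices of [7]; it only illustrates that restricting moves to non-compulsory facilities keeps the compulsory set in every solution visited. It carries none of the approximation or running-time guarantees of [7].

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Optional, Sequence, Set, Tuple

Distances = Dict[Tuple[Hashable, Hashable], float]


def connection_cost(dist: Distances, clients: Sequence[Hashable], D: Iterable[Hashable]) -> float:
    D = list(D)
    return sum(min(dist[(c, f)] for f in D) for c in clients)


def _improving_swap(dist: Distances, clients: Sequence[Hashable], facilities: Sequence[Hashable],
                    current: Set[Hashable], compulsory: Set[Hashable],
                    radius: int, min_gain: float) -> Optional[Set[Hashable]]:
    """First solution obtained by swapping t <= radius non-compulsory facilities whose cost is
    below (1 - min_gain) times the current cost, or None if there is no such swap."""
    target = (1 - min_gain) * connection_cost(dist, clients, current)
    removable = [f for f in facilities if f in current and f not in compulsory]
    addable = [f for f in facilities if f not in current]
    for t in range(1, radius + 1):
        for out in combinations(removable, t):
            for inn in combinations(addable, t):
                candidate = (current - set(out)) | set(inn)
                if connection_cost(dist, clients, candidate) < target:
                    return candidate
    return None


def local_search_with_compulsory(dist: Distances, clients: Sequence[Hashable],
                                 facilities: Sequence[Hashable], k: int,
                                 compulsory: Iterable[Hashable], radius: int = 1,
                                 min_gain: float = 0.0):
    """Swap local search that never removes a compulsory facility (toy version)."""
    compulsory = set(compulsory)
    current = set(compulsory)
    for f in facilities:  # pad with arbitrary facilities until |current| = k
        if len(current) >= k:
            break
        current.add(f)
    while True:
        better = _improving_swap(dist, clients, facilities, current, compulsory, radius, min_gain)
        if better is None:
            return current, connection_cost(dist, clients, current)
        current = better  # strictly cheaper, so the search cannot cycle


points = [0, 1, 10, 11]
line_dist = {(a, b): abs(a - b) for a in points for b in points}
print(local_search_with_compulsory(line_dist, points, points, k=2, compulsory=[10, 11]))
```

With both facilities of the right cluster forced open, no swap is allowed and the cost stays at $19$; with a single compulsory facility the search still reaches the optimum among solutions containing it.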

3 Facility coreset for $k$-Median in planar graphs

In this section we give a coreset for centers for the $k$-Median problem; that is, we prove Theorem 1. We shall focus on the following lemma, which in combination with Theorem 8 yields Theorem 1.

Lemma 11.

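Given a $k$-Median instance $(G,C,F,k)$ with a weight function $w$, where $G$ is planar, and an accuracy parameter $\varepsilon>0$, one can in polynomial time compute a set $F_0\subseteq F$ of size $|\mathrm{supp}(w)|\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}$ satisfying the following condition with constant probability: there exists a set $D_0\subseteq F_0$ of size at most $k$ such that for every set $D\subseteq F$ of size at most $k$ it holds that $\mathrm{conn}_w(D_0)\leq(1+\varepsilon)\cdot\mathrm{conn}_w(D)$.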

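Before we proceed, let us verify that Theorem 8 and Lemma 11 together imply Theorem 1. Given an instance $(G,C,F,k)$ of $k$-Median, we first apply Theorem 8 to obtain a coreset $w$ with support of size $O(k\varepsilon^{-2}\log n)$. Next, we pass this coreset to Lemma 11, thus obtaining a set $F_0$ of size $k\cdot(\log n/\varepsilon)^{O(\varepsilon^{-3})}$. Let $D_0$ be the subset of $F_0$ of size at most $k$ that minimizes $\mathrm{conn}_w(D_0)$. Then, using the approximation guarantees of Theorem 8 and Lemma 11, for any $D\subseteq F$ of size at most $k$ we have
$$\mathrm{conn}(D_0)\leq\frac{\mathrm{conn}_w(D_0)}{1-\varepsilon}\leq\frac{1+\varepsilon}{1-\varepsilon}\cdot\mathrm{conn}_w(D)\leq\frac{(1+\varepsilon)^{2}}{1-\varepsilon}\cdot\mathrm{conn}(D).$$

It remains to rescale $\varepsilon$. Hence, for the rest of this section we focus on proving Lemma 11.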

Let $(G,C,F,k)$ be an input $k$-Median instance with a weight function $w$, where $G$ is planar. Let $\varepsilon$ be an accuracy parameter and without loss of generality assume that . Let and . Without loss of generality assume that .

We assume that $G$ is embedded in a sphere $\Sigma$ and apply the necessary modifications explained at the beginning of Section 2.1 to fit into the framework of Voronoi diagrams. Denote $S=\mathrm{supp}(w)$. We compute the Voronoi partition induced by $S$ and the Voronoi diagram induced by $S$. By Lemma 5, this diagram has $2|S|-4$ vertices, $|S|$ faces, and $3|S|-6$ edges.

Distance levels.

We first compute an $O(1)$-approximate solution using the algorithm given by Feldman and Langberg [9, Theorem 15.1]; this algorithm outputs an $O(1)$-approximate solution with constant probability. Let us scale all the edge lengths in $G$ by the same ratio so that


Next, we assign length $+\infty$ to every edge of length larger than ; clearly, such edges are not used in the computation of the connection cost of an optimum solution. Without loss of generality we assume that all the distances between vertices in $G$ are finite: otherwise we can split the instance into a number of independent ones, compute a suitable set for each of them and take the union.

The next step is to assign levels to distances in the graph. For any , define the level of , denoted , to be the smallest nonnegative integer such that . Note that if and only if . Let , then we have

Observe that since , by (1) we have


Portals and profiles.

Let be an incidence in . Let and let ; note that has length exactly . For every integer , we define the portal as a vertex on at distance exactly from ; we subdivide an edge and create a new vertex to accommodate if necessary. Furthermore, we also add a portal . Since , there are at most portals on the spoke .

Consider a diamond induced by some edge of , and a vertex in . Recall that the perimeter of consists of spokes for four incidences , where . The profile of a vertex belonging to the diamond consists of the following information, for all :

  1. The minimum index satisfying

    where is the vertex of involved in . If no such index exists, we set .

  2. Letting

    the profile records the value of for all .

Whenever speaking about a vertex and incidence , we use and to denote and as above. We note that in total there are only a few possible profiles.

Claim 1.

The number of possible different profiles of vertices in is .


Proof. Since for every incidence , there are at most choices for the four values for . Further, we have , so there are at most choices for the values for and .

For future reference, we state the key property of profiles: having the same profile implies having approximately the same distances to the portals with indices in .

Claim 2.

Suppose and are two vertices of that have the same profile. Then for each and , we have


Proof. Let , as recorded in the common profile. If , then and we are done. Otherwise, and are both contained in the interval . This interval has length , hence the claim follows.

Construction of the set .

We now construct the set as follows: for every diamond and every possible profile in , include in one facility with that profile (if one exists). Since there are diamonds, by Claim 1 and (1) we have

as claimed. It remains to prove that has the claimed approximation properties.

For every facility , pick a diamond containing and let be the facility that has the same profile as . Fix a solution with minimizing . Let . Clearly, . To finish the proof of Lemma 11 it suffices to show that


To this end, consider any client and let be the facility in serving , that is, . To show (3), it suffices to prove that


Indeed, by summing (4) over all and using (1) we obtain

where the last inequality holds because the initial solution computed with [9, Theorem 15.1] is approximate up to a constant factor.

Hence, from now on we focus on proving (4). Let be the diamond containing and . Consider the shortest path from to in . By Lemma 7, the path intersects the perimeter of the diamond . Let be the vertex on the perimeter of that lies on and, among such, is closest to on . Since is a shortest path, the lengths of the subpaths of between and and between and are equal to and , respectively, and in particular .

We now observe that to prove (4), it suffices to show the following.


Indeed, assuming (5) we have

Hence, from now on we focus on proving (5).

Let for , be the four incidences involved in the diamond . Since lies on the perimeter of , actually lies on , where for some . Let be the vertex of involved in the incidence . Since while , we have that . Consequently, we have and , so to prove (5) it suffices to prove the following:


Let be the portal on the subpath of between and that is closest to . Intuitively, is a good approximation of and distances from are almost the same as distances from . As this idea will be used repeatedly in the sequel, we encapsulate it in a single claim.

Claim 3.

Suppose for some vertices and we have

for some . Then


Proof. By the choice of we have

so . Therefore, we have

as claimed.

Since and have the same profile, we may denote and . Further, let We now consider a number of cases depending on the relative values of and , with the goal of proving that (6) holds in each case. See Figure 2 for an illustration.

Figure 2: The diamond with vertices , , , , and . Red vertices are clients, black squares are portals. The case distinction in the proof corresponds to relative order of and .

Middle case: .

As profiles of and are the same and , by Claim 2 we have

It suffices now to apply Claim 3 to infer inequality (6).

Close case: .

Since , by the choice of we have

By applying Claim 3, we infer that