In the (uncapacitated) facility location problem, we are given a metric space $(F \cup C, d)$, where $F$ is the set of facility locations, $C$ is the set of clients, and $d$ is a distance function, which is non-negative, symmetric and satisfies the triangle inequality. For each location $i \in F$, there is a facility opening cost $f_i \geq 0$. The goal is to open a subset $S \subseteq F$ of facilities so as to minimize the sum of the facility opening cost and the connection cost. The cost of connecting a client $j$ to an open facility $i$ is equal to $d(j, i)$. Hence, the objective function can be expressed concisely as $\min_{S \subseteq F} \big( f(S) + \sum_{j \in C} d(j, S) \big)$, where for a set $S \subseteq F$, $f(S) := \sum_{i \in S} f_i$ is the total facility cost of $S$ and $d(j, S) := \min_{i \in S} d(j, i)$ denotes the distance of $j$ to the nearest location in $S$. The facility location problem arises in countless applications: in the placement of servers in data centers, network design, wireless networking, data clustering, location analysis for placement of fire stations, medical centers, and so on. Hence, the problem has been studied extensively in many different communities: approximation algorithms, operations research, and computational geometry. In the approximation algorithms literature in particular, the problem occupies a prominent position, as the development of every major technique in the field is tied to its application to the facility location problem. See the textbook by Williamson and Shmoys [Williamson] for more details. The problem is hard to approximate to a factor better than 1.463 [Guha1998]. The current best-known polynomial-time algorithm is given by the third author, and achieves a 1.488-approximation [Li13].
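To make the objective concrete, the following minimal sketch evaluates the cost of a set of open facilities on a tiny instance. The function name, the dictionary-based representation of the metric, and the instance itself are illustrative assumptions, not part of the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple


def facility_location_cost(
    open_facilities: Iterable[Hashable],
    clients: Iterable[Hashable],
    opening_cost: Dict[Hashable, float],
    dist: Dict[Tuple[Hashable, Hashable], float],
) -> float:
    """Total opening cost plus the distance of every client to its nearest open facility.

    `dist[(j, i)]` is the distance between client j and facility i
    (a hypothetical representation chosen for this sketch).
    """
    opened = list(open_facilities)
    if not opened:
        raise ValueError("at least one facility must be open")
    facility_cost = sum(opening_cost[i] for i in opened)
    connection_cost = sum(min(dist[(j, i)] for i in opened) for j in clients)
    return facility_cost + connection_cost


# Toy instance: two facility locations, two clients.
opening_cost = {"a": 3, "b": 5}
dist = {("x", "a"): 1, ("x", "b"): 4, ("y", "a"): 6, ("y", "b"): 2}
cost_a = facility_location_cost({"a"}, ["x", "y"], opening_cost, dist)
cost_ab = facility_location_cost({"a", "b"}, ["x", "y"], opening_cost, dist)
```

On this toy instance, opening only `a` is cheaper than opening both locations, because the saving in connection cost does not pay for the second opening cost.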
In many real-world applications the set of clients arrives online, the metric space can change over time, and there can be memory constraints. This has motivated the study of the problem in various models: online [Meyerson2001, Fotakis08algorithmica, Anagnostopoulos2004, Fotakis2007], dynamic [Cohen-Addad19, Goranci19, CyganCMS18, Wesolowsky:1973, Farahani2009, Eisenstat, AnNS15], incremental [Fotakis06, Charikar1997, Fotakis2011], streaming [Indyk2004, Fotakis11, Lammersen, Czumaj2013, Charikar1997], and game theoretic [Vetta2002, FotakisT13, FotakisT13a], to name a few. This paper is concerned with the online and dynamic models. Thus, to keep the flow of presentation linear, we restrict ourselves to the results in these two models here.
Motivated by its applications in network design and data clustering, Meyerson [Meyerson2001] initiated the study of the facility location problem in the online setting. Here, clients arrive online one by one, and the algorithm has to either assign the newly arriving client to an already opened facility or open a new facility to serve the request. The decisions made by the algorithm are irrevocable, in the sense that a facility that is opened cannot be closed and the clients cannot be reassigned. In the online setting, Meyerson [Meyerson2001] designed a very elegant randomized algorithm that achieves an $O(\log n)$ competitive ratio, and also showed that no online algorithm can obtain an $O(1)$ competitive ratio. This result was later extended by Fotakis [Fotakis08algorithmica] to obtain an asymptotically optimal $O(\log n / \log \log n)$-competitive algorithm. Both the algorithms and the analysis techniques in [Fotakis08algorithmica, Meyerson2001] were influential, and found many applications in other models such as streaming [Fotakis2011]. The lower bound in Fotakis [Fotakis08algorithmica] holds even in very special metric spaces such as HSTs or the real line. Since then, several online algorithms have been designed that achieve the same competitive ratio with more desirable properties, such as being deterministic [Anagnostopoulos2004], being primal-dual [Fotakis2007], or having a small memory footprint [Fotakis11]. We refer to a beautifully written survey by Fotakis [Fotakis2011] for more details.
The main reason to assume that decisions made by an algorithm are irrevocable is because the cost of changing the solution is expensive in some applications. However, if one examines these above applications closely, say for example connecting clients to servers in data centers, it is more natural to assume that decisions need not be irrevocable but the algorithm should not change the solution too much. This is even more true in modern data centers where topologies can be reconfigured; see [GhobadiMPDKRBRG16] for more details. A standard way of quantifying the restriction that an online algorithm does not make too many changes is using the notion of recourse. The recourse per step of an online algorithm is the number of changes it makes to the solution. Recourse captures the minimal amount of changes an online algorithm has to make to maintain a desired competitive ratio due to the information theoretic limits. For the facility location problem, depending on the application, the recourse can correspond to: 1) the number of changes made to the opened facilities (called facility recourse) 2) the number of reconnections made to the clients (called client recourse). Notice that we can assume for every facility we open/close, we have to connect/disconnect at least one client. Thus the client recourse is at least the facility recourse. In the clustering applications arising in massive data sets, the opened facilities represent cluster centers, which represent summaries of data. Here one is interested in making sure that summaries do not change too frequently as more documents are added online. Therefore, facility recourse is a good approximation to the actual cost of changing the solution [Charikar1997, Fotakis06]. On the other hand, in network design problems, client recourse is the true indicator of the cost to implement the changes in the solution. 
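The two notions of recourse can be illustrated by comparing two consecutive solutions. The sketch below is only an illustration under stated assumptions: it counts facility recourse as the number of facilities whose open/closed status changed, and client recourse as the number of clients present in both solutions whose assigned facility changed; the function name and this exact accounting are hypothetical choices, not the paper's formal definition.

```python
from typing import Dict, Hashable, Set, Tuple


def recourse(
    old_open: Set[Hashable],
    old_assign: Dict[Hashable, Hashable],
    new_open: Set[Hashable],
    new_assign: Dict[Hashable, Hashable],
) -> Tuple[int, int]:
    """Return (facility recourse, client recourse) between two solutions."""
    # Facilities that were opened or closed between the two solutions.
    facility_recourse = len(set(old_open) ^ set(new_open))
    # Clients that existed before and are now served by a different facility.
    # A newly arrived client's first connection is not counted here (assumption).
    client_recourse = sum(
        1
        for client, facility in old_assign.items()
        if client in new_assign and new_assign[client] != facility
    )
    return facility_recourse, client_recourse


before = ({"a"}, {"x": "a", "y": "a"})
after = ({"a", "b"}, {"x": "a", "y": "b", "z": "b"})
changes = recourse(before[0], before[1], after[0], after[1])
```

In this example one facility is opened and one existing client is moved, so both recourse values are 1; the newly arrived client `z` does not contribute.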
As a concrete example, consider the problem of connecting clients to servers in data centers, which was one of the main motivations for Meyerson [Meyerson2001] to initiate the study of the online facility location problem. Here, it is important that one does not reconnect clients to servers too many times, as such changes can incur significant costs both in terms of disruption of service and labor. Consider another scenario where a retail company tries to maintain stores to serve a dynamically changing set of clients. As the clients change so frequently, it would be infeasible to build or shut down even one store for every new client. In this application, small client recourse per step is desirable, as that will automatically forbid frequent changes of store locations.
In this light, a natural question that arises is:
Is it possible to maintain a constant approximation for the facility location problem if we require that the facility and client recourse is small?
Our first main result shows that indeed this is possible. In the following theorems, we use $n$ to denote the total number of facility locations and all clients that ever arrived, and $D$ to denote the diameter of the metric (assuming all distances are integers).
There is a deterministic online algorithm for the facility location problem that achieves a competitive ratio of with amortized facility and client recourse against an adaptive adversary.
Our algorithm for the above theorem differs from the previous approaches used in the context of online variants of the facility location problem, and is based on local search. The local search algorithm is one of the most widely used algorithms for the facility location problem in practice and is known to achieve an approximation factor of $1+\sqrt{2}$ in the offline setting. See the influential paper by Arya et al. [AryaGKMP01] and a survey by Munagala [Munagala16]. Thus our result matches the best known approximation ratio for offline facility location using local search. Further, our result shows that the local search algorithm augmented with some small modifications is inherently stable, as it does not make too many changes to the solution even if clients are added in an online fashion. This gives further justification for its popularity among practitioners.
Prior to Theorem 1, the known results [Fotakis06, Diveki2011, Fotakis11] needed one or more of these assumptions: 1) the facility costs are the same, 2) we are interested in knowing only the cost of the solution, 3) we are interested only in bounding the facility recourse. In particular, there was no known algorithm that bounds the client recourse, which is an important consideration in many of the applications mentioned above. Moreover, our algorithm also achieves a better approximation factor; the previously best known algorithm for the facility location problem achieved a competitive ratio of 48 [Fotakis2011].
Our result in the recourse setting for the facility location problem should be contrasted with the similar results shown recently for online Steiner tree [Gupta015], set cover [GuptaK0P17], scheduling [GuptaKS14], and matchings and flows [BernsteinHR19, GuptaKS14]. Moreover, these results also raise intriguing questions: is a polylogarithmic amount of recourse enough to beat information-theoretic lower bounds in online algorithms? Is recourse as powerful as, or more powerful than, randomization?
While having a small client recourse is enough in data center applications, it is not enough in some others. Take wireless networks as a concrete example. Here, the set of clients (mobile devices) keeps changing over time, and it is necessary to update the assignment of clients to facilities as quickly as possible so as to minimize the service disruption. These applications motivated Cygan et al. [CyganCMS18], Goranci et al. [Goranci19] and Cohen-Addad et al. [Cohen-Addad19] to study the facility location problem in the framework of dynamic algorithms. The dynamic model of [CyganCMS18] and [Cohen-Addad19] is different from what we study here, so we discuss it at the end of this section.
The dynamic facility location problem is similar to the one in the online setting except that at each time step either a new client arrives or an existing client departs. The goal is to always maintain a solution that is a constant factor approximation to the optimal solution, while minimizing the total time spent in updating the solution. We emphasize that we require our dynamic algorithms to maintain an actual assignment of clients to facilities, not just the set of open facilities and an estimate of connection cost. This is important for applications mentioned above. This setting was considered in [Goranci19], who showed that for metric spaces with doubling dimension , there is a deterministic fully dynamic algorithm with update time, which maintains a constant approximation. However, for more general metric spaces no results were known in the dynamic setting, and we give the first results. First we consider the incremental setting, where clients only arrive and never depart.
In the incremental setting against an adaptive adversary, there is a randomized dynamic algorithm for the facility location problem that, with probability at least, maintains an approximation factor of and has total update time of .
Note that it takes space to specify the input in our model (see Section 2.2). Hence the running time of our algorithms is almost optimal up to polylog factors when . The proof of above theorem uses randomized local search and builds on our result in the recourse setting. We use randomization to convert the recourse bound into an update time bound. Further, our analysis of above theorem also implies one can obtain running time by losing factors in the approximation ratio; see the remark at the end of Section 5.
Next we study the fully dynamic setting. Here, we first consider an important class of metric spaces called hierarchically well separated tree (HST) metrics [Bartal96]; see Definition 5 for the formal definition, and Section 2.2 for more details about how the input sequence is given. For HST metric spaces, we show the following result.
In the fully dynamic setting against adaptive adversaries, there is a deterministic algorithm for the facility location problem that achieves an approximation factor with preprocessing time and total update time for the HST metric spaces.
A seminal result by Bartal [Bartal96], which was later tightened by Fakcharoenphol, Rao and Talwar [Fakcharoenphol2003], shows that any arbitrary -point metric space can be embedded into a distribution over HSTs such that the expected distortion is at most , which is also tight. Moreover, such a probabilistic embedding can also be computed in time; see recent results by Blelloch, Gu and Sun for details [Blelloch0S17]. These results immediately imply the following theorem, provided the input is specified as in Section 2.2.
In the fully dynamic setting against oblivious adversary, there is a randomized algorithm for the facility location problem that maintains an approximation factor of with preprocessing time of and total update time. The approximation guarantee holds only in expectation for every time step of the algorithm.
Observe that unlike the incremental setting, the above theorem holds only in the oblivious adversary model, as probabilistic embedding techniques preserve distances only in expectation as can be seen by taking a cycle on points. Our result also shows that probabilistic tree embeddings using HSTs can be a very useful technique in the design of dynamic algorithms, similar to its role in online algorithms [Bartal96, BartalBBT97, Umboh15, BubeckCLLM18].
Our algorithms in Theorems 3 and 4 in the fully dynamic setting also have the nice property that amortized client and facility recourse is (in fact, we can achieve a slightly better bound of as can be seen from the analysis). This holds as our dynamic algorithms maintain the entire assignment of clients to facilities explicitly in memory at every time step. Thus, the amortized number of client reconnections is at most the amortized update time. This is useful when one considers an online setting where clients arrive and depart, and is interested in small client recourse. A fully dynamic online model of the facility location problem, where clients arrive and depart, was recently studied by Cygan et al. [CyganCMS18] and Cohen-Addad et al. [Cohen-Addad19], but with different assumptions on recourse. In this model, when a client arrives, the algorithm has to assign it to an open facility immediately, while upon the departure of a client, if a facility was opened at the same location, then the clients that were assigned to that location should be reassigned immediately and irrevocably. Cygan et al. [CyganCMS18] studied the case when recourse is not allowed: they showed that a delicate extension of Meyerson’s [Meyerson2001] algorithm obtains an asymptotically tight competitive ratio of . Cohen-Addad et al. [Cohen-Addad19] later showed that this can be improved to if recourse is allowed. However, both results hold only for uniform facility costs, and Cygan et al. [CyganCMS18] even showed an unbounded lower bound for the non-uniform facility cost case in their model. Moreover, in their model reconnections of clients are assumed to be “automatic” and do not count towards the client recourse; it is not clear how many client reconnections their algorithm will make.
1.1 Our Techniques
Our main algorithmic technique for proving Theorems 1 and 2 is local search, which is one of the most powerful algorithm design paradigms. Indeed, for both results, the competitive (approximation) ratio we achieve is , which matches the best approximation ratio for offline facility location obtained using local search [AryaGKMP01]. Both of our results are based on the following key lemma. Suppose we maintain local optimum solutions at every time step in our algorithm. When a new client comes at time , we add it to our solution using a simple operation, and let be the increase of our cost due to the arrival of . The key lemma states that the sum of values in the first time steps can be bounded in terms of the optimum cost at time . With a simple modification to the local search algorithm, in which we require that each local operation decreases the cost by a sufficient amount for every client it reconnects, one can bound the total client recourse.
The straightforward way to implement the local search algorithm takes time . To derive a better running time, we leverage the randomized local search idea of Charikar and Guha [CharikarGhua2005]. At every iteration, we randomly choose a facility or a closing operation, and then perform the best operation that opens or swaps in , or closes a facility if that is what we choose. By restricting the facility and with the help of the heap data structure, an iteration of the algorithm can be implemented in time . As in [CharikarGhua2005] we can also show that each iteration can make a reasonable progress in expectation, leading to a bound of on the number of iterations for the success of the algorithm with high probability. We remark that the algorithm in [CharikarGhua2005] used a different local search framework. Therefore, our result shows that the classic algorithm of [AryaGKMP01] can also be made fast.
However, directly replacing the deterministic local search procedure with the randomized one does not work: the solution at the end of each time might not be a local optimum, as the randomized procedure does not enumerate all possible local operations. Thus the key lemma does not hold any more. Nevertheless, we show that applying a few local operations around upon its arrival can address the issue. With the key lemma, one can bound the number of times we perform the iterative randomized local search procedure, and thus the overall running time.
Our proof for Theorem 3 is based on a generalization of the greedy algorithm for facility location on HST metrics, which was developed in [EsencayiGLW19] in the context of differential privacy but only for the case of uniform facility cost. The intuition of the algorithm is as follows: If for some vertex of the HST , the number of clients in the tree (the sub-tree of rooted at ) times the length of parent edge of is big compared to the cost of the cheapest facility in , then we should open that facility. Otherwise, we should not open it and let the clients in be connected to outside through the parent edge. This intuition can be made formal: We mark in the former case; then simply opening the cheapest facility in for all lowest marked vertices leads to a constant approximation for facility location.
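The marking intuition above can be sketched as a single bottom-up pass over the tree. This is only an illustrative offline sketch under several assumptions that are not taken from the paper: the edge above a vertex at level $\ell$ has weight $2^\ell$, "big compared to" is read as "at least" (threshold factor 1), the root counts as marked whenever it contains a client, and every leaf is a facility location. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def hst_greedy_open(
    children: Dict[Hashable, List[Hashable]],
    root: Hashable,
    level: Dict[Hashable, int],
    facility_cost: Dict[Hashable, float],
    client_count: Dict[Hashable, int],
) -> Set[Hashable]:
    """Open the cheapest facility below every lowest marked vertex."""
    opened: Set[Hashable] = set()

    def visit(v: Hashable) -> Tuple[int, Hashable, bool]:
        # Returns (#clients in T_v, cheapest leaf in T_v, any marked vertex in T_v).
        kids = children.get(v, [])
        if not kids:
            n, best, marked_below = client_count.get(v, 0), v, False
        else:
            results = [visit(c) for c in kids]
            n = sum(r[0] for r in results)
            best = min((r[1] for r in results), key=lambda leaf: facility_cost[leaf])
            marked_below = any(r[2] for r in results)
        if v == root:
            marked = n > 0  # assumption: a non-empty root is always marked
        else:
            # Mark v if (#clients) * (parent edge weight) pays for the cheapest facility.
            marked = n > 0 and n * 2 ** level[v] >= facility_cost[best]
        if marked and not marked_below:
            opened.add(best)  # v is a lowest marked vertex
        return n, best, marked or marked_below

    visit(root)
    return opened


children = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
level = {"r": 2, "a": 1, "b": 1, "a1": 0, "a2": 0, "b1": 0, "b2": 0}
facility_cost = {"a1": 1, "a2": 10, "b1": 100, "b2": 100}
few_clients = hst_greedy_open(children, "r", level, facility_cost, {"a2": 3, "b1": 1})
many_clients = hst_greedy_open(children, "r", level, facility_cost, {"a2": 20})
```

With three clients at `a2`, the leaf itself is not marked but its parent is, so the cheap facility `a1` is opened; with twenty clients at `a2`, the leaf is marked and `a2` is opened instead.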
The above offline algorithm leads to a dynamic data structure that maintains -approximate solutions, supports insertion and deletion of clients, and reports the connecting facility of a client in time. This is the case since each time a client arrives or departs, only its ancestors will be affected. However, in a dynamic algorithm setting, we need to maintain the assignment vector in memory, so that whenever the connecting facility of a client changes, the assignment is updated explicitly. This requires the number of reconnections made by our algorithm to be small. To achieve this goal, we impose two constants for each vertex $v$ when deciding whether $v$ should be marked and whether the cheapest facility in $T_v$ should be open. When a vertex changes its marking/opening status, we update the constants in such a way that it becomes hard for the status to be changed back.
Throughout the paper, we use $F$ to denote the set of potential facilities for all the problems and models; we assume $F$ is given upfront. $C$ is the dynamic set of clients we need to connect by our algorithm. This is not necessarily the set of clients that are present: in the algorithms for online facility location with recourse and dynamic facility location in the incremental setting, we fix the connections of some clients as the algorithms proceed. These clients are said to be “frozen” and are excluded from $C$. We shall always use $d$ to denote the hosting metric containing $F$ and all potential clients. For any point $j$ and subset $V$ of points in the metric, we define $d(j, V)$ to be the minimum distance from $j$ to a point in $V$. We assume all distances are integers, and the minimum non-zero distance between two points is 1. We define $D$, the diameter or the aspect ratio of a metric space, as the largest distance between two points in it. Let $n$ be $|F|$ plus the total number of clients arrived during the whole process. The algorithms do not need to know the exact value of $n$ in advance, except that in the dynamic algorithm for facility location in the incremental setting (the problem in Theorem 2), to achieve the success probability, a sufficiently large needs to be given. (Footnote 1: For an algorithm that might fail, we need to have some information about to obtain a failure probability that depends on .)
In all the algorithms, we maintain a set of open facilities, and a connection of clients in to facilities in . We do not require that connects clients to their respective nearest open facilities. For any solution , we use to denote the connection cost of the solution. For facility location, we use to denote the total cost of the solution , where . Notice that and the definitions of and functions depend on the dynamic set .
Throughout the paper, we distinguish between a “moment”, a “time” and a “step”. A moment refers to a specific time point during the execution of our algorithm. A time corresponds to an arrival or a departure event: at each time, exactly one client arrives or departs, and time $t$ refers to the period from the moment the $t$-th event happens until the moment the $(t+1)$-th event happens (or the end of the algorithm). One step refers to one statement in our pseudo-codes indexed by a number.
2.1 Hierarchically Well Separated Trees
A hierarchically-well-separated tree (or HST for short) is an edge-weighted rooted tree with the following properties:
all the root-to-leaf paths have the same number of edges,
if we define the level of a vertex $v$, $\ell(v)$, to be the number of edges in a path from $v$ to any of its leaf descendants, then for a non-root vertex $v$, the weight of the edge between $v$ and its parent is exactly $2^{\ell(v)}$.
Given a HST with the set of leaves being , we use to denote the shortest path metric of the tree (with respect to the edge weights) restricted to .
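As a small worked example of the resulting metric, the sketch below computes the tree distance between two leaves from their root-to-leaf paths. It assumes that the edge above a vertex at level $\ell$ has weight $2^\ell$, so that two leaves whose lowest common ancestor is at level $\ell$ are at distance $2\sum_{k=0}^{\ell-1} 2^k = 2(2^\ell - 1)$. Encoding a leaf as a tuple of child indices is a hypothetical choice made for this illustration.

```python
from typing import Sequence


def hst_leaf_distance(path_u: Sequence[int], path_v: Sequence[int]) -> int:
    """Distance between two leaves given as root-to-leaf paths of child indices.

    Assumes all root-to-leaf paths have the same number of edges and that the
    edge above a level-l vertex has weight 2**l (assumption of this sketch).
    """
    if len(path_u) != len(path_v):
        raise ValueError("all root-to-leaf paths must have the same length")
    depth = len(path_u)
    common = 0
    for a, b in zip(path_u, path_v):
        if a != b:
            break
        common += 1
    lca_level = depth - common  # level of the lowest common ancestor
    # Each side climbs edges of weight 2^0, ..., 2^(lca_level - 1).
    return 2 * (2 ** lca_level - 1)


siblings = hst_leaf_distance((0, 0), (0, 1))
cousins = hst_leaf_distance((0, 0), (1, 0))
```

In a tree of depth 2, sibling leaves are at distance 2 and leaves in different top-level subtrees are at distance 6.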
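The classic results by Bartal [Bartal96] and Fakcharoenphol, Rao and Talwar [Fakcharoenphol2003] state that we can embed any $n$-point metric $d$ (with minimum non-zero distance being $1$) into a distribution $\pi$ of expanding HST metrics with distortion $O(\log n)$: for every pair of points $u, v$, we have $d_T(u, v) \geq d(u, v)$ for every $T$ in the support of $\pi$, and $\mathbb{E}_{T \sim \pi}[d_T(u, v)] \leq O(\log n) \cdot d(u, v)$. (Footnote 2: A metric $d_T$ is expanding w.r.t. $d$ if for every pair of points $u, v$, we have $d_T(u, v) \geq d(u, v)$.) Moreover, there is an efficient randomized algorithm [Blelloch0S17] that outputs a sample of the tree $T$ from $\pi$. Thus, applying standard arguments, Theorem 3 implies Theorem 4.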
2.2 Specifying Input Sequence
In this section we specify how the input sequence is given. For the online and dynamic facility location problem, we assume the facility locations , their costs , and the metric restricted to are given upfront, and they take space. Whenever a client arrives, it specifies its distance to every facility (notice that the connection cost of an assignment does not depend on distances between two clients and thus they do not need to be given). Thus the whole input contains words.
For Theorems 3 and 4, as we do not try to optimize the constants, we do not need that a client specifies its distance to every facility. By losing a multiplicative factor of and an additive factor of in the approximation ratio, we can assume that every client is collocated with its nearest facility in (See Appendix C). Thus, we only require that when a client comes, it reports the position of its nearest facility. For Theorem 3, the HST over is given at the beginning using words. For Theorem 4, the metric over is given at the beginning using words. Then, we use an efficient algorithm [Blelloch0S17] to sample a HST .
2.3 Local Search for facility location
The local-search technique has been used to obtain the classic $(1+\sqrt{2})$-approximation offline algorithm for facility location [AryaGKMP01]. We now give an overview of the algorithm, which will be the baseline of our online and dynamic algorithms for facility location. One can obtain a (tight) $3$-approximation for facility location without scaling facility costs. Scaling the facility costs by a factor of $\lambda := \sqrt{2}$ when deciding whether an operation can decrease the cost, we can achieve a better approximation ratio of $\alpha_{FL} := 1+\sqrt{2}$. Throughout, we fix the constants $\lambda = \sqrt{2}$ and $\alpha_{FL} = 1+\sqrt{2}$. For a solution $(S, \sigma)$ to a facility location instance, we use $\mathrm{cost}_\lambda(S, \sigma) := \lambda f(S) + \sum_{j \in C} d(j, \sigma_j)$ to denote the cost of the solution with facility costs scaled by $\lambda$. We call $\mathrm{cost}_\lambda(S, \sigma)$ the scaled cost of $(S, \sigma)$.
Given the current solution for a facility location instance defined by and , we can apply a local operation that changes the solution . A valid local operation is one of the following.
An open operation, in which we open some facility $i$ and reconnect a subset $C' \subseteq C$ of clients to $i$. We allow $i$ to be already in $S$, in which case we simply reconnect $C'$ to $i$. This needs to be allowed since our $\sigma$ does not necessarily connect clients to their nearest open facilities.
A close operation, in which we close some facility $i' \in S$ and reconnect the clients in $\sigma^{-1}(i')$ to facilities in $S \setminus \{i'\}$.
A swap operation, in which we open some facility $i \notin S$ and close some facility $i' \in S$, reconnect the clients in $\sigma^{-1}(i')$ to facilities in $(S \setminus \{i'\}) \cup \{i\}$, and possibly reconnect some other clients to $i$. We say $i$ is swapped in and $i'$ is swapped out by the operation.
Thus, in any valid operation, we can open and/or close at most one facility. A client can be reconnected if it is currently connected to the facility that will be closed, or it will be connected to the new open facility. After we apply a local operation, and will be updated accordingly so that is always the current solution.
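The three kinds of local operations can be expressed as one update on a solution stored as a set of open facilities plus an assignment map. The sketch below is a minimal illustration; the function name, the argument names, and the validation rules are assumptions that follow the description above (at most one facility opened, at most one closed, and a client may move only if it leaves the closed facility or joins the opened one).

```python
from typing import Dict, Hashable, Optional, Set, Tuple


def apply_local_operation(
    S: Set[Hashable],
    sigma: Dict[Hashable, Hashable],
    open_fac: Optional[Hashable] = None,
    close_fac: Optional[Hashable] = None,
    reconnect: Optional[Dict[Hashable, Hashable]] = None,
) -> Tuple[Set[Hashable], Dict[Hashable, Hashable]]:
    """Return the solution after an open, close, or swap operation.

    open only: open_fac set; close only: close_fac set; swap: both set.
    `reconnect` maps each moved client to its new facility.
    """
    reconnect = dict(reconnect or {})
    if open_fac is None and close_fac is None:
        raise ValueError("an operation must open and/or close a facility")
    if close_fac is not None and close_fac not in S:
        raise ValueError("cannot close a facility that is not open")
    new_S = set(S)
    if close_fac is not None:
        new_S.discard(close_fac)
    if open_fac is not None:
        new_S.add(open_fac)
    for client, target in reconnect.items():
        if target not in new_S:
            raise ValueError("clients must be reconnected to open facilities")
        # A client may move only away from the closed facility or to the opened one.
        if sigma[client] != close_fac and target != open_fac:
            raise ValueError("client is not allowed to be reconnected")
    new_sigma = {**sigma, **reconnect}
    if any(facility not in new_S for facility in new_sigma.values()):
        raise ValueError("every client of the closed facility must be reconnected")
    return new_S, new_sigma


S0, sigma0 = {"a", "b"}, {"x": "a", "y": "b"}
# Swap: open "c", close "b", move both clients to "c".
S1, sigma1 = apply_local_operation(S0, sigma0, open_fac="c", close_fac="b",
                                   reconnect={"x": "c", "y": "c"})
```

The function returns a new solution rather than mutating its inputs, which makes it easy to compare the cost before and after an operation.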
For the online algorithm with recourse model, since we need to bound the number of reconnections, we apply a local operation only if the scaled cost it decreases is large compared to the number of reconnections it makes. This motivates the following definition:
Definition 6 (Efficient operations for facility location).
Given a $\phi \geq 0$, we say a local operation on a solution $(S, \sigma)$ for a facility location instance is $\phi$-efficient, if it decreases the scaled cost of $(S, \sigma)$ by more than $\phi$ times the number of clients it reconnects.
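The efficiency test in this definition is a simple comparison between the drop in scaled cost and the number of reconnected clients. The sketch below illustrates it; the names `scaled_cost`, `is_efficient`, `lam` (the scaling factor for facility costs) and `phi` (the efficiency threshold) are hypothetical, and the solution representation is an assumption of this example.

```python
from typing import Dict, Hashable, Set, Tuple

Solution = Tuple[Set[Hashable], Dict[Hashable, Hashable]]


def scaled_cost(solution: Solution, opening_cost, dist, lam: float) -> float:
    """Facility costs scaled by `lam`, plus the connection cost of the assignment."""
    S, sigma = solution
    return lam * sum(opening_cost[i] for i in S) + sum(
        dist[(j, i)] for j, i in sigma.items()
    )


def is_efficient(before: Solution, after: Solution, opening_cost, dist,
                 phi: float, lam: float) -> bool:
    """True if the scaled cost drops by more than `phi` per reconnected client."""
    _, sigma_before = before
    _, sigma_after = after
    reconnected = sum(1 for j in sigma_before if sigma_after[j] != sigma_before[j])
    decrease = scaled_cost(before, opening_cost, dist, lam) - scaled_cost(
        after, opening_cost, dist, lam
    )
    return decrease > phi * reconnected


opening_cost = {"a": 1, "b": 1}
dist = {("x", "a"): 10, ("x", "b"): 1}
before = ({"a"}, {"x": "a"})
after = ({"a", "b"}, {"x": "b"})  # open "b" and reconnect x to it
drop = scaled_cost(before, opening_cost, dist, 2.0) - scaled_cost(after, opening_cost, dist, 2.0)
```

Here the operation reconnects one client and lowers the scaled cost by 7 (with `lam = 2.0`), so it is efficient for `phi = 6` but not for `phi = 7`, since the inequality is strict.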
The following two theorems can be derived from the analysis for the local search algorithms for facility location. We include their proofs in Appendix A for completeness.
Consider a facility location instance with cost of the optimum solution being (using the original cost function). Let be the current solution in our algorithm and be a real number. If there are no -efficient local operations on , then we have
In particular, if we apply the theorem with , then we obtain that is a -approximation for the instance.
The following theorem will be used to analyze our randomized local search procedure.
Let be a solution to a facility location instance and be the optimum cost. Then there are two sets and of valid local operations on , where each operation decreases the scaled cost by , such that the following holds:
There are at most operations in .
For every , there is at most 1 operation in each of and that opens or swaps in .
2.4 Useful Lemmas
The following lemmas will be used repeatedly in our analysis and thus we prove them separately in Appendix B.
Let for some integer . Let for every . Let be a sequence of real numbers and such that for every . Then we have
Assume at some moment of an algorithm for facility location, is the set of clients, is the solution for . Let and be any non-empty set of clients. Also at the moment there are no -efficient operation that opens for some . Then we have
The rest of the paper is organized as follows. In Section 3, we prove Theorem 1 by giving our online algorithm for facility location with recourse. Section 4 gives the randomized local search procedure, which will be used in the proof of Theorem 2 in Section 5. Section 6 is dedicated to the proof of Theorem 3, by giving the fully dynamic algorithm for facility location in HST metrics. We give some open problems and future directions in Section 7. Some proofs are deferred to the appendix for a better flow of the paper.
3 $(1+\sqrt{2}+\epsilon)$-Competitive Online Algorithm with Recourse
In this section, we prove Theorem 1 by giving the algorithm for online facility location with recourse.
3.1 The Algorithm
For any , let be a parameter that is sufficiently small so that the approximation ratio achieved by our algorithm is at most . Our algorithm for online facility location is easy to describe. Whenever the client comes at time , we use a simple rule to connect , as defined in the procedure in Algorithm 1: either connecting to the nearest facility in , or opening and connecting to its nearest facility in , whichever incurs the smaller cost. Then we repeatedly perform -efficient operations (Definition 6), until no such operations can be found, for . (Footnote 3: There are an exponential number of possible operations, but we can check efficiently whether a -efficient one exists. Close operations can be handled easily. To check if we can open a facility , it suffices to check if . Swap operations are more complicated but can be handled similarly.)
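The arrival rule described above can be sketched as follows. This is an illustration under assumptions: the cost of the second option is taken to be `lam` times the opening cost of the nearest closed facility plus the distance to it, where `lam` is a hypothetical scaling parameter (set to 1.0 by default), and ties are broken in favor of connecting to an already open facility. The function mutates the solution in place and returns the resulting cost increase.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple


def connect_arrival(
    j: Hashable,
    S: Set[Hashable],
    sigma: Dict[Hashable, Hashable],
    facilities: Iterable[Hashable],
    opening_cost: Dict[Hashable, float],
    dist: Dict[Tuple[Hashable, Hashable], float],
    lam: float = 1.0,
) -> float:
    """Connect the new client j and return the increase in (scaled) cost."""
    nearest_open = min(S, key=lambda i: dist[(j, i)], default=None)
    nearest_closed = min(
        (i for i in facilities if i not in S),
        key=lambda i: dist[(j, i)],
        default=None,
    )
    if nearest_open is None and nearest_closed is None:
        raise ValueError("no facility locations available")
    connect_cost = dist[(j, nearest_open)] if nearest_open is not None else math.inf
    open_cost = (
        lam * opening_cost[nearest_closed] + dist[(j, nearest_closed)]
        if nearest_closed is not None
        else math.inf
    )
    if open_cost < connect_cost:
        S.add(nearest_closed)       # open the nearest closed facility
        sigma[j] = nearest_closed
        return open_cost
    sigma[j] = nearest_open         # connect to the nearest open facility
    return connect_cost


facilities = ["a", "b"]
opening_cost = {"a": 5, "b": 1}
dist = {("x", "a"): 3, ("x", "b"): 1, ("y", "a"): 1, ("y", "b"): 4}
S, sigma = {"a"}, {}
delta_x = connect_arrival("x", S, sigma, facilities, opening_cost, dist)
delta_y = connect_arrival("y", S, sigma, facilities, opening_cost, dist)
```

In this run, client `x` triggers the opening of `b` (cost 2 versus 3), while client `y` is simply connected to the already open facility `a`.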
We can show that the algorithm gives an -approximation with amortized recourse ; recall that is the aspect ratio of the metric. To remove the dependence on , we divide the algorithm into stages, and freeze the connections of clients that arrived in early stages. The final algorithm is described in Algorithm 3, and Algorithm 2 gives one stage of the algorithm.
In Algorithm 2, we do as described above, with two modifications. First, we are given an initial set of clients and a solution for which is -approximate. Second, the stage will terminate if the cost of our solution increases by a factor of more than . The main algorithm (Algorithm 3) is broken into many stages. Since we shall focus on one stage of the algorithm for most part of our analysis, we simply redefine the time so that every stage starts with time 1. The improved recourse comes from the freezing operation: at the end of each stage, we permanently open one copy of each facility in , and permanently connect clients in to copies of according to , where and are the client set and solution at the beginning of the stage. Notice that we assume the original facilities in will still participate in the algorithm in the future; that is, they are subject to opening and closing. Thus each facility may be opened multiple times during the algorithm and we take the facility costs of all copies into consideration. This assumption is only for the sake of analysis; the actual algorithm only needs to open one copy and the costs can only be smaller compared to the described algorithm.
From now on, we focus on one stage of the algorithm and assume that the solution given at the beginning of each stage is -approximate. In the end we shall account for the loss due to the freezing of clients and facilities. Within a stage, the approximation ratio follows directly from Theorem 7: Focus on the moment after the while loop at time step in Algorithm 2. Since there are no -efficient local operations on , we have by the theorem that , where is the cost of the optimum solution for . Thus, at the end of each time, we have .
3.2 Bounding Amortized Recourse in One Stage
We then bound the amortized recourse in a stage; we assume that at the beginning of the stage since otherwise there will be no recourse involved in the stage (since we terminate the stage when the cost becomes non-zero). We use to denote the last time of the stage. For every time , let be the set at the end of time , and to be the cost of the optimum solution for the set . For every , we define to be the value of after Step 5 at time step in Algorithm 2, minus that before Step 5. We can think of this as the cost increase due to the arrival of .
The key lemma we can prove is the following:
For every , we have
Consider the optimum solution for and focus on any star in the solution; that is, is an open facility and is the set of clients connected to . Assume , where ; recall that is the initial set of clients given at the beginning of the stage. We shall bound in terms of the cost of the star .
By the rule specified in , we have . Now focus on any integer . Before Step 5 at time , no -efficient operation that opens is available. Thus, we can apply Lemma 10 on , and to conclude that before Step 5, we have
In , we have the option of connecting to its nearest open facility. Thus, we have
We now sum up the above inequality for all and that . We get
To see the above inequality, it suffices to consider the coefficients for and ’s on the right-hand side. The coefficient for is at most ; the coefficient for each is .
We now take the sum of (1) over all stars in the optimum solution for . The sum for the first term on the right side of (1) will be since is exactly the cost of the star . The sum for the second term will be since the integers over all stars and all are positive and distinct. Thus overall, we have . ∎
With Lemma 11, we can now bound the amortized recourse of one stage. In time , first increases by in Step 5. Then after that, it decreases by at least for every reconnection we made. Let ; Lemma 11 says for some and every . Noticing that is a non-decreasing sequence, the total number of reconnections is at most
Notice that . Applying Lemma 9 with replaced by , and for every , we have that , since we have . Notice that since . So, the total number of reconnections is at most . The amortized recourse per client is , where in the amortization, we only considered clients involved in the stage. Recall that is the total number of clients that have arrived.
As each client appears in at most 2 stages, the overall amortized recourse is . Finally, we consider the loss in the approximation ratio due to freezing of clients. Suppose we are in the -th stage. Then the clients that arrived at or before the -th stage have been frozen and removed. Let be the cost of the optimum solution for all clients that arrived at or before the -th stage. Then the frozen facilities and clients have cost at most . At any time in the -th stage, the optimum solution taking all arrived clients into consideration has cost , and our solution has cost at most without considering the frozen clients and facilities. Thus, our solution still has approximation ratio when taking the frozen clients into consideration.
4 Fast Local Search via Randomized Sampling
From now on, we will be concerned with dynamic algorithms. Towards proving Theorem 2 for the incremental setting, we first develop a randomized procedure that allows us to perform local search operations fast. In the next section, we use this procedure and ideas from the previous section to develop the dynamic algorithm with the fast update time.
The high-level idea is as follows: We partition the set of local operations into many “categories” depending on which facility each operation tries to open or swap in. In each iteration of the procedure, we sample a category according to some distribution and find the best local operation in this category. By only focusing on one category, one iteration of the procedure can run in time . On the other hand, the categories and the distribution over them are designed in such a way that in each iteration, the cost of our solution will be decreased by a multiplicative factor of . This idea has been used in [CharikarGhua2005] to obtain their algorithm for approximating facility location. However, their algorithm was based on a different local search algorithm and analysis; for consistency and convenience of description, we stick to the original local search algorithm of [AryaGKMP01] that leads to a -approximation for the problem. Our algorithm needs to use the heap data structure.
4.1 Maintaining Heaps for Clients
Unlike the online algorithm for facility location in Section 3, in the dynamic algorithm, we guarantee that the clients are connected to their nearest open facilities. That is, we always have ; we still keep for convenience of description. We maintain min-heaps, one for each client : The min-heap for will contain the facilities in , with priority value of being . This allows us to efficiently retrieve the second nearest open facility to each : This is the facility at the top of the heap for and we use the procedure to return it.
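The per-client heap described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name ClientHeap and its methods are hypothetical, and deletions are handled lazily, which gives amortized rather than worst-case logarithmic time per operation.

```python
import heapq

class ClientHeap:
    """Min-heap over the open facilities of one client, excluding the
    facility the client is currently connected to (hypothetical helper).

    Entries are (distance, facility). Deletion is lazy: a removed facility
    stays in the list until it reaches the top and is then discarded.
    """

    def __init__(self):
        self._heap = []
        self._live = set()

    def add(self, facility, dist):
        self._live.add(facility)
        heapq.heappush(self._heap, (dist, facility))

    def remove(self, facility):
        self._live.discard(facility)

    def top(self):
        """Return (distance, facility) of the nearest live entry, or None."""
        while self._heap and self._heap[0][1] not in self._live:
            heapq.heappop(self._heap)
        return self._heap[0] if self._heap else None


# A client connected to its nearest facility "a"; the heap holds the rest.
h = ClientHeap()
h.add("b", 3.0)
h.add("c", 2.0)
print(h.top())   # (2.0, 'c'): the second-nearest open facility
h.remove("c")
print(h.top())   # (3.0, 'b')
```

Because the client's own facility is kept out of the heap, the top entry is exactly the second-nearest open facility.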
We define four simple procedures and that are described in Algorithms 4, 5, 6 and 7 respectively. Recall that we use the scaled cost for the local search algorithm; so we are working on the scaled cost function in all these procedures. for any returns , the increment of the scaled cost that will be incurred by opening . (For it to be useful, should be negative, in which case indicates the cost decrement of opening ). This is just a one-line procedure, as in Algorithm 4; will open if it can reduce the scaled cost. for some returns a pair , where is the smallest scaled cost increment we can achieve by opening and closing some facility , and gives the facility achieving the smallest value. (Again, for to be useful, it should be negative, in which case is the facility that gives the maximum scaled cost decrement .) Similarly, returns a pair , which tells us the maximum scaled cost decrement we can achieve by closing one facility and which facility can achieve the decrement. Notice that in all the procedures, the facility we shall open or swap in is given as a parameter, while the facility we shall close is chosen and returned by the procedures.
With the heaps, the procedures and can run in time. We only analyze as the other two are easier. First, we define to be the set of clients with ; these are the clients that will surely be reconnected to once is swapped in. Let be the net scaled cost increase by opening and connecting to . The computation of and in Step 1 takes time. If additionally we close some , we need to reconnect each client in to either , or the top element in the heap for , whichever is closer to . Steps 2 and 3 compute and return the best scaled cost increment and the best . Since , the running time of the step can be bounded by .
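The computation analyzed above can be sketched in a few lines. The sketch rests on assumptions not fixed by the text here: the scaled cost is taken to be lam times the facility cost plus the connection cost, the names delta_open and try_swap are hypothetical, and second[u] stands in for the distance to the top of client u's heap (infinity if the heap is empty). The set of open facilities S is assumed non-empty.

```python
def delta_open(i, d, sigma, f, lam):
    """Scaled-cost increment from opening i and moving every client that
    is strictly closer to i than to its current facility."""
    gain = sum(max(0.0, d[u][sigma[u]] - d[u][i]) for u in sigma)
    return lam * f[i] - gain


def try_swap(i, S, d, sigma, f, lam, second):
    """Best swap that opens i and closes some j in S.

    Returns (increment, j). One pass over the clients: each client either
    moves to i unconditionally, or contributes to the cost of closing its
    own facility, so the total work is linear in the number of clients.
    """
    psi = lam * f[i]                        # net cost of opening i
    closing = {j: -lam * f[j] for j in S}   # extra cost of closing each j
    for u, j in sigma.items():
        if d[u][i] < d[u][j]:
            psi -= d[u][j] - d[u][i]        # u reconnects to i regardless
        else:
            # if j closes, u goes to i or to its second-nearest facility
            closing[j] += min(d[u][i], second[u]) - d[u][j]
    best = min(closing, key=closing.get)
    return psi + closing[best], best


d = {"u1": {"a": 1.0, "b": 4.0, "c": 2.0},
     "u2": {"a": 5.0, "b": 2.0, "c": 1.0}}
f = {"a": 3.0, "b": 3.0, "c": 1.0}
sigma = {"u1": "a", "u2": "b"}
second = {"u1": 4.0, "u2": 5.0}
print(delta_open("c", d, sigma, f, 1.0))                    # 0.0
print(try_swap("c", {"a", "b"}, d, sigma, f, 1.0, second))  # (-3.0, 'b')
```

In the example, swapping c in for b lowers the cost from 9 to 6, which matches the returned increment of -3.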
The running time for , swapping two facilities and closing a facility (which are not defined explicitly as procedures, but are used in Algorithm 8) can be bounded by . The running times come from updating the heap structures: For each of the heaps, we need to delete and/or add at most elements; each operation takes time .
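To illustrate where the heap-update cost comes from, the following sketch performs the opening of a facility. It is an illustration under assumptions: the function name open_facility and the layout of heaps (a plain heapq list of (distance, facility) pairs per client, excluding the client's own facility) are hypothetical. Opening needs exactly one push per client; closing and swapping additionally need deletions, which are omitted here.

```python
import heapq

def open_facility(i, d, sigma, heaps):
    """Open facility i, reconnect the clients it serves best, and keep
    every client's heap consistent with one push per client."""
    for u in sigma:
        old = sigma[u]
        if d[u][i] < d[u][old]:
            sigma[u] = i                                # i is now nearest
            heapq.heappush(heaps[u], (d[u][old], old))  # old one is a runner-up
        else:
            heapq.heappush(heaps[u], (d[u][i], i))      # i is only a runner-up


d = {"u1": {"a": 1.0, "b": 4.0, "c": 2.0},
     "u2": {"a": 5.0, "b": 2.0, "c": 1.0}}
sigma = {"u1": "a", "u2": "b"}
heaps = {"u1": [(4.0, "b")], "u2": [(5.0, "a")]}
open_facility("c", d, sigma, heaps)
print(sigma)                             # {'u1': 'a', 'u2': 'c'}
print(heaps["u1"][0], heaps["u2"][0])    # (2.0, 'c') (2.0, 'b')
```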
4.2 Random Sampling of Local Operations
With the support of the heaps, we can design a fast algorithm to implement randomized local search. in Algorithm 8 gives one iteration of the local search. We first randomly decide which operation to perform. With probability , we perform the operation that will reduce the scaled cost the most (if it exists). With the remaining probability , we perform either an or a operation. To reduce the running time, we randomly choose a facility and find the best operation that opens or swaps in , and perform the operation if it reduces the cost. One iteration of calls the procedures in Algorithms 4 to 7 at most once and performs at most one operation, and thus has running time .
In the procedure described in Algorithm 9, we run the times. It returns the best solution obtained in these iterations, according to the original (non-scaled) cost, which is not necessarily the solution given in the last iteration. So we have
The running time of is , where is the set of clients when we run the procedure.
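The overall shape of the sampled local search can be sketched as below. Several choices are assumptions made only for illustration: the probability p_close of attempting a close, the uniform choice of the facility to open or swap in, the form of the scaled cost (lam times the facility cost plus the connection cost), and the brute-force cost evaluation, which replaces the heap-based procedures and therefore does not achieve the running time stated above. The names cost, sample_step and local_search are hypothetical.

```python
import random

def cost(S, d, f, lam=1.0):
    """lam * facility cost + connection cost, clients at nearest facility."""
    if not S:
        return float("inf")
    return (lam * sum(f[i] for i in S)
            + sum(min(row[i] for i in S) for row in d.values()))


def sample_step(S, d, f, lam, p_close, rng):
    """One randomized iteration; returns the possibly unchanged open set."""
    current = cost(S, d, f, lam)
    if rng.random() < p_close:
        candidates = [S - {j} for j in S]       # best single close
    else:
        i = rng.choice(sorted(f))               # uniform choice: an assumption
        if i in S:
            return S
        # either open i, or swap i in for some open j
        candidates = [S | {i}] + [(S - {j}) | {i} for j in S]
    best = min(candidates, key=lambda T: cost(T, d, f, lam), default=S)
    return best if cost(best, d, f, lam) < current else S


def local_search(S, d, f, lam, rounds, p_close=0.5, seed=0):
    """Run several iterations on the scaled cost and return the solution
    that is best with respect to the original (non-scaled) cost."""
    rng = random.Random(seed)
    best = S = frozenset(S)
    for _ in range(rounds):
        S = sample_step(S, d, f, lam, p_close, rng)
        if cost(S, d, f) < cost(best, d, f):
            best = S
    return best
```

Each step is accepted only if it lowers the scaled cost, while the returned solution is chosen by the original cost, so it need not be the solution of the last iteration.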
Throughout this section, we fix a facility location instance. Let be the optimum solution (w.r.t. the original cost) and be the optimum cost. Fixing one execution of , we use and to denote the solutions before and after the execution respectively. Then, we have
Consider an execution of and fix . We have