Facility Location is a classical problem that has been widely studied in both combinatorial optimization and operations research, due to its many practical applications. It provides a simple and natural model for industrial planning, network design, machine learning, data clustering and computer vision (Drezner and Hamacher, Lazic, Caragiannis et al., Betzler et al.). In its basic form, $K$-Facility Location instances are defined by the locations of $n$ agents in a metric space. The goal is to find $K$ facility locations so as to minimize the sum of distances of the agents to their nearest facility.
In many natural location and network design settings, agent locations are not known in advance. Motivated by this fact, Meyerson introduced online facility location problems, where agents arrive one-by-one and must be irrevocably assigned to a facility upon arrival. Moreover, the rapidly increasing volume of available data and the requirement for responsive services have led to new online clustering algorithms (Liberty et al.), balancing the quality of the clusters with their rate of change over time. In practical settings related to online data clustering, new data points arrive, and the decision of clustering some data points together should not be regarded as irrevocable (see e.g., Fotakis and the references therein).
More recently, understanding the dynamics of temporally evolving social or infrastructure networks has been the central question in many applied areas such as viral marketing, urban planning etc. Dynamic facility location proposed by Eisenstat et al.  has been a new tool to analyze temporal aspects of such networks. In this time dependent variant of facility location, agents may change their location over time and we look for the best tradeoff between the optimal connections of agents to facilities and the stability of solutions between consecutive timesteps. The stability of the solutions is modeled by introducing an additional moving cost (or switching cost), which has a different definition depending on the particular setting.
Model and Motivation. In this work, we study the multistage $K$-facility reallocation problem on the real line, introduced by de Keijzer and Wojtczak. In $K$-facility reallocation, $K$ facilities are initially located at $(x_1^0, \ldots, x_K^0)$ on the real line. Facilities are meant to serve $n$ agents for the next $T$ days. On each day, each agent connects to the facility closest to its location and incurs a connection cost equal to this distance. The locations of the agents may change every day, thus we have to move facilities accordingly in order to reduce the connection cost. Naturally, moving a facility is not free, but comes at a price equal to the distance that the facility was moved. Our goal is to specify the exact positions of the facilities on each day so that the total connection cost plus the total moving cost is minimized over all $T$ days. In the online version of the problem, the positions of the agents at each stage $t+1$ are revealed only after determining the locations of the facilities at stage $t$.
For a motivating example, consider a company willing to advertise its products. To this end, it organizes $K$ advertising campaigns at different locations of a large city for the next $T$ days. Based on planned events, weather forecasts, etc., the company estimates a population distribution over the locations of the city for each day. Then, the company decides to compute the best possible campaign reallocation with $K$ campaigns over all days (see also  for more examples).
de Keijzer and Wojtczak fully characterized the optimal offline and online algorithms for the special case of a single facility and presented a dynamic programming algorithm for $K$ facilities with running time exponential in $K$. Despite the practical significance and the interesting theoretical properties of $K$-facility reallocation, its computational complexity and its competitive ratio (for the online variant) are poorly understood.
Contribution. In this work, we resolve the computational complexity of $K$-facility reallocation on the real line and take a first step towards a full understanding of the competitive ratio for the online variant. More specifically, in Section 3, we present an optimal algorithm with running time polynomial in the combinatorial parameters of $K$-facility reallocation (i.e., $n$, $T$ and $K$). This substantially improves on the complexity of the algorithm presented by de Keijzer and Wojtczak, which is exponential in $K$. Our algorithm solves a Linear Programming relaxation and then rounds the fractional solution to determine the positions of the facilities. The main technical contribution is showing that a simple rounding scheme yields an integral solution that has exactly the same cost as the fractional one.
Our second main result concerns the online version of the problem with $K = 2$ facilities. We start with the observation that the online $K$-facility reallocation problem is a natural and interesting generalization of the classical $k$-server problem, which has been a driving force in the development of online algorithms for decades. The key difference is that, in the $k$-server problem, there is a single agent that changes her location at each stage and a single facility has to be relocated to this new location at each stage. Therefore, the total connection cost is by definition $0$, and we seek to minimize the total moving cost.
From a technical viewpoint, the $K$-facility reallocation problem poses a new challenge, since it is much harder to track the movements of the optimal algorithm as the agents keep coming. It is not at all evident whether techniques and ideas from the $k$-server problem can be applied to the $K$-facility reallocation problem, especially for more general metric spaces. As a first step in this direction, we design a constant-competitive algorithm for the case $K = 2$. Our algorithm appears in Section 4 and is inspired by the double coverage algorithm proposed for the $k$-server problem (Koutsoupias).
Related Work. We can cast the $K$-facility reallocation problem as a clustering problem on a temporally evolving metric. From this point of view, the $K$-facility reallocation problem is a dynamic $K$-median problem. A closely related problem is the dynamic facility location problem (Eisenstat et al., An et al.). Other examples in this setting are the dynamic sum radii clustering (Blanchard and Schabanel) and multi-stage optimization problems on matroids and graphs (Gupta et al.).
Friggstad and Salavatipour introduced a mobile facility location problem, which can be seen as a one-stage version of our problem. They showed that even this version of the problem is NP-hard in general metric spaces, using an approximation-preserving reduction to the $k$-median problem.
Online facility location problems and variants have been extensively studied in the literature; see Fotakis for a survey. Divéki and Imreh studied an online model where facilities can be moved with zero cost. As mentioned before, the online variant of the $K$-facility reallocation problem is a generalization of the $k$-server problem, which is one of the most natural online problems. Koutsoupias showed a $(2k-1)$-competitive algorithm for the $k$-server problem for every metric space, which is also $k$-competitive in case the metric is the real line (Bartal and Koutsoupias). Other variants of the $k$-server problem include the $(h,k)$-server problem (Bansal et al.), the infinite server problem (Coester et al.) and the $k$-taxi problem (Fiat et al., Coester and Koutsoupias).
2 Problem Definition and Preliminaries
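Definition 1 ($K$-Facility Reallocation Problem)
We are given a tuple $(x^0, C)$ as input. The $K$-dimensional vector $x^0 = (x_1^0, \ldots, x_K^0)$ describes the initial positions of the facilities. The positions of the $n$ agents over time are described by $C = (C_1, \ldots, C_T)$. The position of agent $i$ at stage $t$ is $\alpha_i^t$ and $C_t = (\alpha_1^t, \ldots, \alpha_n^t)$ describes the positions of the agents at stage $t$.
A solution of the $K$-Facility Reallocation Problem is a sequence $x = (x^1, \ldots, x^T)$. Each $x^t$ is a $K$-dimensional vector that gives the positions of the facilities at stage $t$ and $x_k^t$ is the position of facility $k$ at stage $t$. The cost of the solution $x$ is
$$\mathrm{Cost}(x) = \sum_{t=1}^{T} \left[ \sum_{k=1}^{K} |x_k^t - x_k^{t-1}| + \sum_{i=1}^{n} \min_{1 \le k \le K} |\alpha_i^t - x_k^t| \right]$$
Given an instance $(x^0, C)$ of the problem, the goal is to find a solution $x$ that minimizes $\mathrm{Cost}(x)$. The term $\sum_{t=1}^{T} \sum_{k=1}^{K} |x_k^t - x_k^{t-1}|$ describes the cost for moving the facilities from place to place and we refer to it as moving cost, while the term $\sum_{t=1}^{T} \sum_{i=1}^{n} \min_{1 \le k \le K} |\alpha_i^t - x_k^t|$ describes the connection cost of the agents and we refer to it as connection cost.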
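The objective can be made concrete with a short sketch that evaluates a given solution. The function name and the list-of-lists input layout are assumptions made for illustration; only the two cost terms (distance moved by each facility, distance of each agent to its nearest facility) come from the definition.

```python
def total_cost(x0, agents, solution):
    """Moving cost plus connection cost of a solution.

    x0       -- initial facility positions (length K)
    agents   -- agents[t] lists the agent positions at stage t+1
    solution -- solution[t] lists the facility positions at stage t+1
    """
    moving = 0
    connection = 0
    previous = list(x0)
    for stage_agents, stage_facilities in zip(agents, solution):
        # Each facility pays the distance it travels between consecutive stages.
        moving += sum(abs(new - old)
                      for new, old in zip(stage_facilities, previous))
        # Each agent pays the distance to its nearest facility.
        connection += sum(min(abs(a - f) for f in stage_facilities)
                          for a in stage_agents)
        previous = list(stage_facilities)
    return moving + connection


# Two facilities, three agents, two stages.
print(total_cost([0, 10], [[1, 2, 9], [4, 5, 6]], [[1, 9], [5, 9]]))
```

In the usage line, the first stage contributes moving cost 2 and connection cost 1, and the second stage contributes moving cost 4 and connection cost 2.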
In the online setting, we study the special case of the $2$-facility reallocation problem. We evaluate the performance of our algorithm using competitive analysis; an algorithm is $c$-competitive if, for every request sequence, its cost is at most $c$ times (up to a small additive constant) that of the optimal offline algorithm, which knows the entire sequence in advance.
3 Polynomial Time Algorithm
Our approach is a typical LP based algorithm that consists of two basic steps.
Step 1: Expressing the $K$-Facility Reallocation Problem as an Integer Linear Program.
Step 2: Solving fractionally the Integer Linear Program and rounding the fractional solution to an integral one.
3.1 Formulating the Integer Linear Program
A first difficulty in expressing the $K$-Facility Reallocation Problem as an Integer Linear Program is that there are infinitely many candidate positions on the real line. We remove this obstacle with the help of the following lemma, proved by de Keijzer and Wojtczak.
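Lemma 3.1. Let $(x^0, C)$ be an instance of the $K$-facility reallocation problem, where $x^0$ is the set of initial facility positions and $C_t$ is the set of agent positions at stage $t$. There exists an optimal solution $x^*$ such that for all stages $t \in \{1, \ldots, T\}$ and facilities $k \in \{1, \ldots, K\}$, $x_k^{*t} \in C_1 \cup \cdots \cup C_T \cup x^0$.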
According to Lemma 3.1, there exists an optimal solution that locates the facilities only at positions where either an agent has appeared or a facility was initially lying. Lemma 3.1 yields an exhaustive search algorithm for the problem and is also the basis for the Dynamic Programming approach of de Keijzer and Wojtczak. We use Lemma 3.1 to formulate our Integer Linear Program.
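The exhaustive search that Lemma 3.1 makes possible can be sketched as a dynamic program over all tuples of candidate positions. This is only an illustration under assumed names (`candidate_positions`, `brute_force_optimum`), not the algorithm of this paper or the exact procedure of de Keijzer and Wojtczak; its running time is exponential in the number of facilities, so it is only usable on tiny instances.

```python
from itertools import product


def candidate_positions(x0, agents):
    """Positions where a facility initially lies or an agent ever appears."""
    return sorted(set(x0) | {a for stage in agents for a in stage})


def brute_force_optimum(x0, agents):
    """Optimal total cost, restricted to candidate positions (Lemma 3.1)."""
    candidates = candidate_positions(x0, agents)
    configs = list(product(candidates, repeat=len(x0)))
    # best[c] = cheapest cost of any solution whose facilities currently sit at c.
    best = {tuple(x0): 0}
    for stage in agents:
        new_best = {}
        for c in configs:
            connection = sum(min(abs(a - f) for f in c) for a in stage)
            moving = min(cost + sum(abs(p - q) for p, q in zip(prev, c))
                         for prev, cost in best.items())
            new_best[c] = moving + connection
        best = new_best
    return min(best.values())


print(brute_force_optimum([0, 10], [[3, 3, 3]]))
```

In the usage line, moving the left facility from 0 to 3 costs 3 and leaves no connection cost, which beats staying put (connection cost 9).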
The set of positions can be represented equivalently by a path. In this path, the $j$-th node corresponds to the $j$-th leftmost position of the set and the distance between two consecutive nodes on the path equals the distance of the respective positions on the real line. Now, the facility reallocation problem takes the following discretized form: We have a path that is constructed from the specific instance. Each facility is initially located at a node of the path and at each stage $t$, each agent is also located at a node of the path. The goal is to move the facilities from node to node such that the connection cost of the agents plus the moving cost of the facilities is minimized.
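A minimal sketch of this discretization follows. The helper name `build_path` and the returned layout are assumptions for illustration, and nodes are numbered from 0 rather than 1.

```python
def build_path(x0, agents):
    """Map an instance on the real line to nodes of a path.

    Returns the sorted node positions, the lengths of consecutive path
    edges, the node of each facility, and the node of each agent per stage.
    """
    positions = sorted(set(x0) | {a for stage in agents for a in stage})
    node_of = {p: j for j, p in enumerate(positions)}
    # Edge j joins node j and node j+1; its length is the gap on the line.
    edge_lengths = [b - a for a, b in zip(positions, positions[1:])]
    facility_nodes = [node_of[p] for p in x0]
    agent_nodes = [[node_of[a] for a in stage] for stage in agents]
    return positions, edge_lengths, facility_nodes, agent_nodes


positions, edges, fac, ag = build_path([0, 10], [[1, 2, 9], [4, 5, 6]])
print(positions)  # sorted distinct positions, one per node
print(edges)      # distances between consecutive nodes
```

The distance between two nodes is then the absolute difference of their positions, which equals the sum of the edge lengths between them.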
To formulate this discretized version as an Integer Linear Program, we introduce some additional notation. Let be the distance of the nodes in , be the set of facilities and be the set of agents. For each , is the node where agent is located at stage . We also define the following $0$-$1$ indicator variables for all : if, at stage , agent connects to a facility located at node , if, at stage , facility is located at node , if facility was at node at stage and moved to node at stage . Now, the problem can be formulated as the Integer Linear Program depicted in Figure 1.
The first three constraints correspond to the fact that at every stage , each agent must be connected to a node where at least one facility is located. The constraint enforces each facility to be located at exactly one node . The constraint describes the cost for moving facility from node to node . The final two constraints ensure that facility moved from node to node at stage if and only if facility was at node at stage and was at node at stage ( iff and ).
We remark that the values of are determined by the initial positions of the facilities, which are given by the instance of the problem. The notation should not be confused with , which is the position of facility at stage on the real line .
3.2 Rounding the Fractional Solution
Construct the path and the Integer Linear Program (1).
Solve the relaxation of the Integer Linear Program (1).
Rounding (footnote: the nodes can equivalently be calculated with the simpler criterion, is the leftmost node with ; see also Section 3.4): For each stage :
For , find the node such that
Locate facility at the respective position of node on the line
Our algorithm, described in Algorithm 1, is a simple rounding scheme applied to the optimal fractional solution of the Integer Linear Program of Figure 1. This simple scheme produces an integral solution that has exactly the same cost as an optimal fractional solution.
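The rounding step for a single stage can be sketched as follows. The exact inequality used by Algorithm 1 is not visible in the text above, so this sketch assumes a cumulative-mass rule: facility m (for m = 1, ..., K) is placed at the leftmost node at which the total fractional amount of facility on nodes up to and including it exceeds m - 1. The function name and input layout are hypothetical, and exact rational arithmetic is used to avoid floating-point ties.

```python
from fractions import Fraction


def round_stage(c):
    """Round one stage of a fractional solution (assumed cumulative rule).

    c[j] is the total fractional amount of facility on node j; the amounts
    are assumed to sum to the number of facilities K. Returns, for each
    m = 1..K, the index of the node that receives the m-th facility.
    """
    K = int(sum(c))
    chosen = []
    mass_before = Fraction(0)  # total amount strictly to the left of node j
    j = 0
    for m in range(1, K + 1):
        # Skip nodes until the mass up to and including node j exceeds m - 1.
        while mass_before + c[j] <= m - 1:
            mass_before += c[j]
            j += 1
        chosen.append(j)
    return chosen


amounts = [Fraction(1, 2), Fraction(1, 2), Fraction(0),
           Fraction(1, 3), Fraction(2, 3)]
print(round_stage(amounts))
```

A single left-to-right pass suffices because the chosen nodes are non-decreasing in m.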
Theorem 3.1 is the main result of this section and it implies the optimality of our algorithm. Recall that by Lemma 3.1, there is an optimal solution that locates facilities only in positions . This solution corresponds to an integral solution of our Integer Linear Program, meaning that is greater than or equal to the cost of the optimal fractional solution, which by Lemma 3.1 equals . We dedicate the rest of the section to proving Theorem 3.1. The proof is conducted in two steps, presented in Sections 3.3 and 3.4 respectively.
In Section 3.3, we present a very simple rounding scheme for the case where the values of the variables of the optimal fractional solution satisfy the following assumption.
Let and be either or , for some positive integer .
Although Assumption 1 is very restrictive and is not generally satisfied, it is the key step for proving the optimality guarantee of the rounding scheme presented in Algorithm 1. Then, in Section 3.4 we use the rounding scheme of Section 3.3 to prove Theorem 3.1. In the upcoming sections, will denote the values of these variables in the optimal fractional solution of the ILP (1).
3.3 Rounding Semi-Integral solutions
Throughout this section, we suppose that Assumption 1 is satisfied; and are either or , for some positive integer . If the optimal fractional solution meets these requirements, then the integral solution presented in Lemma 3.2 has the same overall cost. The goal of the section is to prove Lemma 3.2.
denotes the set of nodes of with a positive amount of facility () at stage ,
We remind that since or , . We also consider the nodes in to be ordered from left to right.
Let be the integral solution that at each stage places the -th facility at the node of i.e. . Then, has the same cost as the optimal fractional solution.
The term $m$-th facility refers to the ordering of the facilities on the real line according to their initial positions . The proof of Lemma 3.2 is technically involved; however, it is based on two intuitive observations about the optimal fractional solution.
The nodes to which each agent connects at stage are consecutive nodes of . More precisely, there exists a set such that
Let an agent that at some stage has and for some . Assume that and to simplify notation consider . Now, increase by and decrease by , where . Then, the cost of the solution is decreased by , thus contradicting the optimality of the solution. The same argument holds if . The proof follows since .
Under Assumption 1, the -th facility places amount of facility from the to the node of i.e. to nodes .
Observation 2 helps in understanding the structure of the optimal fractional solution under Assumption 1. However, it will not be used in this form in the rest of the section. We use Lemma 3.3 instead, which is roughly a rewording of Observation 2; its proof can be found in subsection A.1 of the Appendix.
Let the fractional moving cost of facility at stage . Then
Let the integral solution that places at stage the -th facility at the node of i.e. .
Let be the moving cost of facility at stage in the optimal fractional solution and MovingCost the total moving cost of the facilities in the integral solution . Then,
By the definition of the solutions we have that:
The last equality comes from Lemma 3.3.
Lemma 3.4 states that if we pick uniformly at random one of the integral solutions , then the expected moving cost that we will pay is equal to the moving cost paid by the optimal fractional solution. Interestingly, the same holds for the expected connection cost. This is formally stated in Lemma 3.5 and it is where Observation 1 comes into play.
Let denote the connection cost of agent at stage in . Then,
As already mentioned, the proof of Lemma 3.5 crucially makes use of Observation 1 and is presented in subsection A.1 of the Appendix. Combining Lemmas 3.4 and 3.5, we get that if we pick an integral solution uniformly at random, the average total cost that we pay is , where is the optimal fractional cost. More precisely,
Since , we have that and this proves Lemma 3.2.
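The averaging argument behind this last step can be written out as a short derivation. The symbols are assumptions introduced here for readability: $Sol_p$ for the integral solutions ($p = 1, \ldots, N$), $N$ for the integer of Assumption 1 and $Z^*_{LP}$ for the optimal fractional cost. Every integral solution is feasible for the relaxation, so none can cost less than $Z^*_{LP}$; since their average equals $Z^*_{LP}$, each of them must cost exactly $Z^*_{LP}$.

```latex
\frac{1}{N}\sum_{p=1}^{N} \mathrm{Cost}(Sol_p) = Z^*_{LP}
\quad\text{and}\quad
\mathrm{Cost}(Sol_p) \ge Z^*_{LP}\ \text{ for every } p
\;\;\Longrightarrow\;\;
\mathrm{Cost}(Sol_p) = Z^*_{LP}\ \text{ for every } p .
```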
3.4 Rounding the General Case
In this section, we use Lemma 3.2 to prove Theorem 3.1. As already discussed, Assumption 1 is not satisfied in general by the fractional solution of the linear program (1). Each will be either or , for some positive integers . However, each positive will have the form , where . This is due to the constraint .
Consider the path constructed from path as follows: Each node is split into copies with zero distance between them. Consider also the LP (1), when the underlying path is and at each stage , each agent is located at a node of that is a copy of the agent's original location, , where . Although these are two different LPs, they are closely related, since a solution for one can be converted to a solution for the other with exactly the same cost. This is due to the fact that for all , , where and .
The reason that we defined and the second LP is the following: Given an optimal fractional solution of the LP defined for , we will construct a fractional solution for the LP defined for with the exact same cost, which additionally satisfies Assumption 1. Then, using Lemma 3.2 we can obtain an integral solution for with the same cost. This integral solution for can be easily converted to an integral solution for . We finally show that these steps are done all at once by the rounding scheme of Algorithm 1 and this concludes the proof of Theorem 3.1.
Given the fractional positions of the optimal solution of the LP formulated for , we construct the fractional positions of the facilities in as follows: If , then facility puts a amount of facility in nodes of the set that have a amount of facility. The latter is possible since there are exactly copies of each and (that is the reason we required copies of each node). The values of the rest of the variables are defined in the proof of Lemma 3.7 that is presented in the end of the section. The key point is that the produced solution will satisfy the following properties (see Lemma 3.7):
its cost equals
or , for each
or for each
, for each
Clearly, this solution satisfies Assumption 1 and thus Lemma 3.2 can be applied. This implies that the integral solution for that places the -th facility to the node of () has cost . So the integral solution for that places the -th facility to the node , such that , has again cost .
A naive way to determine the nodes is to calculate , construct and its fractional solution, find the nodes and determine the nodes of . Obviously, this rounding scheme requires exponential time. Fortunately, Lemma 3.6 provides a linear time rounding scheme to determine the node given the optimal fractional solution of . This concludes the proof of Theorem 3.1.
The node of is a copy of the node if and only if
Let node of be a copy of the node . Then
The above equations hold because of the property and that is either or .
Now, let and assume that -th node of is a copy of . If , then and if , then . As a result, .
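The equivalence claimed by Lemma 3.6 can be checked numerically on small examples. The sketch below rests on assumptions, since the formulas are not visible above: every fractional amount is a multiple of 1/N, each node is split into copies carrying amount 1/N, the m-th facility goes to copy number (m-1)N+1 among the copies with positive amount, and the direct criterion is the cumulative-mass rule (leftmost node whose cumulative amount exceeds m-1). Both function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def round_by_cumulative_mass(c):
    """Direct rule: leftmost node whose cumulative amount exceeds m - 1."""
    K = int(sum(c))
    chosen, mass_before, j = [], Fraction(0), 0
    for m in range(1, K + 1):
        while mass_before + c[j] <= m - 1:
            mass_before += c[j]
            j += 1
        chosen.append(j)
    return chosen


def round_via_copies(c):
    """Naive rule: split nodes into copies of amount 1/N, pick copy (m-1)N+1."""
    N = 1
    for amount in c:  # least common multiple of all denominators
        N = N * amount.denominator // gcd(N, amount.denominator)
    # One list entry per copy with positive amount, labelled by its original node.
    positive_copies = []
    for j, amount in enumerate(c):
        positive_copies.extend([j] * int(amount * N))
    K = len(positive_copies) // N
    return [positive_copies[(m - 1) * N] for m in range(1, K + 1)]


amounts = [Fraction(1, 2), Fraction(1, 2), Fraction(0),
           Fraction(1, 3), Fraction(2, 3)]
print(round_via_copies(amounts), round_by_cumulative_mass(amounts))
```

The copy-based version materializes every copy, which is what makes the naive scheme expensive when N is large; the cumulative rule only scans the original nodes.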
We remark that the nodes can be determined in an even simpler way than that presented in Algorithm 1. That is, is the leftmost node such that . However, this rounding strategy requires some additional analysis.
Let the optimal fractional solution for the LP with underlying path . Then, there exists a solution of the LP with underlying path such that
Its cost is .
or , for each
or , for each
, for each
First, we set values to the variables . Initially, all . We know that if , then it equals , for some positive integer . For each such , we find with . Then, we set for . Since there are copies of each node and , we can always find sufficient copies of with . When this step is terminated, we are sure that conditions are satisfied.
We continue with the variables . Initially, all . Then, each positive has the form . Let to simplify notation. We now find copies of of and of so that
We then set . Again, since and we can always find pairs of copies of and that satisfy the above requirements. We can now prove that the movement cost of each facility is the same in both solutions.
The second equality follows from the fact that are copies of respectively and thus .
Finally, set values to the variables for each . Again, each positive equals , for some positive integer. We take copies of , and set . The connection cost of each agent remains the same since
The third equality holds since .
4 A Constant-Competitive Algorithm for the Online 2-Facility Reallocation Problem
In this section, we present an algorithm for the online 2-facility reallocation problem and we discuss the core ideas that prove its performance guarantee. The online algorithm, denoted as Algorithm 2, consists of two major steps.
In Step 1, facilities are initially moved towards the positions of the agents. We remark that in Step 1, the final positions of the facilities at stage $t$ are not yet determined. The purpose of this step is to bring at least one facility close to the agents. This initial move consists of three cases (see Figure 2), depending only on the relative positions of the facilities at stage $t-1$ and the agents at stage $t$.
In Step 2, our algorithm determines the final positions of the facilities . Notice that after Step 1, at least one of the facilities is inside the interval , meaning that at least one of the facilities is close to the agents. As a result, our algorithm may need to decide between moving the second facility close to the agents or just letting the agents connect to the facility that is already close to them. Obviously, the first choice may lead to small connection cost but large moving cost, while the second has the exact opposite effect. Roughly speaking, Algorithm 2 does the following: If the connection cost of the agents, when placing just one facility optimally, is not much greater than the cost for moving the second facility inside that interval, then Algorithm 2 puts the first facility at the position that minimizes the connection cost when one facility is used. Otherwise, it puts the facilities at the positions that minimize the connection cost when two facilities are used. The above cases are depicted in Figure 3. We formalize how this choice is performed, introducing some additional notation.
denotes the positions of the agents at stage ordered from left to right.
If is a set of positions with , then denotes the median interval of the set, which is the interval . If , then is a single point.
denotes the optimal connection cost for the set when all agents of connect to just one facility. That is We also define .
(resp. ) denotes the positions of the agents that connect to facility (resp. ) at stage in the optimal solution . (resp. ) denotes the positions of the agents that connect to facility (resp. ) at stage in the solution produced by Algorithm 2.
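The two quantities that Step 2 compares, the best connection cost with one facility and with two facilities, can be sketched as follows. The function names are hypothetical, and the code relies on two standard facts about the line: a single facility minimizes the sum of distances at any point of the median interval, and with two facilities the agents served by each facility form a contiguous block in sorted order.

```python
def median_interval(positions):
    """Endpoints of the median interval; they coincide for odd-sized sets."""
    s = sorted(positions)
    n = len(s)
    return s[(n - 1) // 2], s[n // 2]


def one_facility_cost(positions):
    """Optimal connection cost when all agents connect to one facility."""
    if not positions:
        return 0
    left_median, _ = median_interval(positions)
    return sum(abs(p - left_median) for p in positions)


def two_facility_cost(positions):
    """Optimal connection cost with two facilities: try every sorted split."""
    s = sorted(positions)
    return min(one_facility_cost(s[:i]) + one_facility_cost(s[i:])
               for i in range(len(s) + 1))


agents = [1, 2, 9]
print(median_interval(agents), one_facility_cost(agents), two_facility_cost(agents))
```

For the agents at 1, 2 and 9, one facility at the median 2 costs 8, while two facilities (one serving 1 and 2, the other serving 9) cost 1.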