# Learning Augmented Online Facility Location

Following the research agenda initiated by Munoz and Vassilvitskii [1] and Lykouris and Vassilvitskii [2] on learning-augmented online algorithms for classical online optimization problems, we consider the Online Facility Location problem under this framework. In Online Facility Location (OFL), demands arrive one by one in a metric space and must be (irrevocably) assigned to an open facility upon arrival, without any knowledge about future demands. We present an online algorithm for OFL that exploits potentially imperfect predictions on the locations of the optimal facilities. We prove that the competitive ratio decreases smoothly from sublogarithmic in the number of demands to constant as the error, i.e., the total distance of the predicted locations to the optimal facility locations, decreases towards zero. We complement our analysis with a matching lower bound establishing that the dependence of the algorithm's competitive ratio on the error is optimal, up to constant factors. Finally, we evaluate our algorithm on real-world data and compare our learning-augmented approach with the current best online algorithm for the problem.

## 1 Introduction

The field of online algorithms deals with algorithmic problems in which the input is not entirely known in advance, but rather arrives sequentially. The algorithm is required to make irrevocable decisions based only on the input received up to a given point, and to incur the corresponding irrevocable cost for each of them. Traditionally, in the analysis of online algorithms we assume, rather pessimistically, that an adversary always presents the algorithm with the worst-case input. More precisely, the performance of online algorithms is evaluated by the competitive ratio [7], which is the worst-case ratio of the algorithm’s total cost to the cost of a computationally unlimited optimal algorithm that is aware of the entire request sequence in advance.

On the other hand, one of the main goals in the field of machine learning is to predict the unknown based on data, and learn what the world looks like, rather than preparing for the worst that can happen. In an effort to exploit this, there has been a trend in recent years that tries to use machine learning predictions in order to deal with the inherent uncertainty in online algorithms, while still providing worst-case performance guarantees. Specifically, one might think that directly using machine learning in online problems should enhance their performance, since by knowing, with some error, what the input will look like, we should be able to come up with almost optimal solutions or approximations. This turns out not to be true in general, since the error of the learner does not necessarily stay fixed: it can propagate through the different phases of the algorithm and cause a much bigger error.

In a recent work, Lykouris and Vassilvitskii [19] formulated a framework that provides formal guarantees for these learning-augmented online algorithms, in terms of consistency and robustness. Specifically, they require the algorithm to be near optimal if the predictions are good enough (consistency), and for worse predictions, the competitive ratio should gracefully degrade to the worst-case one (robustness). Generally, the idea of combining online algorithms with ML advice is that, in the end, we are able to overcome the traditional worst-case lower bounds and get the best of both worlds. The ML-enhanced algorithm, given some predictions with total error $\eta$, is required to make choices online. Following the idea of [3], the guarantees are given as a function of $\eta$. Many online problems have already been studied under this framework, such as ski rental, scheduling, the secretary problem, metrical task systems (MTS) and more, for which we have included a brief overview in the related work section. An important online problem that has not been studied under this framework until now is online facility location.

#### Online Facility Location.

In the online facility location problem, introduced by Meyerson [23], we are presented with a sequence of demand locations in an underlying metric space. Each demand must be connected to an open facility upon arrival. Each facility has an opening cost and cannot be closed once opened, and every demand pays its connection cost, which is the distance to the closest open facility. Our goal is to decide where to open the facilities and where to connect each arriving demand, while paying the minimum possible opening and connection cost. Meyerson presented a randomized algorithm that achieves a competitive ratio of $O(\log n / \log\log n)$, where $n$ is the number of demands. After Meyerson’s initial result, there has been continuing work on this problem, as well as on many of its variants. We provide a brief list in the related work section.
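To make the baseline concrete, the following is a minimal Python sketch of a Meyerson-style rule for uniform facility costs: each demand opens a facility at its own location with probability proportional to its distance from the nearest open facility. The function names, the Euclidean metric, and the cap of the probability at 1 are assumptions made for illustration, not code from the paper.

```python
import math
import random


def meyerson_ofl(demands, f, dist=math.dist, rng=None):
    """Sketch of Meyerson-style randomized online facility location.

    Assumed rule: a demand at distance d from the nearest open facility
    opens a new facility at its own location with probability min(d / f, 1);
    otherwise it connects to the nearest open facility and pays d.
    Returns (facilities, total_cost).
    """
    rng = rng or random.Random()
    facilities = []
    total_cost = 0.0
    for x in demands:
        # Distance to the nearest open facility (infinite if none is open yet).
        d = min((dist(x, y) for y in facilities), default=math.inf)
        if rng.random() < min(d / f, 1.0):
            facilities.append(x)  # pay the opening cost, connect at distance 0
            total_cost += f
        else:
            total_cost += d       # connect to the nearest open facility
    return facilities, total_cost
```

The first demand always opens a facility, since its distance to the (empty) facility set is treated as infinite.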

### 1.1 Our Contribution

In this work, we study the online facility location problem in the framework of learning-augmented online algorithms, following a similar approach to [3] in terms of evaluating the performance of our algorithms. We focus on the version with uniform facility costs, where the cost of opening a facility at any point of the underlying metric space is $f$. We present a simple randomized algorithm which, for every demand, receives a prediction on the location of the optimal facility for this demand. The $\ell_1$ error ($\eta_1$) is the sum, over all predictions, of their distance to the respective optimal facility, while we define the $\ell_\infty$ error ($\eta_\infty$) to be the maximum such distance. Our algorithm decides whether to open a facility at the predicted location with probability proportional to the distance of the predicted location to the nearest open facility. This prevents the algorithm from opening too many facilities in a certain area, since once a facility is opened, the probability of another one opening nearby is decreased. In addition to proving the guarantee of our algorithm, we also show how to adapt the results of Antoniadis et al. [2] on MTS, in order to get the minimum of the competitive ratio of our proposed algorithm and that of the worst-case algorithm of Meyerson. Our main theorem is stated below.

###### Theorem (Main Theorem).

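For $\eta_1 = 0$, there exists an algorithm that has constant competitive ratio. For $\eta_1 > 0$, the competitive ratio is

$$O\left(\min\left\{\frac{\log n}{\log\log n},\ \frac{\log(\mathrm{err}_\infty)}{\log\left(\mathrm{err}_1^{-1}\log(\mathrm{err}_\infty)\right)}\right\}\right)$$

where $n$ is the number of demand points, $\mathrm{err}_\infty = n\eta_\infty/f$, $\mathrm{err}_1 = \eta_1/f$, and $f$ is the facility opening cost.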

We note that, in contrast to the initial work in this area, our algorithm does not require any hyperparameter to be given to it up front. We also note that the dependence on $\eta_\infty$ implies that, even if the total error is high, our algorithm can still be close to optimal if the error is evenly distributed among the predictions, which is also the case in practical settings: our predictions will be a bit off, but no prediction will be off by a lot.

Our second result is a matching lower bound, on any randomized algorithm that uses these types of predictions. This implies that our proposed algorithm is optimal up to constants.

###### Theorem (Lower Bound).

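No randomized algorithm can achieve a competitive ratio better than

$$\Omega\left(\frac{\log(\mathrm{err}_\infty)}{\log\left(\mathrm{err}_1^{-1}\log(\mathrm{err}_\infty)\right)}\right)$$

where $n$ is the number of demand points, $\mathrm{err}_\infty = n\eta_\infty/f$, $\mathrm{err}_1 = \eta_1/f$, and $f$ is the facility opening cost.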

Finally, we experimentally evaluate our algorithms on both real-world and synthetic datasets for different types of predictions and errors.

### 1.2 Related work

#### Learning Augmented Algorithms.

This line of work was initiated by Munoz and Vassilvitskii [21] and Lykouris and Vassilvitskii [19], who formally introduced the notions of consistency and robustness. Purohit et al. [26] considered the ski rental problem and the non-clairvoyant scheduling problem, giving consistency and robustness guarantees that depend on a hyperparameter that has to be given to the algorithm in advance. Following this, Gollapudi and Panigrahi [15] also considered the ski rental problem in the setting of multiple predictors, while Wang et al. [29] considered the multi-shop ski rental problem, a generalization of the classic ski rental problem. Lykouris and Vassilvitskii [19] studied the classical caching problem (also known in the literature as paging), and were able to adapt the classical Marker algorithm [11] to obtain a trade-off between robustness and consistency. Rohatgi [27] and Wei [30] subsequently presented simpler learning-augmented caching algorithms with improved dependence of their competitive ratios on the prediction errors.

Further results in online algorithms with machine learned advice include the work by Lattanzi et al. [18], who studied the restricted assignment scheduling problem, the work of Bamas et al. [4], who considered energy minimization problems, and the more general framework of online primal dual algorithms [5]. The work by Mitzenmacher [24] considered a slightly different scheduling/queuing problem, introducing also a different quality measure for evaluating algorithms, called “the price of misprediction”.

Adopting a slightly different model, Mahdian et al. [20] studied problems where it is assumed that there exists an optimistic algorithm (which could in some way be interpreted as a prediction), and designed a meta-algorithm that interpolates between a worst-case algorithm and the optimistic one. They considered several problems, including the allocation of online advertisement space, and for each gave an algorithm whose competitive ratio is also an interpolation between the competitive ratios of its corresponding optimistic and worst-case algorithms. However, the performance guarantee is not given as a function of the “prediction” error, but rather only as a function of the respective ratios and the interpolation parameter. In [3], the following online selection problems are studied: (i) the classical secretary problem, (ii) online bipartite matching with vertex arrivals and (iii) the graphic matroid secretary problem.

#### Online Facility Location.

(Metric uncapacitated) facility location is a classical optimization problem that has been widely studied in both the operations research and the computer science literature (see e.g., [25, 10], and more recently, in computer science, especially from the viewpoint of approximation algorithms, see e.g., [28]). Its online version has received significant attention since its introduction by Meyerson [23]. Meyerson initially presented a randomized algorithm and proved that its competitive ratio is $O(\log n)$. In subsequent work, Fotakis [13] gave a lower bound of $\Omega(\log n / \log\log n)$ and showed that Meyerson’s algorithm is asymptotically tight, matching the lower bound. Other algorithms were also given using different techniques; Fotakis [12] gave a deterministic primal-dual $O(\log n)$-competitive algorithm and Anagnostopoulos et al. [1] gave a deterministic $O(\log n)$-competitive algorithm using an idea for hierarchical partitioning of the metric space. For follow-up work on online facility location and its variants we refer the reader to the survey by Fotakis [14]. More recently, there has been continuing work on the dynamic variant of online facility location [9, 8, 16].

## 2 Preliminaries

We formally introduce the online facility location problem, alongside some notation we use in the rest of the technical sections. We also show, as a modification of Theorem 18 in [2] (in their work they show that for problems in Metrical Task Systems, it is possible to combine many online algorithms and in the end obtain the best guarantee of all, up to a constant factor), that in our problem it is also possible to combine two (or more) online algorithms and get the best competitive ratio of the two.

### 2.1 The Online Facility Location problem

In the online facility location problem, we are presented with a sequence of demands in a metric space with distance function $d$. A new demand point arrives at each time step, and needs to be connected to an open facility. The algorithm decides whether to open a new facility and connect this point to it, or to connect it to an already open one. Each facility costs $f$ to open, and a point $x$ connected to a facility $y$ pays $d(x, y)$ in service cost. The goal is to minimize the total assignment and facility opening cost; formally, let $D$ be the set of demand points, and $F$ be the set of facilities that are finally open. The goal is to minimize

$$f\cdot|F| + \sum_{x\in D}\min_{y\in F} d(x,y)$$

We denote by OPT the optimal offline cost of the facility location instance, the optimal assignment cost, and the cost of Meyerson’s and our algorithm respectively.
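The objective above can be evaluated directly for any fixed facility set. The sketch below assumes points are coordinate tuples under the Euclidean metric; the function name and the `dist` parameter are illustrative and not taken from the paper's code.

```python
import math


def facility_location_cost(demands, facilities, f, dist=math.dist):
    """Opening cost f per facility plus each demand's distance to its nearest facility."""
    if not facilities:
        raise ValueError("at least one facility must be open")
    opening = f * len(facilities)
    assignment = sum(min(dist(x, y) for y in facilities) for x in demands)
    return opening + assignment
```

For example, with demands at `(0, 0)` and `(3, 4)`, a single facility at `(0, 0)` and `f = 2`, the cost is the opening cost 2 plus the assignment cost 5.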

### 2.2 Predictions

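For every new point that arrives, we get a prediction on the facility it should connect to. We define the error of the prediction sequence through the distances of the predictions from the optimal facilities: the error of a point is the distance between its prediction and the optimal facility for this demand. We denote by $\eta_1$ the total $\ell_1$-error, i.e., the sum of these distances, and by $\eta_\infty$ the $\ell_\infty$-error, i.e., their maximum.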
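As a small illustration of the two error measures, the following sketch computes the total and the maximum prediction error from a list of predictions and the optimal facility of each demand. The names `prediction_errors`, `eta_1` and `eta_inf`, and the use of the Euclidean metric, are assumptions for this example.

```python
import math


def prediction_errors(predictions, optimal_centers, dist=math.dist):
    """Return (eta_1, eta_inf): the sum and the maximum of the per-demand errors.

    predictions[i] is the predicted location for demand i and
    optimal_centers[i] is the optimal facility serving demand i.
    """
    errors = [dist(p, c) for p, c in zip(predictions, optimal_centers)]
    eta_1 = sum(errors)
    eta_inf = max(errors, default=0.0)
    return eta_1, eta_inf
```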

The currently open facility is denoted by , and the error of is .

### 2.3 Combining two online algorithms for OFL

Inspired by a method for combining online algorithms for metrical task systems that appeared in [2], we provide here an adaptation of that method that makes it applicable to our setting. Suppose that we have algorithms and for the OFL problem. Our goal is to construct a new online algorithm , that has cost . We will reduce this setting to the well known cow path problem, and prove a more specific bound of for the cost of our algorithm.

###### Theorem 2.1.

Given online algorithms for the OFL problem, the algorithm has cost at most for any input sequence .

###### Proof Sketch.

The idea behind the proof is inspired by the classical online algorithm for the cow path problem. In Algorithm 1, we copy exactly the decisions of one of the two algorithms available. However, when we switch from one of the algorithms to the other one, we need to incur a certain cost in order to reach the state (i.e., the set of open facilities) the new algorithm would be in. In our case, this is the cost of the facilities that the new algorithm would have opened since the last time our combined algorithm was following it. The idea is that we only follow the decisions of some algorithm while it has not paid a cost more than a constant factor larger than the alternative one. In this way, we can guarantee that at any point in the demand sequence, our combined algorithm incurs a cost which is within a constant factor of the minimum cost of the two algorithms. The main technical part of the proof is to show that the switching costs are not significant (i.e., they are also within a constant factor of the minimum cost). This is shown by bounding from above the total cost of the combined algorithm by a geometric sum.

The full proof of the theorem is deferred to the appendix. ∎

## 3 PredFL: Learning Augmented Facility Location

In this section we describe the algorithm that uses predictions to solve an online facility location problem, and show the guarantees this algorithm achieves. Intuitively, the algorithm works in a similar way to Meyerson’s OFL [23]; every time a demand-prediction pair arrives, we open a facility at the prediction with probability proportional to the distance of the prediction from the nearest already open facility. This is formally described in Algorithm 2. Initially we show that when the error is zero, we get a solution within a constant factor of the optimal. Then we proceed to show our main theorem (Theorem 3.2).
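The rule described above can be sketched in Python as follows. Algorithm 2 itself is not reproduced in this text, so the details here are assumptions: the opening probability is taken to be min(d/f, 1), where d is the distance of the prediction to the nearest open facility, and after the opening decision each demand connects to its nearest open facility. The name `pred_fl` and the Euclidean metric are illustrative.

```python
import math
import random


def pred_fl(pairs, f, dist=math.dist, rng=None):
    """Sketch of a prediction-based facility opening rule.

    pairs is an iterable of (demand, prediction) points. For each pair, a
    facility is opened at the prediction with probability min(d / f, 1),
    where d is the distance from the prediction to the nearest open facility
    (assumed form). The demand then connects to its nearest open facility.
    Returns (facilities, total_cost).
    """
    rng = rng or random.Random()
    facilities = []
    total_cost = 0.0
    for demand, prediction in pairs:
        d_pred = min((dist(prediction, y) for y in facilities), default=math.inf)
        if rng.random() < min(d_pred / f, 1.0):
            facilities.append(prediction)  # open at the predicted location
            total_cost += f
        # The first pair always opens a facility, so this set is never empty here.
        total_cost += min(dist(demand, y) for y in facilities)
    return facilities, total_cost
```

When every prediction coincides with an already open facility, the opening probability is zero, which illustrates why repeated predictions in the same area do not trigger further openings.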

###### Theorem 3.1 (Consistency).

When $\eta_1 = 0$, Algorithm 2 has constant competitive ratio.

It is clear that for one optimal cluster the algorithm achieves the exact optimal. The constant factor comes from the case of multiple optimal clusters; when a facility has already opened on an optimal center, each one of the next optimal centers needs to “accumulate potential” before opening, and then opens. We defer the proof of this theorem to subsection A.2 of the Appendix.

In the general case, where the error is positive, we show the following general result.

###### Theorem 3.2.

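Algorithm 2, when $\eta_1 > 0$, has competitive ratio

$$O\left(\min\left\{\frac{\log n}{\log\log n},\ \frac{\log(\mathrm{err}_\infty)}{\log\left(\mathrm{err}_1^{-1}\log(\mathrm{err}_\infty)\right)}\right\}\right)$$

where $n$ is the total number of demands, $\mathrm{err}_\infty = n\eta_\infty/f$ and $\mathrm{err}_1 = \eta_1/f$.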

For the analysis of this theorem, on a high level, we divide the space around each optimal center into concentric rings, each corresponding to one phase of the algorithm. In each ring there are the close demands, which contribute to the assignment cost, and the far ones, which have a higher probability of opening a facility. The analysis works by charging the close ones towards the $\ell_1$-error, and the far ones towards the opening of a facility closer to the optimal center. The structure is depicted in Figure 1. In order to show the main theorem, we need the following two lemmas, bounding the cost of close and far demands.

###### Lemma 3.3 (Far demands).

In the construction of Theorem 3.2, denote by the total cost incurred by far demands and the total number of phases, then

###### Lemma 3.4 (Close demands).

In the construction of Theorem 3.2, denote by the total cost incurred by close demands, then

The proofs of both lemmas are deferred to section A.2 of the Appendix. We now formally prove the main theorem.

###### Proof of Theorem 3.2.

We focus the analysis on one optimal center , in Lemma A.2 in the Appendix we show why this assumption is without loss of generality. We divide the area around the optimal center into rings centered at called zones (see also Figure 1), where each zone contains all the points whose prediction is at distance . Each of the zones, corresponds in one Phase of the algorithm; Phase ends, when a facility is opened in zone .

Denote by $\ell$ the total number of zones and observe that for the demands with predictions at distance at most $f/n$, even if we charge them all with error $f/n$, we at most double the final cost. Therefore we want the innermost zone to be

$$\frac{\eta_{op}}{m^{\ell}} \approx \frac{f}{n}. \tag{1}$$

Consider at some phase of the algorithm that the closest (to the center) open facility is at zone . In each phase we split the demands that arrive in two types; the close ones whose prediction is within of Zone and the far ones whose prediction is at least far from Zone , and therefore from the currently open facility. The parameters and will be chosen at the end of the analysis. The structure is depicted on figure 1.

Observe that when a facility is opened at zone , all demands that have their prediction on the outside of this zone, cannot incur cost more than , which comes from Lemma A.1 and the fact that .

Formally, we bound the cost of each of the two types of demands that arrive. Assume that the algorithm is at Phase . Denote by and the current open facility closest to and its error respectively. From the definition of a Phase, we know that

$$\frac{\eta_{op}}{m^{i}} \le \eta_{op}^{(i)} \le \frac{\eta_{op}}{m^{i-1}} \tag{2}$$

Denote by $C_{far}$ and $C_{close}$ the cost incurred by the demands that are far and close respectively, and by $\ell$ the total number of phases. Using Lemmas 3.3 and 3.4 to bound the expected cost of the far and close demands we get:

$$E[\mathrm{ALG}] = E[C_{close}] + E[C_{far}] \le \mathrm{Asg}^* + (2m\lambda+1)\,\eta_1 + \ell\,\frac{3\lambda}{\lambda-1}\,f$$

Setting $\lambda$ to be any constant we get that the cost of the algorithm is $O(\mathrm{Asg}^* + m\eta_1 + \ell f)$. We want to set $m$ such that $m\eta_1 = \ell f$. Substituting in equation (1), rearranging and observing that in the worst case $\eta_{op} = \eta_\infty$, we get

$$m\log m = \frac{f}{\eta_1}\log\left(\frac{\eta_\infty n}{f}\right).$$

This gives where , and is the Lambert function. Using the bound on for , given in [17], we get that , which gives the second part of the competitive ratio in the theorem statement. Combining this with Theorem 2.1, and running Meyerson’s algorithm in parallel, we get the theorem statement. ∎
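The Lambert-function step can be spelled out as follows. This is a sketch under assumptions not fixed by the text above: logarithms are taken to be natural, $X$ is shorthand introduced here for the right-hand side of the previous display, the expression for $\ell$ comes from equation (1) with $\eta_{op} = \eta_\infty$, and only the leading-order behaviour $W(X) = \Theta(\ln X)$ for large $X$ is used rather than the precise bound of [17].

```latex
\begin{align*}
m \ln m = X
  &\iff (\ln m)\, e^{\ln m} = X
   \iff \ln m = W(X)
   \iff m = e^{W(X)} = \frac{X}{W(X)}, \\
X &= \frac{f}{\eta_1}\,\ln\!\left(\frac{\eta_\infty n}{f}\right), \\
\ell &= \frac{\ln(\eta_\infty n / f)}{\ln m}
      = \frac{\ln(\eta_\infty n / f)}{W(X)}
      = \Theta\!\left(\frac{\ln(\eta_\infty n / f)}{\ln X}\right)
      \quad \text{for large } X .
\end{align*}
```

Since the cost is balanced so that $m\eta_1 = \ell f$, the number of phases $\ell$ is what should appear, up to constants, as the second term of the minimum in Theorem 3.2.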

## 4 Lower Bound

After designing an algorithm that uses predictions, one might wonder whether this algorithm achieves the optimal competitive ratio. Next, we show that no algorithm can do better, up to constant factors, by giving a lower bound on any randomized algorithm for online facility location that uses this type of predictions.

Specifically, we show that given a value on the total error , we can construct a distribution of inputs and predictions, such that the cost any deterministic algorithm has to pay, is at least the same as the algorithm’s guarantee described in Theorem 3.2. This is a result of using Yao’s principle for obtaining lower bounds (chapter 8.4 of [7] and [31]). We emphasize that due to its construction, this lower bound is independent of any algorithm.

###### Theorem 4.1.

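For any error value $\eta_1$, there exists an instance of demands and predictions, with total error equal to $\eta_1$, such that no algorithm can achieve a competitive ratio better than

$$\Omega\left(\frac{\log(\mathrm{err}_\infty)}{\log\left(\mathrm{err}_1^{-1}\log(\mathrm{err}_\infty)\right)}\right)$$

where $\mathrm{err}_\infty = n\eta_\infty/f$ and $\mathrm{err}_1 = \eta_1/f$.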

###### Proof.

The construction of the lower bound is similar to the online facility location lower bound described in [13]. Specifically, the metric space is a binary Hierarchical Well-Separated Tree with levels, where . The distance of the root to its children is , and the edge lengths along a path from the root to a leaf decrease by a factor of at every level. The distance of any vertex at level to its children is . The construction is shown in figure 5 of the appendix. We denote by the subtree of rooted at a vertex . Observe that the following two properties hold

1. The distance of a level- vertex to any vertex in is at most

2. The distance to any vertex not in is at least

#### Constructing the demand sequence:

The demand sequence is divided into phases. Phase consists of demands at the root of . After the end of phase , if is not a leaf, the adversary selects uniformly at random from the two children of . In the next phase, , demands arrive at the vertex . Since is the total number of demands, we require that

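$$n = \sum_{i=0}^{\lambda}\frac{m^{i}}{\alpha} = \frac{1}{\alpha}\cdot\frac{m^{\lambda+1}-1}{m-1} \approx m^{\lambda}/\alpha \tag{3}$$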
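The geometric sum in equation (3) can be checked numerically. The sketch below assumes that phase $i$ contributes $m^i/\alpha$ demands, as the summand of equation (3) suggests; the function name and its parameters are illustrative only.

```python
def total_demands(m, alpha, levels):
    """Return (term-by-term sum, closed form) of sum_{i=0}^{levels} m**i / alpha."""
    exact = sum(m**i / alpha for i in range(levels + 1))
    closed_form = (m ** (levels + 1) - 1) / (alpha * (m - 1))
    return exact, closed_form
```

For large `m` the last term dominates, which is where the approximation by `m**levels / alpha` comes from.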

#### Constructing the predictions sequence:

Similarly, the predictions sequence is divided into phases. For each demand of phase , we get prediction at distance from the optimal center. Observe that since the length of the edges increases proportionally to , each prediction is located along the edge connecting the vertices and , as shown in Figure 5.

#### Cost of the optimal:

The optimal solution opens a single facility at . Using property observe that in each phase the optimal incurs assignment cost at most , since . Therefore, the overall optimal cost is at most , using that . Setting we get

$$\mathrm{OPT} \le 2f\,\frac{m}{m-1}. \tag{4}$$

#### Cost of any algorithm:

Denote by ALG any deterministic algorithm. We abuse notation and use ALG to also denote the algorithm’s cost. At the end of any phase , ALG knows that there exists a facility in , but it cannot tell in which of ’s subtrees the facility should be. We fix the adversary’s choices up to the end of the phase and consider the cost incurred by ALG for demands and facilities not in , i.e. , and distinguish the following cases

• ALG has no facilities in at the moment the first demand at arrives:
The assignment cost for the demands at is at least .

• ALG has at least one facility in :
ALG opens a single facility at , as the prediction reveals the true subtree where the optimal center is located in. In this case, ALG incurs zero cost for the demands at , whereas the assignment cost for the demands at is at least . Thus, the overall cost is at least .

However, with probability the adversary selects so that at least one of ALG’s facilities is not included in , therefore the algorithm’s cost is at least for every two phases. Taking into account only the first phases, we get . Combining this with equation (4) we get a competitive ratio of . In this construction we have that , and using that in equation (3), given also the definition of and that , and taking the we get

$$m\log m - \frac{\eta_1}{f}\log m = \frac{f}{\eta_1}\log\left(\frac{\alpha n \eta_\infty}{f}\right) \tag{5}$$

Observing that in our construction, the first term is within constant of , and using a similar argument to the upper bound (Theorem 3.2), we get the theorem. ∎

## 5 Experiments

In this section we describe the setup for our experiments, the datasets we used and the way we created the predictions.

#### Datasets

We use the following datasets, also used in [8]. All datasets are equipped with the Euclidean metric, and for the [6, 22] datasets we restricted our experiments to the first 20K points.

• The CoverType data set [6], from the UCI repository with 58K instances of 54 dimensions.

• The US Census data set [22], from the UCI repository with 2.5 million instances of 68 dimensions.

• A synthetic dataset, created by sampling 2K points uniformly at random on the grid .

Our code is written in Python and the experiments were executed on a Debian virtual machine on Google Cloud with 16vCPUs and of memory. The code is included in the supplementary material.

#### Predictions

To generate the predictions, we first calculate the optimal offline solution given the demands on the metric space. In order to calculate the optimal solution, we solved the LP- relaxation of facility location, using Gurobi version , and using deterministic rounding we obtained a -approximate integer solution. Specifically for the CoverType and US Census datasets, we split them into batches of size and for each batch we solved the offline optimal. We used the following three methods to generate the predictions of each demand.

1. alpha_predictor: the prediction is the point which is located on the line connecting the demand and the corresponding optimal center at distance from .

2. alpha_gaussian_predictor: operates in a similar way. The prediction is at distance , where and are given as parameters.

3. perturb_gaussian_predictor: uses the same parameters as the second method, but the prediction is generated by , where are the vectors of the optimal center, the demand and the prediction respectively.
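The first two predictors can be sketched as below. The exact formulas are not recoverable from this text, so the form used here is an assumption: the prediction is placed at `center + alpha * (demand - center)`, so that `alpha = 0` returns the optimal center and `alpha = 1` returns the demand itself, and the Gaussian variant draws `alpha` from a normal distribution with hypothetical parameters `mu` and `sigma`. The third predictor is omitted because its generating formula is missing.

```python
import random


def alpha_predictor(demand, center, alpha):
    """Point on the line through the optimal center and the demand (assumed form).

    alpha = 0 gives the optimal center, alpha = 1 gives the demand.
    """
    return tuple(c + alpha * (x - c) for x, c in zip(demand, center))


def alpha_gaussian_predictor(demand, center, mu, sigma, rng=None):
    """Same construction, with alpha drawn from a normal distribution N(mu, sigma)."""
    rng = rng or random.Random()
    alpha = rng.gauss(mu, sigma)
    return alpha_predictor(demand, center, alpha)
```

Under this assumed form, a prediction equal to the demand makes the prediction-based rule open facilities at demand locations, which matches the behaviour of Meyerson's algorithm.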

#### Graphs

In the following graphs we compare the competitive ratio of Meyerson’s online facility location algorithm (Meyerson) and our proposed algorithm with predictions (PredFL). The costs of each algorithm have been calculated using the worst case scenario. In all three figures we observed the same behavior in terms of the competitive ratio. Our algorithm’s competitive ratio converges smoothly to the Meyerson’s as the parameter increases from to .

An interesting finding of our experiments (the corresponding plots are deferred to the appendix) is that in alpha_gaussian_predictor, the total cost of the online algorithm is slightly decreasing with the standard deviation. On the one hand, increasing the standard deviation might cause the error to increase. On the other hand, some of the subsequent far predictions are bound to be closer to the optimal center, and due to our facility opening rule these predictions are more likely to turn into new facilities, because of their increased distance to the nearest open facility. Interestingly, the latter effect dominates, which results in a slightly decreased competitive ratio as the standard deviation increases. This is in accordance with our lower bound construction, where , for all demands .

## 6 Conclusions

In this work, we are the first to study the online facility location problem under the framework of learning-augmented algorithms. Adapting Meyerson’s algorithm, we present a simple randomized algorithm that incorporates predictions on the position of the optimal facility and achieves a sublogarithmic, in the prediction error, competitive ratio. We also showed that the competitive ratio of our algorithm is optimal, up to constant factors, by proving a matching lower bound on all randomized algorithms. We also experimentally evaluated our results on both synthetic and real life datasets, and confirmed the theoretical results.

We showed that existing techniques for online algorithms can be adapted to work in the learning-augmented framework, and can overcome the worst-case lower bounds, while providing a smooth transition from a near-optimal competitive ratio (when predictions are good enough) to the worst case (when predictions are bad). This line of work can be extended to other online network design problems, such as the online Steiner tree problem.

## References

• [1] A. Anagnostopoulos, R. Bent, E. Upfal, and P. V. Hentenryck (2004) A simple and deterministic competitive algorithm for online facility location. Inf. Comput. 194 (2), pp. 175–202. External Links: Cited by: §1.2.
• [2] A. Antoniadis, C. Coester, M. Eliás, A. Polak, and B. Simon (2020) Online metric algorithms with untrusted predictions. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, Proceedings of Machine Learning Research, Vol. 119, pp. 345–355. External Links: Link Cited by: 1st item, §1.1, §2.3, §2.
• [3] A. Antoniadis, T. Gouleakis, P. Kleer, and P. Kolev (2020) Secretary and online matching problems with machine learned advice. CoRR abs/2006.01026. External Links: Link, 2006.01026 Cited by: §1.1, §1.2, §1.
• [4] É. Bamas, A. Maggiori, L. Rohwedder, and O. Svensson (2020) Learning augmented energy minimization via speed scaling. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: §1.2.
• [5] É. Bamas, A. Maggiori, and O. Svensson (2020) The primal-dual method for learning augmented algorithms. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: §1.2.
• [6] J. A. Blackard, D. J. Dean, and C. W. Anderson. Covertype data set. Cited by: 1st item, §5.
• [7] A. Borodin and R. El-Yaniv (1998) Online computation and competitive analysis. Cambridge University Press. External Links: ISBN 978-0-521-56392-5 Cited by: §1, §4.
• [8] V. Cohen-Addad, N. Hjuler, N. Parotsidis, D. Saulpic, and C. Schwiegelshohn (2019) Fully dynamic consistent facility location. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 3250–3260. External Links: Link Cited by: §1.2, §5.
• [9] M. Cygan, A. Czumaj, M. Mucha, and P. Sankowski (2018) Online facility location with deletions. In 26th Annual European Symposium on Algorithms, ESA 2018, August 20-22, 2018, Helsinki, Finland, Y. Azar, H. Bast, and G. Herman (Eds.), LIPIcs, Vol. 112, pp. 21:1–21:15. External Links: Cited by: §1.2.
• [10] Z. Drezner and H. W. Hamacher (Eds.) (2004) Facility Location: Applications and Theory. Springer. Cited by: §1.2.
• [11] A. Fiat, R. M. Karp, M. Luby, L. A. McGeoch, D. D. Sleator, and N. E. Young (1991) Competitive paging algorithms. J. Algorithms 12 (4), pp. 685–699. External Links: Cited by: §1.2.
• [12] D. Fotakis (2007) A primal-dual algorithm for online non-uniform facility location. J. Discrete Algorithms 5 (1), pp. 141–148. External Links: Cited by: §1.2.
• [13] D. Fotakis (2008) On the competitive ratio for online facility location. Algorithmica 50 (1), pp. 1–57. External Links: Cited by: §1.2, §4.
• [14] D. Fotakis (2011) Online and incremental algorithms for facility location. SIGACT News 42 (1), pp. 97–131. External Links: Cited by: §1.2.
• [15] S. Gollapudi and D. Panigrahi (2019) Online algorithms for rent-or-buy with expert advice. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pp. 2319–2327. External Links: Link Cited by: §1.2.
• [16] X. Guo, J. Kulkarni, S. Li, and J. Xian (2020) On the facility location problem in online and dynamic models. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, J. Byrka and R. Meka (Eds.), LIPIcs, Vol. 176, pp. 42:1–42:23. Cited by: §1.2.
• [17] A. Hoorfar and M. Hassani (2008) Inequalities on the Lambert W function and hyperpower function. J. Inequal. Pure and Appl. Math 9 (2), pp. 5–9. Cited by: §3.
• [18] S. Lattanzi, T. Lavastida, B. Moseley, and S. Vassilvitskii (2020) Online scheduling via learned weights. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pp. 1859–1877. External Links: Cited by: §1.2.
• [19] T. Lykouris and S. Vassilvitskii (2018) Competitive caching with machine learned advice. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 3302–3311. External Links: Link Cited by: Learning Augmented Online Facility Location, §1.2, §1.
• [20] M. Mahdian, H. Nazerzadeh, and A. Saberi (2012) Online optimization with uncertain information. ACM Trans. Algorithms 8 (1), pp. 2:1–2:29. External Links: Cited by: §1.2.
• [21] A. M. Medina and S. Vassilvitskii (2017) Revenue optimization with approximate bid predictions. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 1858–1866. External Links: Link Cited by: Learning Augmented Online Facility Location, §1.2.
• [22] C. Meek, B. Thiesson, and D. Heckerman (1990) US census data. Cited by: 2nd item, §5.
• [23] A. Meyerson (2001) Online facility location. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pp. 426–431. External Links: Cited by: §1, §1.2, §3.
• [24] M. Mitzenmacher (2020) Scheduling with predictions and the price of misprediction. In 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, January 12-14, 2020, Seattle, Washington, USA, T. Vidick (Ed.), LIPIcs, Vol. 151, pp. 14:1–14:18. External Links: Cited by: §1.2.
• [25] P. B. Mirchandani and R. L. Francis (Eds.) (1990) Discrete Location Theory. Wiley. Cited by: §1.2.
• [26] M. Purohit, Z. Svitkina, and R. Kumar (2018) Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 9684–9693. External Links: Link Cited by: §1.2.
• [27] D. Rohatgi (2020) Near-optimal bounds for online caching with machine learned advice. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pp. 1834–1845. External Links: Cited by: §1.2.
• [28] D. Shmoys (2000) Approximation Algorithms for Facility Location Problems. In 3rd Workshop on Approximation Algorithms for Combinatorial Optimization, LNCS, Vol. 1913, pp. 27–33. Cited by: §1.2.
• [29] S. Wang, J. Li, and S. Wang (2020) Online algorithms for multi-shop ski rental with machine learned advice. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: §1.2.
• [30] A. Wei (2020) Better and simpler learning-augmented online caching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, J. Byrka and R. Meka (Eds.), LIPIcs, Vol. 176, pp. 60:1–60:17. Cited by: §1.2.
• [31] A. C. Yao (1977) Probabilistic computations: toward a unified measure of complexity (extended abstract). In 18th Annual Symposium on Foundations of Computer Science, Providence, Rhode Island, USA, 31 October - 1 November 1977, pp. 222–227. External Links: Cited by: §4.

## Appendix A Appendix

### A.1 Proofs of Section 2

###### Proof of Theorem 2.1.

For any fixed value of (suppose w.l.o.g. that we are following algorithm for that value of ), we distinguish cases:

• Case : The cost of the new facilities that the algorithm would have opened between the -th and -th iteration is smaller than the total cost that the algorithm has incurred after the completion of the -th iteration plus the cost the algorithm has incurred after the completion of the -th iteration (i.e., the condition of the IF statement on line is not satisfied). In this case, our argument and algorithm become quite similar to those in Section 2.1 of [2], expressing the cost as a geometric sum.

• Case : The condition of the IF statement on line is satisfied and therefore the algorithm that is being followed does not change. In this case, we distinguish the following two subcases:

• a. The total cost of algorithm is at most , where denotes the current cost. In this case, the switch would have been beneficial to the algorithm. However, it is now guaranteed to happen at and the extra cost incurred is no more than . We get the desired result by adding the bound of the traditional cow path problem, which is .

• b. The total cost of algorithm is more than . In this case, not switching is beneficial to the algorithm and therefore, we can still upper bound the cost by the geometric sum in case .

To finish the proof, we distinguish the following cases:

1. For every value of , we are in case . In this case, the “else” part of the IF statement in line is always followed, and the total cost of the combined algorithm will be:

$$
2\cdot\sum_{i=0}^{\ell_{\max}} 2^{i} + \mathrm{OPT}
$$

where and Thus, we get that the cost will be at most .

2. For some values of , we are in case . In this case, if the “else” part of the IF statement in line had been followed, the algorithm would incur an extra cost of , which it does not have to “pay” now. Therefore, if only case or case is ever encountered, the cost is still bounded by .

3. For some value of , we are in case . Note this can only happen once as the algorithm terminates shortly after. In this case, as discussed above, an extra cost of at most can be incurred.

Combining the above, we deduce that the total cost of algorithm is at most
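As a reminder of why the doubling argument in case 1 only costs a constant factor, the geometric sum appearing there has a simple closed form. This is a standard identity, written in the notation of the display in case 1; it says nothing about how $\ell_{\max}$ relates to $\mathrm{OPT}$, which is fixed by the definitions in the proof.

$$
2\cdot\sum_{i=0}^{\ell_{\max}} 2^{i} \;=\; 2\left(2^{\ell_{\max}+1}-1\right) \;=\; 2^{\ell_{\max}+2}-2 \;<\; 4\cdot 2^{\ell_{\max}}.
$$

In other words, the total spent over all doubling rounds is less than four times the budget of the last round.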

### A.2 Proofs of Section 3

###### Proof of Theorem 3.1.

If , this implies that for all points that arrive; therefore every prediction corresponds to an optimal cluster. The facilities that open lie on some optimal cluster, since line of the algorithm is never true and facilities only open at predictions.

#### One optimal cluster:

First observe that if there is only one optimal center, or if for any two optimal centers it holds that , then Algorithm 2 returns exactly the optimal solution. This happens because the predictions are on the optimal center, and are independent (since the distance from an already open optimal center is more than ).

#### Many Optimal Clusters:

In this case, the opening probability may change depending on the position of the closest already open facility to the demand that arrived. We include a small example with 3 optimal centers in Figure 3.

Denote by Center is open at time , and let be an arbitrary optimal cluster. We bound the cost of the cluster until the optimal center opens. Every new demand belonging to that arrives at time , opens the optimal cluster with probability where (the clusters need not be distinct). Observe that every time the demand fails to open a facility at , by the triangle inequality the extra cost incurred is . Let be the time that a facility opens at (this can be seen as a generalization of the geometric distribution with varying probabilities: we have a sequence of Bernoulli random variables, each with probability of success ); then the cost incurred until is

$$
\sum_{t=1}^{\infty} X_{it}\, r_{c(t)} = \sum_{t=1}^{T} r_{c(t)} \tag{6}
$$

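Observe that by the definition of and the random variables , therefore , but

$$
\mathbb{E}_{T,X_{it}}\left[\sum_{t=1}^{T} X_{it}\right] = \mathbb{E}_{T}\left[\sum_{t=1}^{T} \mathbb{E}_{X_{it}}\left[X_{it}\right]\right] = \mathbb{E}_{T}\left[\sum_{t=1}^{T} \frac{r_{c(t)}}{f}\right].
$$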

This implies that , and using equation (6) we get that the extra cost paid is . Observing that the optimal solution also has cost at least we get the theorem. ∎
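The "geometric distribution with varying probabilities" used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `expected_cost_until_open`, the list `distances` (standing in for the values $r_{c(t)}$) and the cap of the opening probability at 1 are all hypothetical choices, not taken from the paper. It computes exactly, without sampling, the expected value of $\sum_{t\le T} r_{c(t)}$ when step $t$ opens a facility with probability $r_{c(t)}/f$.

```python
def expected_cost_until_open(distances, f):
    """Exact expectation of sum_{t <= T} r_t, where T is the first step whose
    Bernoulli(r_t / f) trial succeeds (hypothetical helper, for illustration).

    Returns (expected_cost, probability_that_no_facility_opened).
    """
    survive = 1.0  # probability that no facility has opened before step t
    total = 0.0
    for r in distances:
        p = min(r / f, 1.0)      # assumed opening probability at this step
        total += survive * r     # step t is reached w.p. `survive` and pays r
        survive *= 1.0 - p       # the trial fails w.p. 1 - p
    return total, survive


if __name__ == "__main__":
    # The last distance equals f, so a facility opens with certainty by then.
    cost, leftover = expected_cost_until_open([0.5, 1.0, 0.25, 2.0, 4.0], f=4.0)
    print(cost, leftover)
```

Whenever every $r_t \le f$, the returned cost equals $f$ times the probability that a facility has opened, so it never exceeds $f$. This matches the role the identity plays in the argument above.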

###### Proof of Lemma 3.4, Close demands.

Recall that for phase from the definition of Phase we have that

$$
\frac{\eta_{op}}{\lambda m^{i}} \le \eta_x \le \frac{\eta_{op}}{m^{i}}. \tag{7}
$$

Using Lemma A.1 we have that for any close demand arriving in Phase the cost is

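$$
\begin{aligned}
\mathbb{E}[\mathrm{cost}_x] &\le d^*(x) + r_x + \frac{r_x}{f}\eta_x + \left(1-\frac{r_x}{f}\right)\eta_{op}^{(i)} \\
&\le d^*(x) + (m\lambda+1)\eta_x + \frac{r_x}{f}\eta_x + \left(1-\frac{r_x}{f}\right)\eta_{op}^{(i)} && \text{since } r_x \le \eta_{op}^{(i)} + \eta_x \le (m\lambda+1)\eta_x \\
&\le d^*(x) + (m\lambda+1)\eta_x + \frac{r_x}{f}\eta_x + \left(1-\frac{r_x}{f}\right) m\lambda\,\eta_x && \text{since } \eta_{op}^{(i)} \le \lambda m\,\eta_x \text{ from (7)} \\
&\le d^*(x) + (m\lambda+1)\eta_x + m\lambda\,\eta_x + \frac{r_x}{f}\eta_x\,(1-m\lambda) \\
&\le d^*(x) + (2m\lambda+1)\eta_x && \text{since } 1-m\lambda<0
\end{aligned}
$$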

Summing up the contribution of all close demands we get the lemma. ∎

###### Proof of Lemma 3.3, Far demands.

In the case of far demands the following inequality holds

$$
\eta_x \le \frac{\eta_{op}}{\lambda m^{i}} \le \frac{\eta_{op}^{(i)}}{\lambda}. \tag{8}
$$

Using Lemma A.1 we get that

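$$
\begin{aligned}
\mathbb{E}[\mathrm{cost}_x] &\le d^*(x) + r_x + \frac{r_x}{f}\eta_x + \left(1-\frac{r_x}{f}\right)\eta_{op}^{(i)} \\
&\le d^*(x) + 2\eta_{op}^{(i)} + \frac{r_x}{f}\eta_x + \eta_{op}^{(i)} - \frac{r_x}{f}\eta_{op}^{(i)} && \text{since } r_x \le \eta_x + \eta_{op} \le 2\eta_{op}^{(i)} \\
&\le d^*(x) + 3\eta_{op}^{(i)} && \text{since } \eta_x - \eta_{op}^{(i)} < 0
\end{aligned} \tag{9}
$$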

Let be the time a new facility opens and the phase ends. Summing up the cost until time and we get

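$$
\begin{aligned}
\mathbb{E}[C_{far}] &= \mathbb{E}\left[\sum_{i=1}^{T} \mathrm{cost}_x\right] \\
&\le \mathrm{Asg}^*_{far} + 3\eta_{op}^{(i)}\,\mathbb{E}[T] && \text{using Inequality (9)} \\
&\le \mathrm{Asg}^*_{far} + 3\eta_{op}^{(i)}\cdot\frac{\lambda f}{(\lambda-1)\,\eta_{op}^{(i)}} && \text{from the definition of far demands} \\
&\le \mathrm{Asg}^*_{far} + \frac{3\lambda}{\lambda-1}\,f
\end{aligned}
$$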

where we used that a facility opens with probability at least , and therefore the expected number of steps until a facility opens is .
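The last step relies on a standard fact about waiting times, spelled out here for completeness. The symbol $p$ is introduced only for this illustration and denotes any lower bound on the probability that a step opens a facility, conditioned on no facility having opened so far. Under that assumption $\Pr[T \ge t] \le (1-p)^{t-1}$, and hence

$$
\mathbb{E}[T] \;=\; \sum_{t=1}^{\infty} \Pr[T \ge t] \;\le\; \sum_{t=1}^{\infty} (1-p)^{t-1} \;=\; \frac{1}{p}.
$$

So a lower bound on the per-step opening probability translates directly into an upper bound on the expected length of the phase.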

###### Lemma A.1.

For a demand , prediction and we have

$$
\mathbb{E}[\mathrm{cost}_x] \le d^*(x) + r_x + \frac{r_x}{f}\eta_x + \left(1-\frac{r_x}{f}\right)\eta_{open}.
$$

###### Proof of Lemma A.1.

We denote by the error for the facility and for respectively, as shown in Figure 4.

We calculate the expected cost for one point

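$$
\begin{aligned}
\mathbb{E}[\mathrm{cost}_x] &= \frac{r_x}{f}\left(f + d(x,\hat{f}_x)\right) + \left(1-\frac{r_x}{f}\right)\cdot d(x, f_{open}) \\
&\le \frac{r_x}{f}\left(f + d^*(x) + \eta_x\right) + \left(1-\frac{r_x}{f}\right)\left(d^*(x) + \eta_{open}\right) \\
&= d^*(x) + r_x + \frac{r_x}{f}\eta_x + \left(1-\frac{r_x}{f}\right)\eta_{open},
\end{aligned}
$$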

where we used the triangle inequality for and . ∎
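The inequality of Lemma A.1 can be sanity-checked numerically. The following sketch draws random instances on the real line and compares both sides. It rests on assumptions that are not spelled out in this excerpt: `eta_x` and `eta_open` are taken to be the distances of the predicted facility and of the open facility from the optimal center, and `r_x` is treated as an arbitrary value in $[0, f]$. All function and variable names are hypothetical.

```python
import random


def expected_cost(x, pred, opened, r_x, f):
    """Expected cost of demand x: with probability r_x / f a facility is opened
    at the prediction (paying f plus the distance), otherwise x is assigned to
    the already open facility."""
    p = r_x / f
    return p * (f + abs(x - pred)) + (1.0 - p) * abs(x - opened)


def lemma_bound(x, center, pred, opened, r_x, f):
    """Right-hand side of Lemma A.1 on the real line (assumed error definitions)."""
    p = r_x / f
    d_star = abs(x - center)          # distance of x to its optimal center
    eta_x = abs(pred - center)        # assumed: error of the prediction
    eta_open = abs(opened - center)   # assumed: error of the open facility
    return d_star + r_x + p * eta_x + (1.0 - p) * eta_open


def check(trials=1000, seed=0):
    """Return True if the bound holds on every randomly drawn instance."""
    rng = random.Random(seed)
    for _ in range(trials):
        f = rng.uniform(1.0, 10.0)
        x, center, pred, opened = (rng.uniform(-5.0, 5.0) for _ in range(4))
        r_x = rng.uniform(0.0, f)
        lhs = expected_cost(x, pred, opened, r_x, f)
        rhs = lemma_bound(x, center, pred, opened, r_x, f)
        if lhs > rhs + 1e-9:          # small tolerance for rounding
            return False
    return True


if __name__ == "__main__":
    print(check())
```

Because the proof uses only the triangle inequality, the check should succeed for any metric. The real line is used here purely to keep the sketch short.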

###### Lemma A.2.

The analysis of Algorithm 2 gives an upper bound on the cost when there is more than one optimal center.

###### Proof.

Assume in the optimal solution there are optimal centers and let and be two of these optimal centers. If the centers’ zones do not overlap at all, the centers are independent. If there is some overlap, the existence of another center this close can only reduce the cost of ; therefore the analysis provided above is an upper bound on the cost. To see this, observe that the proof of Theorem 3.2 only considers demands that belong to the optimal center in order to bound its cost. The fact that is close can only cause more facilities to open in some of the zones of , and these are paid for by another cluster. Therefore, the cost of a phase for center is smaller, since some facility not paid for by the demands of is opened in one of its zones, causing the analysis to advance a phase without having to pay the whole cost. ∎