Online Optimization with Predictions and Non-convex Losses

11/10/2019 ∙ by Yiheng Lin, et al.

We study online optimization in a setting where an online learner seeks to optimize a per-round hitting cost, which may be non-convex, while incurring a movement cost when changing actions between rounds. We ask: under what general conditions is it possible for an online learner to leverage predictions of future cost functions in order to achieve near-optimal costs? Prior work has provided near-optimal online algorithms for specific combinations of assumptions about hitting and switching costs, but no general results are known. In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC), provides a 1+O(1/w) competitive ratio, where w is the number of predictions available to the learner. Our conditions do not require the cost functions to be convex, and we also derive competitive ratio results for non-convex hitting and movement costs. Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs. Further, we give an example of a natural instance, Convex Body Chasing (CBC), where the sufficient conditions are not satisfied and we can prove that no online algorithm can have a competitive ratio that converges to 1.




1. Introduction

Online optimization is a classical area in online learning with a long and impactful history. In this paper, we study a variation of online optimization where the learner incurs a movement (switching) cost associated with the change in actions between consecutive rounds. Specifically, we study online optimization in a setting where an online learner interacts with the environment in a sequence of rounds $t = 1, \ldots, T$. In each round, a cost function $f_t$ is revealed and the learner chooses a point $x_t$ in response. After picking its point, the learner pays a hitting cost $f_t(x_t)$ as well as a movement (switching) cost $c(x_t, x_{t-1})$, which penalizes the learner for changing its actions between rounds. The movement cost adds a considerable degree of complexity to the decision making of the learner since it couples the learner’s actions between rounds. A choice which is near the minimizer of $f_t$ and incurs little cost in round $t$ might turn out to be far away from the minimizer of $f_{t+1}$. Thus, the learner must balance decisions about $x_t$ with the potential of movement costs in the future. However, the learner does not have information about future hitting costs, which makes it difficult to choose the correct balance.

The most prominent version of online optimization with movement costs is known as Smoothed Online Convex Optimization (SOCO), and assumes that the hitting costs are convex and the movement costs are a norm. SOCO has attracted considerable attention in the past decade, e.g., (li2018online; chen2018smoothed; goel2018smoothed; shi2019value; goel2019beyond; lin2012online; lin2013dynamic; chen2015online; chen2016using; badiei2015online), driven in part by its connection to classical online algorithms problems such as Convex Body Chasing (CBC) (bubeck2018chasing; argue2019nearly; sellke2019chasing), Metrical Task Systems (MTS) (borodin1992optimal; bartal1997polylog; blum2000line), and the $k$-server problem (manasse1990competitive; bubeck2018k; buchbinder2019k). Additionally, much of the work on SOCO has been driven by its many applications, e.g., speech animation (kim2015decision), control (goel2017thinking; goel2018smoothed), smart grid (kim2014real), video streaming (joseph2012jointly), and data centers (lin2012online; comden2019online).

While initial results on SOCO provided algorithms with performance guarantees in only limited settings, e.g., (lin2012online) provides a 2-competitive algorithm for 1-dimensional SOCO problems, at this point algorithms that have constant dimension-free competitive ratios in high-dimensional settings have been discovered, e.g., (goel2018smoothed; goel2019beyond) provide a constant-competitive algorithm for strongly convex hitting costs and squared movement costs. However, while it is possible to provide dimension-free, constant competitive algorithms in settings where the learner has no information about future hitting cost functions, in most applications where SOCO is used it is possible to make accurate predictions of future costs. Such predictions are extremely valuable for the online learner and, as a result, a growing literature has considered situations where the learner has access to predictions of future costs, e.g., (lin2012online; chen2015online; chen2016using; badiei2015online; li2018online; comden2019online; shi2019value). Most typically, this stream of work considers that the learner has access to perfect predictions of the next $w$ costs, but in some cases it is possible to extend such results to noisy predictions as well, e.g., (chen2015online; chen2016using).

Clearly, the use of predictions is beneficial for the learner. With access to $w$ perfect predictions, it is possible for the learner to obtain a competitive ratio that converges to 1 as $w \to \infty$. The first such result to appear was (lin2012online), which provides an algorithm with a guarantee of this kind when hitting costs are the operating cost for servers and movement costs are incurred by toggling into and out of a power-saving mode between timeslots. Since then, other results have followed, e.g., (chen2015online; chen2016using; badiei2015online; li2018online; shi2019value), and when more stringent requirements on the cost functions are considered it is possible to achieve a result that converges to 1 exponentially quickly in $w$ (li2018online).

The discussion above highlights that there has been considerable progress in the design of competitive algorithms for SOCO, both with and without access to predictions. However, at this point all existing results require specific assumptions on both the hitting costs and movement costs, e.g., polyhedral convex hitting costs paired with one specific movement cost (see (chen2018smoothed)), or strongly convex hitting costs paired with another (see (goel2018smoothed)). In this paper, instead of studying a specific class of costs, we ask: under what general conditions is it possible for an online learner to achieve near-optimal costs both with and without predictions? In particular, is it possible to obtain constant-competitive algorithms without assumptions like strong convexity and local polyhedrality, potentially even in the case of non-convex costs?

The case of non-convex costs is particularly tantalizing given the importance of non-convex losses for machine learning and the prominence of non-convex costs in applications such as power systems and networking. Techniques from non-convex optimization have been applied to a wide variety of problems in machine learning, including matrix factorization, phase retrieval, and sparse recovery; we refer the interested reader to (jain2017non) for a recent survey. The Optimal Power Flow (OPF) problem at the core of the operation of power systems is also non-convex (low2014convex; low2014convex2), thus requiring online non-convex optimization for real-time control. Non-convex optimization in online settings has also been studied in a variety of other contexts, such as portfolio optimization (ardia2010differential; krokhmal2002portfolio) and support vector machines (ertekin2010nonconvex; mason2000improved), among many others.

Contributions of this paper

In this paper we introduce two general, sufficient conditions (see Section 4) under which it is possible to achieve a constant competitive ratio without predictions and to leverage predictions to achieve near-optimal cost, i.e., a $1+O(1/w)$ competitive ratio. Importantly, these conditions do not require convexity of the hitting or movement costs.

The first sufficient condition is an order of growth condition that ensures the hitting cost functions grow at least as quickly as the switching costs as one moves away from the minimizer. The second condition requires that the switching costs satisfy an approximate version of the triangle inequality. Nearly all assumptions made in previous papers on online optimization with movement costs are special cases of these conditions, e.g., locally polyhedral costs (chen2018smoothed), strongly convex costs (goel2018smoothed; goel2019beyond), and more (lin2012online; liu2011geographical). While we do not prove that these conditions are necessary, we do show a construction in Section 3 that highlights that there are cases where the conditions do not hold and it is impossible for an online learner to leverage predictions to achieve a near-optimal cost. Importantly, the construction we show is not a corner case; it is an important subclass of online optimization: Convex Body Chasing (see (argue2019nearly; sellke2019chasing)).

To show that these two conditions are sufficient, we propose a novel algorithm, Synchronized Fixed Horizon Control (SFHC), and show that it is constant-competitive whenever the two conditions hold, including both when the cost functions are convex and non-convex. More specifically, we introduce two variants of SFHC, Deterministic SFHC and Randomized SFHC.

In the case when costs are convex, Deterministic SFHC provides a constant competitive ratio without access to predictions and a competitive ratio of $1+O(1/w)$ in the case of predictions (Theorem 1). Thus, SFHC unifies two distinct lines of inquiry in the literature: how to design algorithms that take advantage of predictions when they are available (lin2012online; chen2015online; chen2016using; li2018online; shi2019value) and how to design algorithms that work when predictions are not available (goel2018smoothed; chen2018smoothed; bansal20152; lin2013dynamic; goel2019beyond). SFHC is the first algorithm to provide a constant-competitive guarantee in both settings.

In the case when costs are non-convex, Deterministic SFHC maintains a constant competitive ratio without access to predictions, but the competitive ratio it guarantees in the case of predictions (Theorem 3) does not approach 1. Thus, it does not leverage predictions to ensure near-optimal cost. However, randomization can be used to improve the result in the case of predictions. Specifically, Randomized SFHC provides a competitive ratio of $1+O(1/w)$ for general non-convex functions that satisfy our sufficient conditions, given an oblivious adversary (Theorem 5). Further, the result extends (with slight modifications to the design of Randomized SFHC) to the case of a semi-adaptive adversary (Theorem 6). These results represent the first constant-competitive guarantees for online optimization with movement costs and non-convex losses.

The design of SFHC is inspired by the design of Averaging Fixed Horizon Control (AFHC) (lin2012online), which has served as the basis for many algorithms in this space, e.g., (chen2016using; shi2019value). Like AFHC, Deterministic SFHC works by averaging the choices of different subroutines. However, the subroutines are very different from those of AFHC. In particular, the key insight in SFHC is that each subroutine has a sequence of synchronization points where the algorithm greedily optimizes the hitting cost for that round. These synchronization points ensure that, when the sufficient conditions hold, the algorithm does not drift too far from the actions of the offline optimal. Thus, rather than optimize cost, SFHC is designed to track the offline optimal (which also ensures good cost). The key difference between Deterministic SFHC and Randomized SFHC is that Randomized SFHC chooses an action of a subroutine uniformly at random rather than averaging the choices of the subroutines. It is perhaps surprising that randomization helps in the case of non-convex costs given that (bansal20152) shows that randomization cannot help in the case of SOCO.

Related literature

There is a large literature on online optimization, both with and without switching costs. In the setting without switching costs, most work has focused on Online Convex Optimization (OCO) (hazan2016introduction). This problem is similar to SOCO, except that (i) there are no switching costs and (ii) the online learner picks the point $x_t$ before observing the cost function $f_t$. In this problem, the goal is to design algorithms with low regret, i.e., the goal is to find a strategy that tracks the cost of the best fixed action as closely as possible. In 2003, Zinkevich described Online Gradient Descent, the first algorithm to achieve sublinear regret for OCO (zinkevich2003online). This was subsequently generalized by algorithms such as Online Mirror Descent (nemirovsky1983problem; bansal2017potential) and the Multiplicative Weights Update algorithm (arora2012multiplicative).

Beyond the case of convex costs, online non-convex optimization (without switching costs) has also received considerable attention, e.g., (yang2018optimal; ardia2010differential; krokhmal2002portfolio; ertekin2010nonconvex; mason2000improved). Most commonly the algorithms used in these papers are variations of Online Exponential Weights. For example, recently, (yang2018optimal) presents an algorithm, Online Recursive Weighting, that achieves a bound on regret that matches the lower bound in the convex setting.

All papers above consider problems without movement costs. The inclusion of movement costs makes the problem considerably more challenging and motivates the use of a different performance measure. Specifically, instead of regret, algorithms are evaluated with respect to the competitive ratio. In fact, it is known that there is a fundamental incompatibility between regret and competitive ratio: it is impossible to create an algorithm for SOCO with both sublinear regret and constant competitive ratio (andrew2013tale).

In the case of movement costs, all previous papers focus on the convex setting. In particular, the problem of Smoothed Online Convex Optimization (SOCO), i.e. OCO with switching costs, was introduced in (lin2013dynamic) in the context of dynamic power management in data centers. Since then it has been applied across many domains, including speech animation (kim2015decision), multi-timescale control (goel2017thinking), video streaming (joseph2012jointly), thermal management of System-on-Chip (SoC) circuits (zanini2010online), and power generation planning (kim2015decision).

The original paper introducing SOCO (lin2013dynamic) gave a 3-competitive algorithm for one dimensional action spaces. Following this work, an algorithm with a competitive ratio of 2 was introduced in (bansal20152) and this was shown to be optimal in (antoniadis2017tight). Until recently, there were no algorithms for SOCO that worked beyond one dimension ($d > 1$) that did not use predictions. However, last year it was shown that it is possible to design competitive algorithms for SOCO beyond one dimension, provided the hitting cost functions have some structure. Specifically, in (chen2018smoothed), Online Balanced Descent (OBD) was introduced and shown to have a dimension-free competitive ratio in the special case where the cost functions are polyhedral. Following this work, it was shown that OBD also provides a dimension-free competitive ratio when the hitting costs are strongly convex (goel2018smoothed) and that a variant of OBD called Regularized OBD achieves the optimal competitive ratio when hitting costs are strongly convex. We note that this literature is fairly distinct from the work involving predictions. Up until now, the algorithms that are designed to be competitive without predictions are not able to take advantage of predictions when they are available.

In many applications the online learner has some information about future costs, and making use of these predictions of future costs is crucial. This has prompted a great deal of work involving the design of algorithms that leverage predictions, e.g., (li2018using; chen2016using; badieionline; chen2015online; shi2019value; comden2019online). Most of this work considers models where the online learner has a prediction window of length $w$, i.e., at time $t$, the agent observes the cost functions $f_t, \ldots, f_{t+w-1}$ before choosing the point $x_t$ (the case $w = 1$ captures the standard SOCO setting). Naturally, as $w$ tends to infinity the algorithm has more and more information and hence should achieve better performance. In (chen2015online), it was shown that, surprisingly, Receding Horizon Control (RHC) cannot guarantee a competitive ratio that converges to one as $w$ tends to infinity; however, it was also shown that Averaging Fixed Horizon Control (AFHC) can guarantee a near-optimal competitive ratio if $w$ is sufficiently large. Later, it was shown that it is possible to obtain algorithms whose competitive ratio decays exponentially in $w$ in the setting where the hitting costs are both strongly convex with bounded gradients and uniformly bounded below by a constant and the movement cost is quadratic (li2018using). However, it is again important to note that this literature is fairly distinct from the work of designing algorithms that are competitive without predictions. Up until now, the algorithms that are designed to be competitive with predictions are not able to be competitive without the use of predictions.

While there has been considerable progress on designing algorithms for SOCO both with and without predictions, to this point the results all rely on specific structural assumptions about the costs, and no previous work extends to non-convex costs. In this paper, we do not directly make strong structural assumptions about the hitting functions or the switching costs; instead, we ask under what general conditions on the hitting costs and switching costs is it possible to design competitive algorithms? Surprisingly, we show that convexity is not a necessary condition to design a competitive algorithm; this allows us to tackle a much broader range of hitting and movement costs than prior work.

2. Problem Formulation

In this paper we study the problem of online (non-convex) optimization with switching costs. An instance of this problem consists of an initial point $x_0$, a sequence of hitting cost functions $f_1, \ldots, f_T$, and a switching cost, a.k.a. movement cost, $c$. The sequence of hitting costs is incrementally revealed to an online learner, who picks points in response to observing the hitting cost function and incurs costs associated with the choice.

More precisely, in the $t$-th round, the function $f_t$ is revealed to the online learner, who picks a point $x_t$ in response, and incurs the cost $f_t(x_t) + c(x_t, x_{t-1})$. The term $c(x_t, x_{t-1})$ acts as a regularizer, discouraging the learner from changing its action between rounds. Note that, without the switching cost, it is easy for the online learner to incur the optimal cost: it simply picks $x_t = \arg\min_x f_t(x)$ in every round. Thus, it is the presence of the switching cost which makes the problem interesting and non-trivial, as it couples costs between rounds.

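The online learner seeks to minimize its cumulative cost across all rounds:

$$\mathrm{cost}(x_{1:T}) = \sum_{t=1}^{T} \big( f_t(x_t) + c(x_t, x_{t-1}) \big),$$

where $\mathrm{cost}(\cdot)$ is a function that computes the total cost incurred by a sequence $x_{1:T} = (x_1, \ldots, x_T)$. We measure the performance of the online learner by comparing its cost to the offline optimal cost, i.e., the cost given full knowledge of the functions $f_1, \ldots, f_T$:

$$\mathrm{cost}(OPT) = \min_{x_1, \ldots, x_T} \sum_{t=1}^{T} \big( f_t(x_t) + c(x_t, x_{t-1}) \big).$$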

The goal of this paper is to design strategies for the online learner so that it incurs nearly the same cost as the offline optimal. This can be measured either in terms of regret, which compares to the offline static optimal, or the competitive ratio, which compares to the offline dynamic optimal. Our focus in this paper is on the competitive ratio, which is a more challenging measure to achieve good performance under. Formally, the competitive ratio is defined as the worst case, over all sequences of hitting costs, of the ratio between the online learner's total cost and the offline optimal cost, $\sup_{f_1, \ldots, f_T} \mathrm{cost}(ALG) / \mathrm{cost}(OPT)$, and, if the online learner can pick a strategy which guarantees that the competitive ratio is finite for any sequence of hitting costs, we say that the online learner’s strategy is competitive.
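The definitions above can be made concrete on a toy instance. The following is a minimal sketch, not the paper's method: it assumes a one-dimensional action space restricted to a small finite grid, an absolute-value movement cost, and a hypothetical family of non-convex hitting costs. It computes the cumulative cost of a learner that simply picks each round's minimizer and compares it to the offline optimum found by brute force over the grid.

```python
from itertools import product

def total_cost(xs, fs, c, x0):
    """Cumulative hitting plus movement cost of the trajectory xs, starting at x0."""
    cost, prev = 0.0, x0
    for f, x in zip(fs, xs):
        cost += f(x) + c(x, prev)
        prev = x
    return cost

def offline_optimal_cost(fs, c, x0, grid):
    """Brute-force offline optimum over all trajectories on a finite grid."""
    return min(total_cost(xs, fs, c, x0) for xs in product(grid, repeat=len(fs)))

# Hypothetical instance (all values are illustrative assumptions).
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
x0 = 0.0
c = lambda x, y: abs(x - y)
targets = [1.0, -1.0, 1.0, -1.0]
# Non-convex hitting cost: global minimum at v, a shallower local minimum at -v.
fs = [lambda x, v=v: min((x - v) ** 2, (x + v) ** 2 + 0.5) for v in targets]

# A learner that ignores movement costs and jumps to each round's minimizer.
greedy_xs = [min(grid, key=f) for f in fs]

alg_cost = total_cost(greedy_xs, fs, c, x0)
opt_cost = offline_optimal_cost(fs, c, x0, grid)
ratio = alg_cost / opt_cost   # cost ratio on this single instance
print(alg_cost, opt_cost, ratio)
```

Note that the ratio printed here is for one instance only; the competitive ratio is the worst case of this quantity over all instances.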

In this paper, we seek to derive an algorithm that can be competitive in settings where hitting costs are non-convex. This paper is the first to consider costs that are non-convex in the context where movement costs are present. (Non-convex online optimization without movement costs has been considered in, e.g., (yang2018optimal; ardia2010differential; krokhmal2002portfolio; ertekin2010nonconvex; mason2000improved)). In all prior work that considers movement costs, hitting costs are assumed to be convex and so the problem is typically referred to as Smoothed Online Convex Optimization (SOCO). This problem was first introduced in the context of dynamic power management in data centers in (lin2013dynamic).

In many applications, the classical formulation of an online learner is too restrictive. It is not true that the learner has no information about future costs; instead, the learner has the ability to derive (noisy) forecasts of future cost functions. As a result, there has been a great deal of work focused on designing algorithms for online learners that have access to predictions of future costs (lin2012online; chen2015online; chen2016using; li2018online; shi2019value). This line of work, initiated by (lin2012online), seeks to design algorithms which have competitive ratios that converge to 1 as the number of predictions available to the algorithm, $w$, grows. More specifically, in this line of work, at time $t$ an online learner with prediction window $w$ observes the cost functions $f_t, \ldots, f_{t+w-1}$ before choosing the point $x_t$. Note that the case of $w = 1$ captures the standard SOCO setting. Given these predictions, the learner seeks to have a competitive ratio of the form $1 + \epsilon(w)$, where $\epsilon(w) \to 0$ as $w \to \infty$. Thus, as the number of predictions grows the cost of the learner converges to the offline optimal cost. Under well-behaved costs it is sometimes possible for $\epsilon(w)$ to decay exponentially, e.g. (li2018online); however for general cost functions polynomial decay is the goal, e.g., (chen2016using).
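To illustrate what a prediction window gives the learner, here is a minimal sketch of a receding-horizon style learner on a finite grid. It is an illustration under stated assumptions rather than any algorithm from the literature cited above: the window is assumed to contain the current cost function plus the next w-1 functions (so a window of length 1 is the setting without predictions), the action space is a hypothetical one-dimensional grid, and the helper names `plan` and `receding_horizon` are invented for this example.

```python
def total_cost(xs, fs, c, x0):
    """Cumulative hitting plus movement cost of the trajectory xs, starting at x0."""
    cost, prev = 0.0, x0
    for f, x in zip(fs, xs):
        cost += f(x) + c(x, prev)
        prev = x
    return cost

def plan(window, c, x_prev, grid):
    """Cheapest trajectory on `grid` for the visible cost functions, from x_prev."""
    # cost[j] is the best cost of a partial plan ending at grid[j]; paths[j] is that plan.
    cost = [window[0](x) + c(x, x_prev) for x in grid]
    paths = [[x] for x in grid]
    for f in window[1:]:
        new_cost, new_paths = [], []
        for x in grid:
            j = min(range(len(grid)), key=lambda k: cost[k] + c(x, grid[k]))
            new_cost.append(f(x) + cost[j] + c(x, grid[j]))
            new_paths.append(paths[j] + [x])
        cost, paths = new_cost, new_paths
    best = min(range(len(grid)), key=lambda k: cost[k])
    return paths[best], cost[best]

def receding_horizon(fs, c, x0, grid, w):
    """At each round, plan over the visible window and commit only the first action."""
    xs, prev = [], x0
    for t in range(len(fs)):
        window = fs[t:t + w]          # current function plus up to w-1 predictions
        prev = plan(window, c, prev, grid)[0][0]
        xs.append(prev)
    return xs

# Hypothetical instance (illustrative values only).
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
x0 = 0.0
c = lambda x, y: abs(x - y)
fs = [lambda x, v=v: min((x - v) ** 2, (x + v) ** 2 + 0.5) for v in [1.0, -1.0, 1.0, -1.0]]

opt_cost = plan(fs, c, x0, grid)[1]   # the full horizon is visible: offline optimum on the grid
full_window_cost = total_cost(receding_horizon(fs, c, x0, grid, len(fs)), fs, c, x0)
no_prediction_cost = total_cost(receding_horizon(fs, c, x0, grid, 1), fs, c, x0)
print(opt_cost, full_window_cost, no_prediction_cost)
```

When the window covers the whole horizon the learner recovers the offline optimum on the grid; with shorter windows its cost can only be equal or larger.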

Notice that, in the formulation just described, the predictions of future cost functions given to the learner are perfect. While in real applications predictions are noisy, due to technical challenges this assumption is common, e.g., see (lin2012online; li2018online). There does exist some work that has extended the results to noisy predictions in limited cases, e.g., (chen2015online; chen2016using); however, in general, the extensions to noisy predictions are difficult and, when possible, have confirmed the insights initially proven in models with perfect predictions. In this work, we focus on the perfect prediction model. Given the challenges associated with providing results for non-convex costs, this is natural and necessary. However, we do intend to investigate extensions of these results to noisy predictions in future work.

3. The limited power of predictions

To this point, the literature studying online optimization with predictions has focused on positive results, i.e., providing algorithms that can achieve competitive ratios which converge to 1 as the number of predictions available to the algorithm grows, e.g., (lin2012online; chen2016using; li2018online; shi2019value). However, all positive results that exist apply to only specific forms of hitting and switching costs. As a result, an important question that remains for the community is: Is it always possible for an online learner to leverage predictions to achieve a near-optimal competitive ratio?

In this section, we show that the answer is “no.” We show that there exist instances where predictions cannot guarantee the learner a near-optimal cost, even in the case when cost functions are convex. Further, the instances we construct are not strange corner cases; they include an important subclass of online optimization, Convex Body Chasing (CBC), which has received considerable attention in recent years, e.g., (argue2019nearly; sellke2019chasing).

In the following, we detail a construction that highlights a fundamental challenge for online optimization with predictions. In particular, all previous positive results rely on a condition such as strong convexity or local polyhedrality, which rules out convex hitting cost functions that are “flat” around the minimizer. Our construction shows that, if this condition is not satisfied, it is not guaranteed that predictions can be leveraged by the online learner.

Convex Body Chasing

Like online optimization, an instance of CBC proceeds with an online agent making decisions in a series of rounds. In each round, the agent is presented a closed convex set $K_t \subseteq \mathbb{R}^d$. After observing the convex body, the agent picks a point $x_t \in K_t$ and pays the movement cost $\|x_t - x_{t-1}\|$, where $\|\cdot\|$ is a norm. Thus, the total cost incurred by the online agent is:

$$\sum_{t=1}^{T} \|x_t - x_{t-1}\|.$$

It is straightforward to see that CBC is a special case of SOCO. Simply define the hitting cost function $f_t$ as the indicator function of the body $K_t$, which is zero inside $K_t$ and infinite outside. It is the form of this hitting cost function that creates a challenge for the learner. The fact that the slope of the hitting cost function within the body is zero means that the offline optimal can be anywhere in the body without paying a cost compared to the learner. This makes it much more difficult for the learner to match the cost of the offline optimal.
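The embedding of CBC into SOCO can be sketched in a few lines. The example below is only an illustration, assuming one-dimensional bodies (closed intervals) and the absolute value as the norm; the interval endpoints and trajectories are made up. A trajectory that stays inside every body pays exactly its movement cost, while one that leaves a body pays an infinite hitting cost.

```python
import math

def indicator(lo, hi):
    """Hitting cost that is 0 inside the interval [lo, hi] and +inf outside."""
    return lambda x: 0.0 if lo <= x <= hi else math.inf

def soco_cost(xs, fs, x0):
    """SOCO cost with movement cost |x_t - x_{t-1}|."""
    cost, prev = 0.0, x0
    for f, x in zip(fs, xs):
        cost += f(x) + abs(x - prev)
        prev = x
    return cost

# Hypothetical one-dimensional convex bodies.
bodies = [(1.0, 2.0), (0.0, 1.5), (3.0, 4.0)]
fs = [indicator(lo, hi) for lo, hi in bodies]

feasible = [1.0, 1.0, 3.0]     # every point lies in its body
infeasible = [1.0, 2.0, 3.0]   # 2.0 is outside the second body [0.0, 1.5]

feasible_cost = soco_cost(feasible, fs, 0.0)      # pure movement: 1 + 0 + 2
infeasible_cost = soco_cost(infeasible, fs, 0.0)  # infinite hitting cost
print(feasible_cost, infeasible_cost)
```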

While it is easy to see that CBC is an instance of SOCO, it is perhaps surprising to discover that a SOCO problem can also be viewed as an instance of CBC. Specifically, if there exists an algorithm that can solve the CBC problem in $(d+1)$-dimensional space, we can construct an algorithm that can solve the SOCO problem in $d$-dimensional space. This fact was first noted in (antoniadis2016chasing), but the result was retracted. Then, it was noted without a formal proof in (bubeck2019improved). Here we provide a formal statement and proof of the reduction since the result is crucial to our goal of highlighting limitations on the power of predictions in online convex optimization.

Proposition 1.

Consider a dimensional SOCO problem where the movement cost function is given by . Suppose Algorithm is -competitive algorithm for CBC in dimensions with movement cost function . Then, there exists a -competitive algorithm for the dimensional SOCO problem.

Given Proposition 1, we can obtain insight on the limitations of the power of predictions in SOCO by studying the power of predictions in CBC. The following theorem shows that it is not possible to use predictions to obtain a near-optimal competitive ratio in CBC.

Theorem 2.

Consider an instance of CBC in $d$-dimensional space with movement cost function $\|x_t - x_{t-1}\|$, where $\|\cdot\|$ is an arbitrary norm. Suppose $C$ is a lower bound on the competitive ratio for all algorithms when the length of the prediction window is $1$. For any $w > 1$, the same lower bound holds for any algorithm with a prediction window of length $w$.

To interpret this theorem, we make note of an important lower bound in the SOCO literature. In particular, a lower bound on the competitive ratio is known for any online algorithm that can only see the current cost function (i.e., has $w = 1$) when the movement costs are given by a standard norm (chen2018smoothed). Thus, Theorem 2 implies that the competitive ratio of any online algorithm for CBC with a finite prediction window is subject to the same lower bound.

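Proof of Theorem 2.

Suppose an algorithm $A$ can leverage a prediction window of length $w$ and achieve a competitive ratio of $\alpha$. It suffices to give an algorithm $A'$ which only needs a prediction window of length $1$ but achieves the same competitive ratio as $A$.

In the proof, we construct algorithm $A'$ using $A$ as an oracle. When a convex body $K_t$ arrives at timestep $t$, we duplicate it $w$ times and feed these convex bodies in a sequence to algorithm $A$. Specifically, at timestep $t$, we construct convex bodies $K_{t,1} = K_{t,2} = \cdots = K_{t,w} = K_t$ and feed the sequence $K_{t,1}, \ldots, K_{t,w}$ to $A$. $A$ is provided with a prediction window of length $w$ and is required to chase convex bodies in the order:

$$K_{1,1}, \ldots, K_{1,w}, K_{2,1}, \ldots, K_{2,w}, \ldots, K_{T,1}, \ldots, K_{T,w}.$$

We call this the duplicated convex body chasing game with prediction, denoted $I_w$, to distinguish it from the original game $I$, in which the learner only needs to chase convex bodies in the order $K_1, \ldots, K_T$. For convenience, we use $\mathrm{cost}(ALG, J)$ to denote the total cost incurred by algorithm $ALG$ in instance $J$. We use $\mathrm{cost}(OPT, J)$ to denote the offline optimal cost in instance $J$.

As a result of the competitive ratio guarantee for $A$, we have that

$$\mathrm{cost}(A, I_w) \le \alpha \cdot \mathrm{cost}(OPT, I_w). \tag{1}$$

Suppose the sequence of points picked by $A$ in instance $I_w$ is

$$x_{1,1}, \ldots, x_{1,w}, x_{2,1}, \ldots, x_{2,w}, \ldots, x_{T,1}, \ldots, x_{T,w}.$$

We instruct $A'$ to pick $x_{t,1}$ at timestep $t$ in instance $I$. Notice that $A$ only looks at $K_{t,1}, \ldots, K_{t,w}$ when it picks $x_{t,1}$ in instance $I_w$. Since these bodies are just duplicates of body $K_t$, $A'$ is an online algorithm with prediction window $1$ in game $I$. It follows that

$$\mathrm{cost}(A', I) \le \mathrm{cost}(A, I_w), \tag{2}$$

because, by the triangle inequality, we have that

$$\|x_{t+1,1} - x_{t,1}\| \le \sum_{i=1}^{w-1} \|x_{t,i+1} - x_{t,i}\| + \|x_{t+1,1} - x_{t,w}\|.$$

On the other hand, since $K_{t,1} = \cdots = K_{t,w}$, the offline optimal in game $I_w$ can pick $x_{t,1} = \cdots = x_{t,w}$. Since the offline optimal will not waste movement in duplicated bodies, if the offline optimal of instance $I$ picks $x_1^*, \ldots, x_T^*$, picking $x_{t,i} = x_t^*$ for all $i$ will achieve the optimal cost in instance $I_w$. Therefore, we have

$$\mathrm{cost}(OPT, I_w) = \mathrm{cost}(OPT, I). \tag{3}$$

Combining (1), (2), and (3), we obtain that

$$\mathrm{cost}(A', I) \le \mathrm{cost}(A, I_w) \le \alpha \cdot \mathrm{cost}(OPT, I_w) = \alpha \cdot \mathrm{cost}(OPT, I).$$

Therefore, algorithm $A'$ has a competitive ratio of $\alpha$ in instance $I$. By the assumption made in the theorem, $\alpha$ is at least the lower bound that holds for prediction window $1$, which completes the proof. ∎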
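The duplication argument in the proof can be simulated directly. The sketch below is an illustration under simplifying assumptions: bodies are closed intervals on the real line, the norm is the absolute value, and `lookahead_step` is a hypothetical window-based chaser invented for this example (any rule that returns a point of the current body would do). The wrapper builds a window-1 algorithm from a window-w one by feeding it duplicated bodies, and the final lines check that the wrapped algorithm reproduces every w-th point of the run on the duplicated instance and moves no more than that run.

```python
from typing import Callable, List, Tuple

Body = Tuple[float, float]                    # closed interval [lo, hi]
Step = Callable[[List[Body], float], float]   # (visible window, previous point) -> next point

def project(x: float, body: Body) -> float:
    lo, hi = body
    return min(max(x, lo), hi)

def lookahead_step(window: List[Body], prev: float) -> float:
    """Hypothetical chaser: aim at the last visible body, then land in the current one."""
    return project(project(prev, window[-1]), window[0])

def run(step: Step, bodies: List[Body], w: int, x0: float) -> List[float]:
    """Run an algorithm that sees the current body and the next w-1 bodies."""
    xs, prev = [], x0
    for t in range(len(bodies)):
        prev = step(bodies[t:t + w], prev)
        xs.append(prev)
    return xs

def movement(xs: List[float], x0: float) -> float:
    return sum(abs(b - a) for a, b in zip([x0] + xs[:-1], xs))

def window_one_wrapper(step: Step, w: int) -> Step:
    """Build a window-1 algorithm from a window-w algorithm via duplication."""
    last_body = None
    def step1(window: List[Body], prev: float) -> float:
        nonlocal last_body
        body, x = window[0], prev
        if last_body is not None:
            # Replay the simulated steps on copies 2..w of the previous body,
            # which need to see copies of the body that has just arrived.
            for i in range(1, w):
                x = step([last_body] * (w - i) + [body] * i, x)
        x = step([body] * w, x)   # first copy of the new body: sees only its duplicates
        last_body = body
        return x
    return step1

# Hypothetical instance.
bodies = [(1.0, 2.0), (4.0, 5.0), (0.0, 1.0), (3.0, 3.5)]
w, x0 = 3, 0.0

xs_online = run(window_one_wrapper(lookahead_step, w), bodies, 1, x0)
xs_dup = run(lookahead_step, [b for b in bodies for _ in range(w)], w, x0)
print(xs_online, movement(xs_online, x0), movement(xs_dup, x0))
```

The wrapper can only replay the simulated steps on the later copies of a body once the next body has arrived, because those steps look ahead into the next block of duplicates; this does not affect the point it must output at each round.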

4. When do predictions help?

The construction in the previous section highlights that it is not always possible for an online learner to leverage predictions to obtain near-optimal cost, even in the case when costs are convex. However, the positive results in the prior literature show that predictions can be powerful in many specific settings. Thus, a crucial question is: Under what general conditions is it possible for an online learner to leverage predictions to achieve near-optimal cost?

In this section, we introduce two general sufficient conditions that are motivated by the construction in the previous section and which ensure that the online learner can leverage predictions to achieve near-optimal cost. Additionally, we present a new algorithm, Synchronized Fixed Horizon Control (SFHC), that can leverage predictions to achieve near optimal cost when these conditions hold. We then analyze SFHC in the sections that follow.

4.1. Sufficient Conditions

While there are many positive results in the literature that highlight specific conditions where it is possible for online algorithms to leverage predictions to achieve near-optimal cost, general sufficient conditions have not been presented. Here, we introduce sufficient conditions that are general enough to contain many of the specific assumptions in previous results as special cases and that apply beyond online convex optimization to online non-convex optimization. Formally, the sufficient conditions we identify are the following.

Condition I: Order of Growth.

The hitting costs $f_t$ grow at least as quickly as the movement cost $c$ as one moves away from $v_t$, where $v_t$ is a global minimum of $f_t$.

Condition II: Approximate Triangle Inequality.

The movement cost $c$ satisfies an approximate version of the triangle inequality.

The first condition ensures that the hitting cost functions are not too flat around the minimizer $v_t$. This is useful since it helps limit the area where the offline optimal solution can be. Notice that the need for a condition of this type is motivated by the analysis of CBC in Section 3 and it is interesting to see that many previous papers in online convex optimization have assumptions that are special cases of this condition, e.g., (chen2018smoothed; goel2018smoothed; goel2019beyond). The second condition is an approximate form of the triangle inequality. Intuitively, without such a condition the cost for an online learner to “catch up” after making a mistake by moving in the wrong direction could be arbitrarily large, which would make it impossible to track the offline optimal in a way that maintains a constant competitive ratio.

We would like to emphasize the generality of these conditions. They capture many settings where previous papers have focused. For example, the case of polyhedral hitting costs that was studied in (chen2018smoothed) is a special case, as is the case of strongly convex hitting costs studied in (goel2018smoothed; goel2019beyond). Finally, the setting of geographical load balancing across data centers studied in (lin2012online; lin2013dynamic; shi2019value) is also a special case, with parameters determined by the cost of running different kinds of servers and the cost of starting different kinds of servers. (This connection is not immediately obvious and so a proof is provided in Appendix C).
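To make the two conditions tangible, the sketch below checks them numerically on sample points. The exact inequalities used here are assumptions made for illustration, chosen to match the verbal descriptions of the conditions: the order of growth condition is taken to be f(x) >= lam * c(x, v) for a global minimizer v, and the approximate triangle inequality is taken to be c(x, z) <= gamma * (c(x, y) + c(y, z)). The parameter names `lam` and `gamma`, the quadratic costs, and the sampled points are all hypothetical.

```python
import math
import random

def order_of_growth_holds(f, c, v, lam, samples, tol=1e-12):
    """Assumed form of Condition I: f(x) >= lam * c(x, v) on all samples."""
    return all(f(x) >= lam * c(x, v) - tol for x in samples)

def approx_triangle_holds(c, gamma, samples, tol=1e-12):
    """Assumed form of Condition II: c(x, z) <= gamma * (c(x, y) + c(y, z))."""
    return all(c(x, z) <= gamma * (c(x, y) + c(y, z)) + tol
               for x in samples for y in samples for z in samples)

random.seed(0)
pts = [random.uniform(-5.0, 5.0) for _ in range(25)]

m, v = 2.0, 0.7
c = lambda x, y: 0.5 * (x - y) ** 2                  # squared movement cost
f = lambda x: 0.5 * m * (x - v) ** 2                 # strongly convex hitting cost
# Non-convex hitting cost with the same minimizer, lying above f everywhere.
g = lambda x: f(x) * (1.0 + 0.5 * math.sin(3.0 * x) ** 2)

print(order_of_growth_holds(f, c, v, lam=m, samples=pts))
print(order_of_growth_holds(g, c, v, lam=m, samples=pts))
print(approx_triangle_holds(c, gamma=2.0, samples=pts))
print(approx_triangle_holds(c, gamma=1.0, samples=pts))
```

Under these assumed forms, the squared movement cost satisfies the relaxed triangle inequality with gamma = 2 because (a + b)^2 <= 2a^2 + 2b^2, but it does not satisfy the exact triangle inequality (gamma = 1), which is why an approximate version is useful.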

4.2. Synchronized Fixed Horizon Control

In order to show that the two conditions above are sufficient to allow a learner to leverage predictions, we introduce a new algorithm, Synchronized Fixed Horizon Control (SFHC), which we show has a competitive ratio whenever the sufficient conditions hold – regardless of whether hitting costs are convex or non-convex. Thus, our results for SFHC apply more broadly than those for any existing algorithms. Further, our results show that SFHC achieves (nearly) the same performance bound as previous algorithms in settings where they do apply.

SFHC is a variant of Averaging Fixed Horizon Control (AFHC), which was proposed in (lin2012online) and has served as the basis for a number of improved algorithms in recent years, e.g., (chen2016using; shi2019value). Like AFHC, SFHC works by combining the trajectories determined by different subroutines (Algorithm 1). It either combines them deterministically (by averaging them) or in a randomized manner, leading to two variations: Deterministic SFHC (Algorithm 2) and Randomized SFHC (Algorithm 3). Deterministic SFHC is sufficient for the case of convex hitting costs, but randomization is valuable when costs are non-convex. That randomization helps is perhaps surprising given that it has been proven that randomization does not help in smoothed online convex optimization (bansal20152).

To explain the workings of SFHC, we start with the case of , i.e., where the online agent sees only the current cost function. In this case, SFHC is “greedy” and picks . This is a simple approach that is not optimal but, remarkably, is still competitive in many situations when Conditions I and II hold. To understand why, consider an online agent whose goal is to choose to track the offline optimal point , instead of simply minimizing costs. It is impossible to exactly know ahead of time, even with predictions. However, the offline optimal point cannot be too far away from the minimizer , since then it would incur significant costs; we can hence think of as an “anchor” that keeps us close to the offline optimal. The Order of Growth condition controls exactly how close the offline point must be to ; the steeper the hitting costs, the more incentive for the offline optimal to stay close to . The approximate triangle inequality property helps to bound the discrepancy caused by the fact that is only a proxy for , not its exact location. In particular, the approximate triangle inequality immediately gives the estimate


The key idea in SFHC is to periodically “sync up” with the greedy algorithm while simultaneously exploiting predictions. This guarantees that our solution cannot wander too far from the offline trajectory. This reasoning suggests the basic structure of the algorithm; at timestep , we choose the sequence of points


where is a global minimizer of . The constraint in (4) is what leads to the name Synchronized Fixed Horizon Control. It ensures that at the end of each fixed horizon the algorithm is constrained to make the greedy choice, i.e., it is periodically “synchronized” with the greedy algorithm, which attempts to track the offline optimal solution. Clearly, this is strictly better than picking for all since it can use predictions to optimize among the trajectories that end at .
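The optimization over one synchronized window can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the action space is replaced by a finite grid (an assumption made so that non-convex costs can be handled by exact dynamic programming), and the names `solve_sync_window`, `grid`, `move`, `start` and `end` are hypothetical. The two endpoints are fixed, mirroring the synchronization constraint, and only the interior points are optimized.

```python
def solve_sync_window(costs, grid, move, start, end):
    """Minimize hitting plus movement cost over one window with fixed endpoints.

    costs: non-empty list of hitting-cost functions for the free timesteps
    grid:  finite list of candidate actions (a discretization, by assumption)
    move:  movement cost, called as move(previous, next)
    start: action fixed just before the window (a synchronized point)
    end:   action fixed just after the window (the next synchronized point)

    Returns (total_cost, chosen_actions). The hitting cost at `end` is not
    included; the caller accounts for it.
    """
    n = len(grid)
    # best[j] = cheapest cost of a partial trajectory that is at grid[j] now
    best = [move(start, x) + costs[0](x) for x in grid]
    back = []  # back[t][j] = index of the predecessor chosen for grid[j]
    for f in costs[1:]:
        step, nxt = [], []
        for x in grid:
            i = min(range(n), key=lambda k: best[k] + move(grid[k], x))
            step.append(i)
            nxt.append(best[i] + move(grid[i], x) + f(x))
        back.append(step)
        best = nxt
    # Close the window by moving to the synchronized endpoint.
    j = min(range(n), key=lambda k: best[k] + move(grid[k], end))
    total = best[j] + move(grid[j], end)
    # Recover the chosen actions by following the predecessor indices.
    path = [j]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return total, [grid[k] for k in path]


# Tiny example: move from 0 to 3 while passing two costs centered at 1 and 2.
example_total, example_path = solve_sync_window(
    costs=[lambda x: abs(x - 1), lambda x: abs(x - 2)],
    grid=[0, 1, 2, 3],
    move=lambda a, b: abs(a - b),
    start=0,
    end=3,
)
```

Because the search is exhaustive over the grid, nothing in this sketch relies on convexity of the hitting or movement costs.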

Finally, as in AFHC, our algorithm maintains different subroutines performing the optimization in (4) separately and then combines them together to pick a point at time .

We now describe the SFHC algorithm more formally. The core piece of both Deterministic SFHC and Randomized SFHC is the SFHC subroutine. To define it, we need to introduce some notation first. Define and function () as:


where the variable is , are defined as the minimizers . We can view as the total cost incurred between timestep and , given the fixed choices of and . Since the head and the tail of the subsequence of decision points are fixed to be the minimizers at the corresponding timesteps, only can change freely in . In general, will minimize the function for , except perhaps in the last prediction window, which may overshoot the end of the sequence of functions. To take this case into consideration, we need to extend the definition of function to include the case . If , is defined as


where the variable is and is a fixed constant. Given this notation, is formally defined in Algorithm 1.

To analyze , it is useful to formulate it as an offline optimization. In particular, in phase , outputs the solution of


It can be easily verified that this offline optimization can be implemented as an online algorithm with a prediction window .

Using the SFHC subroutine, we can now define Deterministic SFHC. In short, Deterministic SFHC averages the decisions of the SFHC subroutines with equal weighting; see Algorithm 2 for the full details. Notice that when Deterministic SFHC is required to commit , hitting costs have been revealed, so is able to decide its choice .

Randomized SFHC is a variation of Deterministic SFHC which picks one of the subroutines to follow uniformly at random instead of averaging the choices of the subroutines. We note that this choice is made exactly once, before the algorithm is run; Randomized SFHC does not resample from the subroutines once its initial random choice is made. See Algorithm 3 for details.
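The two ways of combining subroutines can be sketched as below. The helper names and the double-well cost are hypothetical and chosen only for illustration. The example shows the standard reason averaging can fail without convexity: the average of two low-cost points may itself have high cost, whereas following a single subroutine always lands on one of the points that subroutine actually chose.

```python
import random


def deterministic_combine(subroutine_points):
    """Average, coordinate by coordinate, the points proposed at one timestep."""
    n = len(subroutine_points)
    dim = len(subroutine_points[0])
    return tuple(sum(p[i] for p in subroutine_points) / n for i in range(dim))


def randomized_pick(num_subroutines, rng):
    """Draw once, before the run starts, the index of the subroutine to follow."""
    return rng.randrange(num_subroutines)


def double_well(x):
    """Hypothetical non-convex hitting cost with minimizers at -1 and +1."""
    return min((x[0] - 1.0) ** 2, (x[0] + 1.0) ** 2)


# Two subroutines that each sit at a different global minimizer.
proposals = [(1.0,), (-1.0,)]
averaged = deterministic_combine(proposals)

# The random index is fixed up front and reused at every timestep.
chosen_index = randomized_pick(len(proposals), random.Random(0))
followed = proposals[chosen_index]
```

Here `double_well(averaged)` equals 1.0 while `double_well(followed)` equals 0.0, whichever subroutine is drawn.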

Algorithm 1 SFHC with phase : SFHC(h)
if  then
    Pick .
    Pick .
for  do
    if  then
        if  then
            Pick .
            Pick .
        else
            Pick .

Algorithm 2 Deterministic SFHC
for  do
    Suppose is the point picked by at timestep .
    Commit .

Algorithm 3 Randomized SFHC (Version A)
Choose uniformly at random from .
Run to determine .

5. Convex hitting costs

To show that the conditions presented in Section 4 are sufficient for SFHC to be competitive, we start by focusing on the case of convex costs. While our goal in the paper is to provide results for non-convex costs, we present the convex setting first because the structure of the analysis in the convex case serves as the basis of the proofs in the non-convex case, with the bulk of the proof applying to both the convex and non-convex cases. The proof in the convex case thus highlights exactly where additional complexity is needed for the analysis of non-convex costs.

The following theorem highlights that, in the convex setting, SFHC achieves a competitive ratio that matches the order of the best known bounds for many previously known algorithms, such as Online Balanced Descent (see (chen2018smoothed; goel2018smoothed)) and Averaging Fixed Horizon Control (see (lin2012online)), while applying more generally than any previous algorithm.

Theorem 1.

Consider an online optimization problem with movement and hitting costs that satisfy Conditions I and II. Suppose the hitting cost functions and the movement cost function are convex.

  1. Deterministic SFHC has a competitive ratio of when it has access to predictions.

  2. Deterministic SFHC has a competitive ratio of when it has access to predictions.

To provide context for this result, it is useful to compare it to the special cases that have been studied previously in the literature. There are two key contrasts with the previous literature. First, SFHC is the first algorithm that can provide competitive bounds both in the case of no predictions () and in the case where predictions are available (). Until now, algorithms designed for predictions have not been competitive in settings where predictions were not available (e.g., AFHC (lin2012online)), and algorithms designed to be competitive without predictions (e.g., OBD (chen2018smoothed; goel2018smoothed)) have not been able to make use of predictions.

Second, SFHC provides a competitive bound in settings much more general than previous results. To highlight this, a first example is the polyhedral problem setting considered in (chen2018smoothed), which corresponds to setting . In this setting, the competitive ratio for Online Balanced Descent (OBD) proved in (chen2018smoothed) is for . Theorem 1 provides a strictly stronger and more general result. It guarantees that Deterministic SFHC is -competitive for all . Another important setting that has previously received attention is the strongly convex problem setting considered in (goel2018smoothed; goel2019beyond), which corresponds to . In this context, the best known competitive ratio is , achieved by Regularized Online Balanced Descent (ROBD) in the context of (goel2019beyond). While our result does not match the performance of ROBD for the case of , the competitive ratio of Deterministic SFHC given by Theorem 1 is when and, to the best of our knowledge, this is the best known result in this setting. Importantly, the previous algorithms and analyses in both of these settings are tuned to the details of the setting and do not apply more broadly. In contrast, the bound on the competitive ratio of Deterministic SFHC applies much more generally, i.e., whenever Conditions I and II hold.

We end this section by proving Theorem 1. The bulk of our proof does not require the assumption that hitting and movement costs are convex. This is important because it means that a large fraction of the argument can be used in the context of non-convex costs, which is our focus in Section 6. To highlight this fact, we organize the proof into a set of lemmas, and then apply these lemmas to prove the theorem.

Our first lemma focuses on Case (i) in Theorem 1, i.e., when . The result does not require convexity.

Lemma 2.

Consider an online optimization problem with movement and hitting costs that satisfy Conditions I and II. Deterministic SFHC has a competitive ratio of when it has access to predictions.


Proof.

Since SFHC picks the minimizer of hitting cost function at timestep , the hitting cost incurred by the online agent at timestep is and the movement cost is . We can upper bound in the following two symmetric ways by the Approximate Triangle Inequality (Condition II):




Adding (8) and (9) together, we obtain that

Recalling that is the global minimum of we obtain that


Next, summing (10) over timesteps , we can compute


where we use Condition II in (11a). ∎
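As a numerical sanity check of the flavor of this bound (not a substitute for the proof), the sketch below compares the cost of the greedy choice with the offline optimum on a small instance. Everything here is an assumption made for illustration: the action space is a finite grid so that the offline optimum can be computed exactly by dynamic programming, the hitting costs are absolute deviations from hypothetical minimizers, and the movement cost is the absolute difference.

```python
def greedy_cost(minimizers, costs, move, x0):
    """Total cost of moving to the minimizer of each revealed hitting cost."""
    total, prev = 0, x0
    for v, f in zip(minimizers, costs):
        total += f(v) + move(prev, v)
        prev = v
    return total


def offline_opt(costs, grid, move, x0):
    """Exact offline optimum over trajectories restricted to a finite grid."""
    best = [move(x0, x) + costs[0](x) for x in grid]
    for f in costs[1:]:
        best = [
            min(best[k] + move(grid[k], x) for k in range(len(grid))) + f(x)
            for x in grid
        ]
    return min(best)


def make_abs_cost(v):
    """Hypothetical hitting cost |x - v| with global minimizer v."""
    return lambda x: abs(x - v)


# Minimizers that oscillate between 4 and 0 force the greedy learner to move.
minimizers = [4, 0, 4, 0]
costs = [make_abs_cost(v) for v in minimizers]
move = lambda a, b: abs(a - b)
grid = list(range(5))

greedy_total = greedy_cost(minimizers, costs, move, x0=0)
opt_total = offline_opt(costs, grid, move, x0=0)
ratio = greedy_total / opt_total
```

On this instance the greedy learner pays 16 in movement and nothing in hitting cost, while staying at 0 costs 8, so the observed ratio is 2. The ratio is finite but larger than 1, consistent with greedy being competitive yet not optimal.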

Next, we prove a lemma that bounds the average total cost across the subroutines of SFHC. Again, this lemma does not require the assumption of convexity. Thus, it serves as the basis not just for the result in Theorem 1, but also for our analysis in Section 6 of non-convex costs.

Lemma 3.

Consider an online optimization problem with movement and hitting costs that satisfy Conditions I and II. The average total cost of the subroutines of SFHC, i.e., the arithmetic mean of , is upper bounded by

given access to predictions.

Figure 1. Illustration of the Proof of Theorem 1. By the definition of , we know the blue point sequence (picked by ) is better than the black path (offline optimal solution with substituted by , for all ). The task is to compare the black path with the red point sequence (the actual offline optimal).
Proof of Lemma 3.

To begin, we define some notation. For a point sequence , we use to denote ’s subsequence Also, recall that is the total cost of the sequence . We use and to denote the offline optimal hitting cost and movement cost incurred at timestep . Finally, recall that we can formulate as an offline optimization


In general, breaks the sequence of actions up into subsequences of length by fixing for . It is able to minimize each subsequence separately because they are independent from each other due to the synchronization enforced by the algorithm. However, the first and the last subsequences need additional attention because their length may be less than .

Recall that (defined in Algorithm 1) selects the minimizers of function as its choice for and if . The definition of the function can be found in Section 4. For all , since are the minimizers of , we have that


where is the solution picked by and is the offline optimal solution. Similarly, we also have that


Summing up (13) over together with (14), we can upper bound the total cost of by


Equation (15) highlights that if we substituted by for all in the offline optimal solution, the resulting sequence (which satisfies the synchronization constraint) is no better than the solution picked by . However, the actual offline optimal is not subject to the synchronization constraint.

Now we compare the upper bound in (15) with the actual offline optimal cost. An illustration is given in Figure 1. We see that


Since , we have


Conditions I and II guarantee that for all




Substituting (17), (18) and (19) into (16), we see that


where we assume without loss of generality.

Recall that we use to denote the point sequence picked by . The average cost incurred by all subroutines satisfies