# Smoothed Online Optimization for Regression and Control

We consider Online Convex Optimization (OCO) in the setting where the costs are m-strongly convex and the online learner pays a switching cost for changing decisions between rounds. We show that the recently proposed Online Balanced Descent (OBD) algorithm is constant competitive in this setting, with competitive ratio 3 + O(1/m), irrespective of the ambient dimension. Additionally, we show that when the sequence of cost functions is ϵ-smooth, OBD has near-optimal dynamic regret and maintains strong per-round accuracy. We demonstrate the generality of our approach by showing that the OBD framework can be used to construct competitive algorithms for a variety of online problems across learning and control, including online variants of ridge regression, logistic regression, maximum likelihood estimation, and LQR control.

## 1 Introduction

In this paper we study the problem of smoothed online convex optimization (SOCO), a variant of OCO where the online learner incurs a switching cost for changing its actions between rounds. More concretely, the online learner plays a series of rounds $t = 1, \ldots, T$. In each round, the learner receives a convex loss function $f_t$, picks a point $x_t$ from a convex action space $\mathcal{X}$, and pays a hitting cost $f_t(x_t)$ as well as a switching cost $c(x_t, x_{t-1})$, which penalizes the learner for changing its action between rounds.

This problem was first introduced in the context of the dynamic management of service capacity in data centers [LWAT11], where the switching costs represent the performance and wear-and-tear costs associated with changing server configurations. Since then, SOCO has attracted considerable interest, both theoretical and applied, due to its use in dozens of applications across learning, distributed systems, networking, and control, such as speech animation [KYTM15], video streaming [JdV12], management of electric vehicle charging [KG14], geographical load balancing [LLWA12], and multi-timescale control [GCW17]. See [CGW18] for an extensive list of applications.

Unfortunately, despite a large and growing literature, all existing results identifying competitive algorithms for SOCO either (i) place strong restrictions on the action space, (ii) place strong restrictions on the class of loss functions, or (iii) require algorithms to make use of predictions of future cost functions. For example, a series of papers [LWAT11], [BGK15] developed competitive algorithms for one-dimensional action spaces. Until earlier this year there were no known algorithms that were competitive for SOCO beyond one dimension without requiring the use of predictions. Finally, [CGW18] presented the first algorithm that is constant-competitive beyond one dimension, but the algorithm was shown to be constant-competitive only in the case of polyhedral cost functions, a restrictive class that does not include most loss functions used in machine learning. Beyond this result, the most general positive results all assume predictions of future cost functions are available, e.g., [LLWA12], [CAW15], [CCL16], [LQL18].

The existing work on SOCO highlights a crucial open question: Does there exist a competitive algorithm for high-dimensional SOCO problems with cost functions that capture standard losses for online learning problems, e.g., logistic loss or least-squares loss?

In this paper we answer this question by proving that the recently introduced Online Balanced Descent (OBD) algorithm is constant-competitive for SOCO with strongly convex costs. Additionally, highlighting the importance of the class of strongly convex costs, we show that the OBD framework can be used to construct the first competitive algorithms for problems as diverse as online ridge regression, online logistic regression, and LQR control, which was not possible with previous approaches.

Contributions of this paper. This paper makes three main contributions to the literature on SOCO.

First, in Section 3 we show that OBD is constant competitive for SOCO when the costs are strongly convex (Theorem 1). This establishes OBD as the first constant competitive algorithm for strongly convex costs beyond one-dimensional action spaces. The key to our proof is a novel potential function argument that exploits essential properties of geometry. In particular, controlling the change in potential rests upon comparing the side lengths of certain triangles (see Figure 1), which can be done via the Law of Cosines.

Secondly, in Section 4 we adopt a beyond-worst-case perspective and show that when the sequence of cost functions does not vary too much between rounds, OBD guarantees per-step accuracy (Theorem 2), meaning that the point OBD picks is always near the current minimizer. This is attractive from a statistical perspective, since it lets us bound the loss in accuracy we incur in each round due to the effects of the switching costs. We also show that OBD has near-optimal dynamic regret, almost matching a lower bound of [LQL18]. Specifically, in Theorem 3 we show that OBD has dynamic regret , where is a parameter that controls how much the cost sequence varies across rounds.

Finally, in Section 5 we show novel applications of OBD to problems arising in statistics, learning, and control, including ridge regression, logistic regression, and maximum likelihood estimation. A highlight of this section is a reduction of LQR control to SOCO, giving the first competitive algorithm for LQR control (results for LQR control typically make strong distributional assumptions). We emphasize that none of these applications could be handled by previous work on SOCO, which highlights the importance of deriving a competitive bound for OBD in the strongly convex setting.

Related work. There is a vast literature on OCO; for a recent survey see [H16]. OCO with switching costs was first studied in the scalar setting in [LWAT11], which used SOCO to model dynamic right-sizing in data centers and gave a 3-competitive algorithm. In subsequent work, [BGK15] improved the competitive ratio to 2, also in the scalar setting. The first constant-competitive algorithm beyond one dimension was given in [CGW18], which introduced the OBD framework and showed that it was competitive for SOCO with polyhedral costs. The results in this paper highlight that OBD is also constant-competitive for strongly convex cost functions, a class that is particularly important for learning and control applications, and is wholly disjoint from the class of polyhedral cost functions when the minimizer of the cost function is zero.

A special case of SOCO is the Convex Body Chasing problem, first introduced in [FL93]. The connection between Convex Body Chasing and SOCO was observed in [ABN16]. A recent series of papers [BBE18], [ABC18] identified competitive algorithms in the setting where the bodies are nested.

Before this paper, the only class of SOCO problems for which positive results for strongly convex cost functions existed is when the learner had access to accurate predictions of future cost functions. The study of SOCO with predictions began with [LLWA12] and then continued with a stream of work in the following years, e.g., [CAW15, CCL16]. The most relevant to this work is [LQL18], which shows a lower bound on the dynamic regret of SOCO with strongly convex cost functions; in Section 4 we show that OBD can almost match this lower bound.

In Section 5 we apply OBD to diverse problems like maximum likelihood estimation and LQR control. These problems have been widely studied; we refer the reader to [BV04] and [AM10] for a survey.

Finally, we note that SOCO can be viewed as a continuous version of the classic Metrical Task Systems (MTS) problem, one of the most widely studied problems in the online algorithms community, e.g., [BLS92], [BBBT97], [BB00]. A special case of MTS is the celebrated $k$-server problem, first proposed in [MMS90], which has received significant attention in recent years, for example in [BCL18].

## 2 Smoothed Online Convex Optimization

An instance of SOCO consists of a convex action set $\mathcal{X} \subseteq \mathbb{R}^d$, an initial point $x_0 \in \mathcal{X}$, a sequence of non-negative convex costs $f_1, \ldots, f_T$, and a non-negative function $c$ on pairs of points. In each round $t$, the online learner observes the cost function $f_t$, picks a point $x_t \in \mathcal{X}$, and pays the sum of the hitting cost $f_t(x_t)$ and the movement or switching cost $c(x_t, x_{t-1})$. The switching cost acts as a regularizer, penalizing the online learner for changing its decisions between rounds. The goal of the online learner is to minimize its aggregate cost so as to approximate the offline optimal cost:

$$\min_{x_1, \ldots, x_T \in \mathcal{X}} \sum_{t=1}^{T} f_t(x_t) + c(x_t, x_{t-1})$$

More generally, $x_t$ could be matrix-valued and $f_t$ could be functions on matrices. Note that we make no restrictions on the sequence of cost functions other than strong convexity; they could be adversarial, or even adaptively chosen to hurt the online learner.

We emphasize that SOCO differs from OCO in two important ways. Firstly, unlike in OCO, the costs incurred in each round of SOCO depend on the previous choice, coupling the online learner’s decisions across rounds. Secondly, the online learner can observe the cost function $f_t$ before picking $x_t$. This is a standard assumption in the SOCO literature, e.g., in [BGK15], [LWAT11], [CGW18], and it isolates the complexity of SOCO in the coupling across timesteps due to the switching costs rather than in the uncertainty in the costs.

In this paper, we measure the performance of OBD in terms of its dynamic regret and competitive ratio. The dynamic regret is defined as

$$\sum_{t=1}^{T} \left[ f_t(x_t) + c(x_t, x_{t-1}) \right] - \sum_{t=1}^{T} \left[ f_t(x_t^*) + c(x_t^*, x_{t-1}^*) \right]$$

Here $x_1, \ldots, x_T$ are the points picked by the online learner and $x_1^*, \ldots, x_T^*$ are the offline optimal points. We note that this is a more natural performance metric for SOCO than static regret, since the main motivation for SOCO is to understand the effects of switching costs on online learning. In contrast, in the static regret setting the comparator never moves and hence incurs no switching cost, making it a less suitable performance metric for SOCO.

Instead of using an additive metric, the competitive ratio uses a multiplicative metric:

$$\frac{\sum_{t=1}^{T} f_t(x_t) + c(x_t, x_{t-1})}{\sum_{t=1}^{T} f_t(x_t^*) + c(x_t^*, x_{t-1}^*)}$$

We note that [ABL13] showed that, in general, no online algorithm can have both sublinear static regret and constant competitive ratio.
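To make the two metrics concrete, here is a minimal sketch of how they could be computed for given decision sequences. The helper names (`total_cost`, `dynamic_regret`, `competitive_ratio`) and the convention of passing the costs `fs` and the switching cost `c` as Python callables are illustrative assumptions, not part of the original text.

```python
def total_cost(fs, xs, x0, c):
    """Aggregate cost: sum over rounds of f_t(x_t) + c(x_t, x_{t-1})."""
    cost, prev = 0.0, x0
    for f, x in zip(fs, xs):
        cost += f(x) + c(x, prev)
        prev = x
    return cost


def dynamic_regret(fs, xs, xs_offline, x0, c):
    """Additive gap between the online cost and the comparator's cost."""
    return total_cost(fs, xs, x0, c) - total_cost(fs, xs_offline, x0, c)


def competitive_ratio(fs, xs, xs_offline, x0, c):
    """Multiplicative gap between the online cost and the comparator's cost."""
    return total_cost(fs, xs, x0, c) / total_cost(fs, xs_offline, x0, c)
```

Both metrics compare against the same comparator sequence `xs_offline`; they only measure the true dynamic regret or competitive ratio when that sequence is the offline optimum.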

Much attention has been focused on the setting where the switching cost is a norm: $c(x_t, x_{t-1}) = \|x_t - x_{t-1}\|$, e.g., [LWAT11], [BGK15]. Note that in the one-dimensional setting, all norms are identical up to a constant factor, making the choice of norm somewhat vacuous. The first algorithm to work beyond the one-dimensional setting was proposed in [CGW18], which considered a setting where the switching cost is given by the Euclidean distance and the loss functions are polyhedral, meaning that they grow at least linearly as one moves away from the minimizer.

We instead focus on the setting where the cost functions are $m$-strongly convex with respect to the Euclidean norm and the switching cost is quadratic:

$$c(x_t, x_{t-1}) = \frac{1}{2}\|x_t - x_{t-1}\|_2^2$$

In Section 5, we show that OBD can be used with many important loss functions, such as the least-squares loss and the regularized logistic loss, none of which could be handled by previous work.

We assume that the domain is all of $\mathbb{R}^d$. Note that this presents no real restriction, since we can always define $f_t(x) = \infty$ for all $x$ outside the action set. The objective becomes

$$\min_{x_1, \ldots, x_T \in \mathbb{R}^d} \sum_{t=1}^{T} f_t(x_t) + \frac{1}{2}\|x_t - x_{t-1}\|_2^2 \tag{1}$$

Notation. We use $\|\cdot\|$ to denote the $\ell_2$ norm. We often use $H_t$ and $M_t$ to denote the hitting cost $f_t(x_t)$ and the movement cost $\frac{1}{2}\|x_t - x_{t-1}\|^2$, respectively. The offline costs $H_t^*$ and $M_t^*$ are defined analogously. We let $\mathrm{ALG}$ denote the total cost incurred by OBD across all rounds and define $\mathrm{OPT}$ to be the analogous offline cost. We let $v_t$ denote the minimizer of the cost function $f_t$.

## 3 A Competitive Algorithm

Our main technical result shows that a recently proposed algorithm, Online Balanced Descent (OBD), is constant competitive for SOCO problems with strongly convex cost functions.

OBD was introduced in [CGW18], where it was analyzed for the class of polyhedral costs. The detailed workings of OBD are summarized in Algorithm 1. The key insight of OBD is to exploit the full geometry of the level sets of the current cost function when choosing the point $x_t$, in such a way as to take switching costs into account.

OBD works by iteratively projecting the previously chosen point $x_{t-1}$ onto a level set of the current cost function $f_t$. The level set picked by OBD is the $l$-level set such that the switching cost incurred while traveling from $x_{t-1}$ to $x(l)$ is equal to $\beta l$, where $x(l)$ is the projection of $x_{t-1}$ onto the $l$-level set and $\beta$ is the balance parameter, which can be tuned to get different performance guarantees. We note that OBD can be efficiently implemented via a binary search over the level sets [CGW18].
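The binary search over level sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the Algorithm 1 of [CGW18]: the cost is assumed to be the isotropic quadratic $f_t(x) = \frac{m}{2}\|x - v_t\|^2$ with minimum value zero, so that its level sets are balls and the projection has a closed form, and the balance condition is taken to be "movement cost equals $\beta$ times hitting cost", as used in the proof of Theorem 1. The function name `obd_step` and its parameters are hypothetical.

```python
import numpy as np


def obd_step(x_prev, v, m, beta, iters=60):
    """One OBD step for f(x) = (m/2)||x - v||^2, found by bisection over levels l."""
    x_prev = np.asarray(x_prev, dtype=float)
    v = np.asarray(v, dtype=float)
    d = x_prev - v
    dist = np.linalg.norm(d)
    if dist == 0.0:
        return x_prev.copy()  # already at the minimizer: no hitting or movement cost

    def project(l):
        # The l-level set is a ball of radius sqrt(2l/m) around v; for l <= f(x_prev)
        # the projection of x_prev lies on the segment from v to x_prev.
        radius = np.sqrt(2.0 * l / m)
        return v + (radius / dist) * d

    lo, hi = 0.0, 0.5 * m * dist ** 2  # levels between min f and f(x_prev)
    for _ in range(iters):
        l = 0.5 * (lo + hi)
        move = 0.5 * np.sum((project(l) - x_prev) ** 2)
        if move > beta * l:
            lo = l  # level too low: the learner moved too far, so raise the level
        else:
            hi = l
    return project(0.5 * (lo + hi))
```

The bisection works because the movement cost decreases and $\beta l$ increases as the level $l$ grows, so the two curves cross exactly once. For this special cost the crossing point is also available in closed form, $x_t = v_t + (x_{t-1} - v_t)/(1 + \sqrt{\beta m})$, which gives a convenient check on the search.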

We can now state our main result, a bound on the competitive ratio of OBD for strongly convex costs.

###### Theorem 1.

OBD is competitive for the problem (1) for all $\beta > 4/m$. Furthermore, if $\beta$ is set to be , the competitive ratio of OBD is at most $3 + O(1/m)$, irrespective of the ambient dimension.

We note that [CGW18] proved a bound on the competitive ratio of OBD of the form $3 + O(1/\alpha)$, where $\alpha$ measures the “steepness” of the costs. While this superficially resembles the bound in Theorem 1, we emphasize that the settings are quite different; their work applied to the class of polyhedral cost functions while we focus on strongly convex cost functions. In the case where the cost functions have minimum value zero these classes are wholly disjoint. We are led to consider strongly convex costs because many common learning and control problems have loss functions that are strongly convex (e.g., see Section 5). Until this paper, there existed no competitive algorithms for SOCO problems with strongly convex costs.

To prove Theorem 1, we use the potential function $\phi(x_t, x_t^*) = \eta\|x_t - x_t^*\|^2$. Clearly $\phi(x_0, x_0^*) = 0$ and $\phi(x_T, x_T^*) \geq 0$. Before we turn to the proof of Theorem 1, we prove a series of crucial lemmas relating to the potential function. Lemmas 1 and 2 show how the potential changes depending on the relative positions of its arguments, and highlight the role of the geometry associated with the $\ell_2$ norm. Lemma 3 relates the potential to the hitting costs at every timestep.

###### Lemma 1.

The change in potential satisfies

$$\phi(a, c) - \phi(a, b) \leq -\phi(b, c)$$

for all $a, b, c \in \mathbb{R}^d$ such that the angle $\theta$ between the vectors $a - c$ and $b - c$ lies in $[\pi/2, \pi]$.

###### Proof.

Consider the triangle with vertices $a, b, c$. According to the Law of Cosines we have:

$$\|a-b\|^2 = \|a-c\|^2 + \|b-c\|^2 - 2\|a-c\|\|b-c\|\cos\theta.$$

Rearranging gives

$$\|a-c\|^2 - \|a-b\|^2 = -\|b-c\|^2 + 2\|a-c\|\|b-c\|\cos\theta.$$

Since $\theta$ lies in $[\pi/2, \pi]$, the cosine term must be non-positive, immediately yielding the claim. ∎

###### Lemma 2.

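The change in potential satisfies

$$\phi(a, c) - \phi(a, b) \leq 2\phi(b, c) + \phi(a, b)$$

for all $a, b, c \in \mathbb{R}^d$.

###### Proof.

We apply the Law of Cosines again:

$$\|a-c\|^2 - \|a-b\|^2 = \|b-c\|^2 - 2\|a-b\|\|b-c\|\cos\theta$$

where $\theta$ is the angle between the vectors $a - b$ and $c - b$. The second term on the right is at most $2\|a-b\|\|b-c\|$; applying the AM-GM inequality to this expression bounds it by $\|a-b\|^2 + \|b-c\|^2$, which gives the claim. ∎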

###### Lemma 3.

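At all timesteps $t$, the potential satisfies

$$\phi(x_t, x_t^*) \leq \frac{4\eta}{m}H_t + \frac{4\eta}{m}H_t^*$$

###### Proof.

We have

$$\begin{aligned}
\phi(x_t, x_t^*) &= \eta\|x_t - x_t^*\|^2 \\
&\leq \eta\left(\|x_t - v_t\| + \|x_t^* - v_t\|\right)^2 \\
&\leq 2\eta\|x_t - v_t\|^2 + 2\eta\|x_t^* - v_t\|^2 \\
&\leq \frac{4\eta}{m}H_t + \frac{4\eta}{m}H_t^*
\end{aligned}$$

The first inequality is just the triangle inequality; the second follows from the AM-GM inequality; and in the last step we used the fact that $H_t \geq \frac{m}{2}\|x_t - v_t\|^2$ (and likewise for $H_t^*$), which follows from strong convexity and the non-negativity of the costs. ∎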

Now we return to the proof of Theorem 1. Note that it suffices to show that OBD is constant competitive in the case where the minimum value of each cost function is zero, since otherwise the competitive ratio can only improve. In this case, we always have $M_t = \beta H_t$, since $H_t$ shrinks to zero as we move towards the minimizer while $M_t$ increases.

###### Proof.

To bound the cost charged to OBD in each step, we first consider two cases.

### Case 1: $H_t \leq H_t^*$

This case is easy; the cost charged to OBD is

$$H_t + M_t + \Delta\phi \leq H_t + M_t + \frac{4\eta}{m}H_t + \frac{4\eta}{m}H_t^* \leq \left(1 + \beta + \frac{8\eta}{m}\right)H_t^*$$

Here in the first inequality we threw away the negative potential term and used Lemma 3. In the second inequality we used the fact that $M_t = \beta H_t$ and the inequality defining the case.

### Case 2: $H_t > H_t^*$

This is the hard case. Unlike in the previous case, we cannot directly bound the cost charged to OBD in terms of the offline cost, since $H_t^*$ is less than $H_t$. Our strategy will be to show that the change in potential was negative, offsetting the hitting and movement costs incurred by OBD.

Since $H_t > H_t^*$, the offline point $x_t^*$ must lie strictly in the interior of the $H_t$-level set of $f_t$. Notice that the angle made between the line segments from $x_t$ to $x_{t-1}$ and from $x_t$ to $x_t^*$ must be obtuse, since $x_t$ was the projection of $x_{t-1}$ onto the level set, and $x_t^*$ lies strictly on the opposite side of the supporting hyperplane tangent to the level set at $x_t$ (see Figure 1). We have

$$\begin{aligned}
H_t + M_t + \Delta\phi &= H_t + M_t + \left(\phi(x_t, x_t^*) - \phi(x_{t-1}, x_t^*)\right) + \left(\phi(x_{t-1}, x_t^*) - \phi(x_{t-1}, x_{t-1}^*)\right) \\
&\leq H_t + M_t - \phi(x_t, x_{t-1}) + 2\phi(x_t^*, x_{t-1}^*) + \phi(x_{t-1}, x_{t-1}^*) \\
&= H_t + M_t - \eta M_t + 2\eta M_t^* + \phi(x_{t-1}, x_{t-1}^*) \\
&\leq \left(1 + \frac{1}{\beta} - \eta\right)M_t + 2\eta M_t^* + \frac{4\eta}{m}H_{t-1} + \frac{4\eta}{m}H_{t-1}^*
\end{aligned}$$

In the first inequality we use Lemma 1 to bound the first change in potential and Lemma 2 to bound the second. In the second inequality we apply the fact that $H_t = M_t/\beta$ and Lemma 3.

### Bounding the competitive ratio

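We have now bounded the cost charged to OBD in each of the two cases. Putting both cases together, we see that we always have

$$H_t + M_t + \Delta\phi \leq \left(1 + \beta + \frac{8\eta}{m}\right)H_t^* + 2\eta M_t^* + \frac{4\eta}{m}H_{t-1} + \frac{4\eta}{m}H_{t-1}^*$$

where we assume that $\beta, \eta$ were picked so that $1 + \frac{1}{\beta} - \eta \leq 0$. Adding up across all timesteps and collecting terms, we have:

$$\sum_{t=1}^{T} H_t + M_t \leq \sum_{t=1}^{T}\left(1 + \beta + \frac{12\eta}{m}\right)H_t^* + \sum_{t=1}^{T} 2\eta M_t^* + \sum_{t=1}^{T} \frac{4\eta}{m}H_{t-1}$$

By the balance condition, $H_t = \frac{1}{1+\beta}(H_t + M_t)$. We immediately obtain

$$\mathrm{ALG} \leq \max\left(1 + \beta + \frac{12\eta}{m},\ 2\eta\right)\mathrm{OPT} + \frac{4\eta}{m(1+\beta)}\mathrm{ALG}$$

Let us assume that $\beta, \eta$ are picked so that $\frac{4\eta}{m(1+\beta)} < 1$; rearranging gives

$$\frac{\mathrm{ALG}}{\mathrm{OPT}} \leq \frac{\max\left(1 + \beta + \frac{12\eta}{m},\ 2\eta\right)}{1 - \frac{4\eta}{m(1+\beta)}}$$

Now we seek to minimize the competitive ratio by appropriately picking $\beta, \eta$, subject to the constraints. Notice that the competitive ratio is always increasing in $\eta$. Since we know that we must have $\eta \geq 1 + \frac{1}{\beta}$, this must be the optimal value of $\eta$. We can hence immediately rewrite our optimization problem purely in terms of $\beta$:

$$\min_{\beta > 4/m} \frac{\max\left(1 + \beta + \frac{12}{m}\left(1 + \frac{1}{\beta}\right),\ 2\left(1 + \frac{1}{\beta}\right)\right)}{1 - \frac{4}{m\beta}}$$

Note that this proves that OBD is competitive for all $\beta > 4/m$. Instead of trying to find the exact optimal solution, we select a simple choice of $\beta$ which gives a small competitive ratio. Setting $\beta$ as in the statement of Theorem 1 immediately gives an upper bound on the competitive ratio of $3 + O(1/m)$, as claimed. ∎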
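As a rough numerical illustration of the optimization above, the following sketch evaluates the bound with $\eta = 1 + 1/\beta$ and scans a grid of $\beta$ values. The function names and grid parameters are illustrative assumptions; the grid search is only a sanity check and is not the choice of $\beta$ made in the proof.

```python
def ratio_bound(beta, m):
    """Upper bound on ALG/OPT with eta = 1 + 1/beta; infinite unless beta > 4/m."""
    if beta * m <= 4.0:
        return float("inf")
    eta = 1.0 + 1.0 / beta
    numerator = max(1.0 + beta + 12.0 * eta / m, 2.0 * eta)
    return numerator / (1.0 - 4.0 / (m * beta))


def best_beta_on_grid(m, span=50.0, grid_size=10000):
    """Scan beta over (4/m, 4/m + span] and return (smallest bound, its beta)."""
    lower = 4.0 / m
    best_ratio, best_beta = float("inf"), None
    for k in range(1, grid_size + 1):
        beta = lower + span * k / grid_size
        ratio = ratio_bound(beta, m)
        if ratio < best_ratio:
            best_ratio, best_beta = ratio, beta
    return best_ratio, best_beta
```

For large $m$ the bound at $\beta = 2$ is close to 3, which is consistent with the $3 + O(1/m)$ guarantee; for small $m$ the constraint $\beta > 4/m$ forces a larger $\beta$ and hence a larger ratio.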

## 4 Beyond-Worst-Case Analysis

In the previous section we showed a worst-case performance bound on OBD and proved that the aggregate cost incurred by OBD is not more than a constant times the optimal aggregate cost. However, in most real world scenarios, the cost functions are not adversarial, prompting us to study the performance of OBD from a beyond-worst-case perspective.

The difficulty in SOCO arises from the fact that the learner incurs switching costs in the face of cost functions which could change arbitrarily between rounds; yet in many practical settings the costs change slowly. This motivates the following definition:

###### Definition 1.

A sequence of points $p_1, \ldots, p_T$ is $\epsilon$-smooth if $\|p_t - p_{t-1}\| \leq \epsilon$ for all $t$. A sequence of convex functions $f_1, \ldots, f_T$ is $\epsilon$-smooth if the corresponding sequence of minimizers $v_1, \ldots, v_T$ is $\epsilon$-smooth.

Smooth instances have received considerable attention in the study of OCO, e.g., [LQL18], [Zin03]. Here, we show two interesting properties of OBD when the costs are $\epsilon$-smooth.

First, we prove a per-round accuracy guarantee, showing that the point $x_t$ picked by OBD is always close to the minimizer $v_t$:

###### Theorem 2.

Suppose $f_1, \ldots, f_T$ is $\epsilon$-smooth. Then the sequence of points $x_1, \ldots, x_T$ picked by OBD is smooth with parameter $(1 + 2\alpha)\epsilon$, where $\beta$ is the balance parameter of OBD and $\alpha = 1/(\sqrt{\beta m} - 2)$. Furthermore, the points chosen by OBD are always close to the current minimizer: $\|x_t - v_t\| \leq \alpha\epsilon$ for all $t$.

This lets us bound the accuracy loss due to managing switching costs, guaranteeing that in each round we are not too far from the minimizer, despite the coupling across rounds. We note that when we use set we get . This gives an explicit numerical bound for the per-step accuracy when the balance parameter is set as in Theorem 1.

Secondly, we show that OBD incurs low dynamic regret when the costs are smooth:

###### Theorem 3.

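Suppose $f_1, \ldots, f_T$ is $\epsilon$-smooth, and fix balance parameter $\beta > 4/m$. The dynamic regret of OBD is at most

$$\left[\alpha\epsilon + \frac{2}{m}\epsilon + \frac{G^2}{2}(1 + 2\alpha)\epsilon + \frac{1}{2}(1 + 2\alpha)^2\epsilon^2\right]T,$$

where $\alpha = 1/(\sqrt{\beta m} - 2)$ and $G$ is a uniform bound on the gradient norms of the costs.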

We note that this result is nearly tight: Theorem 3 of [LQL18] implies that no online algorithm can have dynamic regret better than . We note that [CGW18] also proved a bound on the dynamic regret of OBD in terms of the smoothness of the cost sequence, but that bound grew super-linearly in in the worst case, even when is fixed. It is interesting to note that Theorem 3 holds only when $\beta > 4/m$, the same condition under which OBD is competitive.

We end this section with proofs of Theorems 2 and 3.

###### Proof.

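By the balance condition and strong convexity, we have

$$\frac{\beta m}{2}\|x_t - v_t\|_2^2 \leq \frac{1}{2}\|x_t - x_{t-1}\|_2^2$$

Taking the square root of both sides and applying the triangle inequality gives

$$\sqrt{\beta m}\,\|x_t - v_t\| \leq \|x_t - v_t\| + \|v_t - v_{t-1}\| + \|v_{t-1} - x_{t-1}\|$$

from which

$$\|x_t - v_t\| \leq \frac{\epsilon + \|x_{t-1} - v_{t-1}\|}{\sqrt{\beta m} - 1}$$

Unraveling this recursion gives

$$\|x_t - v_t\| \leq \left(\sum_{i=1}^{t} \frac{1}{(\sqrt{\beta m} - 1)^{t-i+1}}\right)\epsilon \leq \left(\frac{1}{\sqrt{\beta m} - 2}\right)\epsilon$$

where we summed the geometric series in the last step. Using the triangle inequality we immediately obtain

$$\|x_t - x_{t-1}\| \leq (1 + 2\alpha)\epsilon$$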
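The per-round guarantee can be checked numerically on a simple instance. The sketch below is an illustration under stated assumptions rather than part of the original analysis: the costs are assumed to be isotropic quadratics $f_t(x) = \frac{m}{2}\|x - v_t\|^2$, for which the balance condition gives the closed-form OBD update $x_t = v_t + (x_{t-1} - v_t)/(1 + \sqrt{\beta m})$; the minimizers perform a random walk with steps of length exactly $\epsilon$; and the learner starts at the initial minimizer. The function name `simulate_tracking` and its defaults are hypothetical.

```python
import numpy as np


def simulate_tracking(m=1.0, beta=9.0, eps=0.1, T=200, d=5, seed=0):
    """Run OBD on an eps-smooth sequence of isotropic quadratic costs.

    Returns (largest ||x_t - v_t||, largest ||x_t - x_{t-1}||, alpha).
    Requires beta * m > 4 so that alpha is positive.
    """
    rng = np.random.default_rng(seed)
    shrink = 1.0 / (1.0 + np.sqrt(beta * m))   # closed-form balance point
    alpha = 1.0 / (np.sqrt(beta * m) - 2.0)
    v = np.zeros(d)
    x = v.copy()                                # learner starts at the minimizer
    worst_gap, worst_move = 0.0, 0.0
    for _ in range(T):
        step = rng.normal(size=d)
        v = v + eps * step / np.linalg.norm(step)   # minimizer moves by exactly eps
        x_new = v + shrink * (x - v)                # OBD step for this cost class
        worst_gap = max(worst_gap, np.linalg.norm(x_new - v))
        worst_move = max(worst_move, np.linalg.norm(x_new - x))
        x = x_new
    return worst_gap, worst_move, alpha
```

On this cost class the observed distances are typically well inside the bounds $\alpha\epsilon$ and $(1 + 2\alpha)\epsilon$, since the theorem has to cover every strongly convex cost sequence and not only isotropic quadratics.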

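Now let us turn to the proof of Theorem 3. As is standard when proving regret bounds, we assume that the gradients of $f_t$ over the action set are uniformly bounded by a constant $G$, e.g., see [Zin03].

###### Proof.

By convexity of $f_t$, we have

$$f_t(x_t) - f_t(x_t^*) \leq \nabla f_t(x_t)^\top (x_t - x_t^*)$$

Let $\eta_t = \|x_t - x_{t-1}\|$, $\eta_t^* = \|x_t^* - x_{t-1}^*\|$. We have

$$\|x_{t+1} - x_t^*\|^2 = \|x_t - \eta_{t+1}\nabla f_{t+1}(x_{t+1}) - x_t^*\|^2$$

from which

$$\|x_{t+1} - x_t^*\|_2^2 \leq \|x_t - x_t^*\|_2^2 + \eta_{t+1}^2\|\nabla f_{t+1}(x_{t+1})\|_2^2 - 2\eta_{t+1}\nabla f_{t+1}^\top (x_t - x_t^*)$$

Applying the gradient bound and solving for $\nabla f_{t+1}^\top (x_t - x_t^*)$, we see that it is at most

$$\frac{\|x_t - x_t^*\|^2 - \|x_{t+1} - x_t^*\|^2}{2\eta_{t+1}} + \frac{\eta_{t+1}}{2}G^2 \leq \frac{\|x_{t+1} - x_t^*\| + \|x_t - x_t^*\|}{2} + \frac{\eta_{t+1}}{2}G^2$$

where in the second inequality we factored the difference of squares and applied the triangle inequality. The regret is

$$\mathrm{ALG} - \mathrm{OPT} = \sum_{t=1}^{T}\left[f_t(x_t) - f_t(x_t^*) + \frac{1}{2}\eta_t^2 - \frac{1}{2}(\eta_t^*)^2\right] \leq \sum_{t=1}^{T}\left[\|x_t - x_t^*\| + \frac{G^2}{2}\eta_{t+1} + \frac{\eta_t^2}{2}\right]$$

where in the inequality we applied our bounds and threw away the offline’s movement cost. Noticing that $\|x_t - x_t^*\| \leq \|x_t - v_t\| + \|v_t - x_t^*\|$ allows us to apply Theorem 2 and Lemma 4 to obtain the bound

$$\left[\alpha\epsilon + \frac{2}{m}\epsilon + \frac{G^2}{2}(1 + 2\alpha)\epsilon + \frac{1}{2}(1 + 2\alpha)^2\epsilon^2\right]T$$

where $\alpha = 1/(\sqrt{\beta m} - 2)$. This is as claimed. ∎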

###### Lemma 4.

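The offline points satisfy

$$\sum_{t=1}^{T}\|x_t^* - v_t\| \leq \frac{2}{m}\sum_{t=1}^{T}\|v_t - v_{t-1}\|$$

###### Proof.

The first order condition is

$$\nabla f_t(x_t^*) + x_t^* - x_{t-1}^* + x_t^* - x_{t+1}^* = 0$$

for $t = 1, \ldots, T-1$, and for the last timestep is

$$\nabla f_T(x_T^*) + x_T^* - x_{T-1}^* = 0$$

We add and subtract the corresponding minimizers in the first set of equations and in the last equation, and take the inner product of the resulting equations with the vectors $x_t^* - v_t$ and $x_T^* - v_T$, respectively. Applying the Cauchy-Schwarz inequality and strong convexity

$$\nabla f_t(x)^\top (x - v_t) \geq m\|x - v_t\|_2^2$$

we eventually obtain

$$\delta_t(\delta_{t-1} + \delta_{t+1} + \epsilon_t + \epsilon_{t+1}) \geq (m + 2)\delta_t^2$$

for $t = 1, \ldots, T-1$, and in the last timestep

$$\delta_T(\delta_{T-1} + \epsilon_T) \geq (m + 1)\delta_T^2,$$

where $\delta_t = \|x_t^* - v_t\|$ and $\epsilon_t = \|v_t - v_{t-1}\|$. Dividing both sides by $\delta_t$ and summing up over $t$ leads to the claim. ∎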
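For intuition, the first order conditions in the proof can be solved directly in a simple special case. The sketch below assumes scalar decisions and quadratic costs $f_t(x) = \frac{m}{2}(x - v_t)^2$, so the conditions become a tridiagonal linear system, and it takes $v_0 = x_0$ so that the first term of the right-hand side is well defined. The function names `offline_optimum_1d` and `lemma4_sides` are hypothetical, and a dense solve is used only for brevity.

```python
import numpy as np


def offline_optimum_1d(v, m, x0):
    """Offline optimum for f_t(x) = (m/2)(x - v_t)^2 with switching cost (1/2)(x_t - x_{t-1})^2."""
    v = np.asarray(v, dtype=float)
    T = len(v)
    A = np.zeros((T, T))
    b = m * v.copy()
    for t in range(T):
        # Rows t < T-1: (m + 2) x_t - x_{t-1} - x_{t+1} = m v_t; last row: (m + 1) x_T - x_{T-1} = m v_T.
        A[t, t] = m + 2.0 if t < T - 1 else m + 1.0
        if t > 0:
            A[t, t - 1] = -1.0
        if t < T - 1:
            A[t, t + 1] = -1.0
    b[0] += x0  # the fixed initial point moves to the right-hand side
    return np.linalg.solve(A, b)


def lemma4_sides(v, m, x0):
    """Return (sum_t |x*_t - v_t|, (2/m) sum_t |v_t - v_{t-1}|) with v_0 = x0."""
    v = np.asarray(v, dtype=float)
    x_star = offline_optimum_1d(v, m, x0)
    lhs = np.sum(np.abs(x_star - v))
    rhs = (2.0 / m) * np.sum(np.abs(np.diff(np.concatenate(([x0], v)))))
    return lhs, rhs
```

Comparing the two returned values on a random sequence of minimizers gives a quick check of the inequality in Lemma 4 for this cost class.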

## 5 Applications

In this section we show several applications of OBD to diverse problems across learning and control. We emphasize that none of these applications would be possible without a competitive algorithm for strongly convex costs, which has not been attainable with previous approaches.

### 5.1 Smoothed Online Regression

We consider the problem of a learner who wishes to fit a series of regularized regressors or classifiers to a changing dataset, without changing the estimators too much between rounds. This is naturally modeled by the objective

$$\min_{\theta_1, \ldots, \theta_T \in \mathbb{R}^d} \sum_{t=1}^{T} f_t(\theta_t) + \frac{\lambda_1}{2}\|\theta_t\|_2^2 + \frac{\lambda_2}{2}\|\theta_t - \theta_{t-1}\|_2^2 \tag{2}$$

Here $f_t$ represents an estimation or classification task at timestep $t$, $\theta_t$ is the regressor at time $t$, and $\lambda_1, \lambda_2$ are parameters that control the strength of the $\ell_2$ and smoothing regularizations, respectively. We impose no constraint on $f_t$ other than convexity; in particular, $f_t$ need not be strongly convex (though if $f_t$ happens to be strongly convex, we can optionally drop the regularization term $\frac{\lambda_1}{2}\|\theta_t\|_2^2$). OBD gives a constant-competitive algorithm in this setting:

###### Corollary.

The competitive ratio of OBD on problem (2) is $3 + O(\lambda_2/\lambda_1)$ when the balance parameter is set as in Theorem 1 with $m = \lambda_1/\lambda_2$.

Before we turn to the proof, we emphasize that the bound on competitive ratio does not vary with respect to dimension; hence OBD can be applied to estimation problems with thousands or millions of parameters.

###### Proof.

We first divide the objective by $\lambda_2$. Notice that the function $\frac{1}{\lambda_2}f_t(\theta) + \frac{\lambda_1}{2\lambda_2}\|\theta\|_2^2$ is $\frac{\lambda_1}{\lambda_2}$-strongly convex in $\theta$ whenever $f_t$ is convex, hence Theorem 1 implies that OBD achieves competitive ratio $3 + O(\lambda_2/\lambda_1)$. ∎
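The strong convexity claim in the proof can be checked numerically for a concrete loss. The sketch below assumes a least-squares loss scaled as $f_t(\theta) = \frac{1}{2}\|X\theta - y\|_2^2$ (the scaling and the names `X`, `lam1`, `lam2`, and `rescaled_strong_convexity` are illustrative assumptions); the Hessian of the rescaled cost is then $(X^\top X + \lambda_1 I)/\lambda_2$, whose smallest eigenvalue is the strong convexity parameter.

```python
import numpy as np


def rescaled_strong_convexity(X, lam1, lam2):
    """Smallest Hessian eigenvalue of (0.5*||X th - y||^2 + (lam1/2)*||th||^2) / lam2."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    hessian = (X.T @ X + lam1 * np.eye(d)) / lam2
    return float(np.linalg.eigvalsh(hessian)[0])  # eigvalsh returns ascending eigenvalues
```

When the data matrix has fewer rows than columns, $X^\top X$ is singular and the parameter equals $\lambda_1/\lambda_2$ exactly, which is the case where the $\ell_2$ regularizer is what makes Theorem 1 applicable.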

Our approach applies to many common learning problems:

• Ridge Regression. We take $f_t$ to be the least-squares loss, where $X_t$ is a data matrix and $y_t$ is the response variable.

• Logistic Regression. We take $f_t$ to be the logistic loss on the data at time $t$, where the labels are binary outcomes. OBD hence fits a series of binary classifiers which don’t vary too much between rounds. Our approach easily extends to the multiclass setting as well.

• Maximum Likelihood Estimation. More generally, we can perform smoothed online maximum likelihood estimation using OBD. Here $\theta_t$ are parameters of a model and $f_t$ is the (negative log-)likelihood function of some dataset at time $t$. If this function is convex then OBD can be applied. For example, the problem of estimating a series of covariance matrices $\Sigma_t$ of a series of Gaussian distributions given independent samples arranged as a matrix $Y$ can be posed as the problem

$$\min_{\Sigma} \operatorname{Tr}(\Sigma Y) - \log\det\Sigma$$

which is a convex optimization problem (see [BV04], p. 357). We can apply OBD over the set of positive definite matrices to find a series of covariance matrices that fit the data well but don’t vary too much between rounds.
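As a small illustration of the last example, the following sketch evaluates the objective $\operatorname{Tr}(\Sigma Y) - \log\det\Sigma$ for a symmetric matrix argument. The function name `gaussian_objective` and the convention of returning infinity outside the positive definite cone are assumptions made for this sketch. Setting the gradient $Y - \Sigma^{-1}$ to zero shows that, for the objective as written, the unconstrained minimizer is $\Sigma = Y^{-1}$.

```python
import numpy as np


def gaussian_objective(S, Y):
    """Evaluate Tr(S Y) - log det S, returning inf if S is not positive definite."""
    S = np.asarray(S, dtype=float)
    S = 0.5 * (S + S.T)                  # symmetrize against round-off
    eigs = np.linalg.eigvalsh(S)         # ascending eigenvalues of a symmetric matrix
    if eigs[0] <= 0.0:
        return float("inf")
    return float(np.trace(S @ np.asarray(Y, dtype=float)) - np.sum(np.log(eigs)))
```

A hitting cost of this form could be combined with a strongly convex regularizer, as in problem (2), before running OBD.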

### 5.2 Linear Quadratic Regulator (LQR) control

Our second application comes from the controls community. Consider the classical problem of LQR control:

$$\min_{u_1, \ldots, u_T} \sum_{t=1}^{T} \frac{1}{2}x_t' Q_t x_t + \frac{1}{2}u_t' R u_t$$

with dynamics given by

$$x_{t+1} = A x_t + B u_t + w_t$$

Here $u_t$ is a control action, $x_t$ a state variable, and $Q_t, R$ are assumed to be positive definite. Usually, the noise increments $w_t$ are assumed to be i.i.d. Gaussian, and the goal is to design a control policy to minimize the expected cost. Instead of an in-expectation result, we can use OBD to design a controller with a strong pathwise guarantee, with no distributional or boundedness assumptions on the noise:

###### Corollary.

Suppose that $A, B$ are invertible and the matrices $Q_t$ each have their lowest eigenvalue bounded below by $\lambda$. The LQR problem can be rewritten as a SOCO problem, and the competitive ratio of OBD is

$$3 + O\left(\frac{\lambda_{\max}(R)}{\lambda_{\min}(AB)^2\,\lambda}\right).$$

Note that $\lambda_{\min}(AB)$ can be interpreted as a lower bound on the gain of the control action $u_t$; intuitively, systems with high control gain are easier to regulate, since each control action gets amplified. Similarly, it is intuitive that as $\lambda_{\max}(R)$ decreases the competitive ratio improves, since $R$ controls the cost incurred by using the controller.

###### Proof.

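Define

$$y_t = \sum_{i=1}^{t} A^{t-i} B u_i, \qquad v_t = -\sum_{i=1}^{t} A^{t-i} w_i.$$

Notice that

$$x_t = y_t - v_t, \qquad u_t = (AB)^{-1}(y_t - y_{t-1}),$$

so the LQR problem can be rewritten as

$$\min_{y_1, \ldots, y_T} \sum_{t=1}^{T} \frac{1}{2}(y_t - v_t)' Q_t (y_t - v_t) + \frac{1}{2}(y_t - y_{t-1})' Z (y_t - y_{t-1})$$

where

$$Z = \left((AB)^{-1}\right)' R\, (AB)^{-1}$$

Now define

$$z_t = R^{1/2}(AB)^{-1} y_t, \qquad s_t = R^{1/2}(AB)^{-1} v_t.$$

The optimization problem becomes

$$\min_{z_1, \ldots, z_T} \sum_{t=1}^{T} \frac{1}{2}(z_t - s_t)' P_t (z_t - s_t) + \frac{1}{2}\|z_t - z_{t-1}\|_2^2$$

where

$$P_t = \left(AB R^{-1/2}\right)' Q_t\, AB R^{-1/2}$$

This is just a special case of the SOCO problem. Notice that the costs are strongly convex with parameter $\lambda_{\min}(P_t)$, which is bounded below by

$$\frac{\lambda_{\min}(AB)^2\,\lambda}{\lambda_{\max}(R)}$$

which in light of Theorem 1 proves the claim. ∎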

## 6 Concluding Remarks

We show in this paper that the OBD algorithm is a constant-competitive algorithm for SOCO with strongly convex costs, making OBD the first competitive algorithm in this setting. We also show that when the sequence of cost functions is smooth, OBD maintains good per-round accuracy and near-optimal dynamic regret. Finally, we apply OBD to a variety of important learning and control problems, including online maximum likelihood estimation and LQR control, giving the first constant-competitive algorithms for these problems.

We conclude by identifying two important open problems in the area of smoothed online learning. First, it would be valuable if OBD were able to handle different kinds of smoothing regularizers. For example, the smoothing regularizer $\|x_t - x_{t-1}\|_1$ could be used to promote sparsity in the updates between rounds. Secondly, it is natural to extend the OBD framework to non-convex problems such as matrix completion and tensor factorization, which are increasingly popular in machine learning.

## References

• [ABC18] CJ Argue, Sébastien Bubeck, Michael B Cohen, Anupam Gupta, and Yin Tat Lee. A nearly-linear bound for chasing nested convex bodies. arXiv preprint arXiv:1806.08865, 2018.
• [ABL13] Lachlan Andrew, Siddharth Barman, Katrina Ligett, Minghong Lin, Adam Meyerson, Alan Roytman, and Adam Wierman. A tale of two metrics: Simultaneous bounds on competitiveness and regret. In Conference on Learning Theory, pages 741–763, 2013.
• [ABN16] Antonios Antoniadis, Neal Barcelo, Michael Nugent, Kirk Pruhs, Kevin Schewior, and Michele Scquizzato. Chasing convex bodies and functions. In Latin American Symposium on Theoretical Informatics, pages 68–81. Springer, 2016.
• [AM10] Karl Johan Åström and Richard M Murray. Feedback systems: an introduction for scientists and engineers. Princeton university press, 2010.
• [BB00] Avrim Blum and Carl Burch. On-line learning and the metrical task system problem. Machine Learning, 39(1):35–58, 2000.
• [BBBT97] Yair Bartal, Avrim Blum, Carl Burch, and Andrew Tomkins. A polylog (n)-competitive algorithm for metrical task systems. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 711–719. ACM, 1997.
• [BBE18] Nikhil Bansal, Martin Böhm, Marek Eliáš, Grigorios Koumoutsos, and Seeun William Umboh. Nested convex bodies are chaseable. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1253–1260. SIAM, 2018.
• [BCL18] Sebastien Bubeck, Michael B Cohen, Yin Tat Lee, James R Lee, and Aleksander Madry. k-server via multiscale entropic regularization. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 3–16. ACM, 2018.
• [BGK15] Nikhil Bansal, Anupam Gupta, Ravishankar Krishnaswamy, Kirk Pruhs, Kevin Schewior, and Cliff Stein. A 2-competitive algorithm for online convex optimization with switching costs. In LIPIcs-Leibniz International Proceedings in Informatics, volume 40, 2015.
• [BLS92] Allan Borodin, Nathan Linial, and Michael E Saks. An optimal on-line algorithm for metrical task system. Journal of the ACM (JACM), 39(4):745–763, 1992.
• [BV04] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
• [CAW15] Niangjun Chen, Anish Agarwal, Adam Wierman, Siddharth Barman, and Lachlan LH Andrew. Online convex optimization using predictions. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 191–204. ACM, 2015.
• [CCL16] Niangjun Chen, Joshua Comden, Zhenhua Liu, Anshul Gandhi, and Adam Wierman. Using predictions in online optimization: Looking forward with an eye on the past. ACM SIGMETRICS Performance Evaluation Review, 44(1):193–206, 2016.
• [CGW18] Niangjun Chen, Gautam Goel, and Adam Wierman. Smoothed online convex optimization in high dimensions via online balanced descent. In Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 1574–1594. PMLR, 2018.
• [FL93] Joel Friedman and Nathan Linial. On convex body chasing. Discrete & Computational Geometry, 9(3):293–321, 1993.
• [GCW17] Gautam Goel, Niangjun Chen, and Adam Wierman. Thinking fast and slow: Optimization decomposition across timescales. In Decision and Control (CDC), 2017 IEEE 56th Annual Conference on, pages 1291–1298. IEEE, 2017.
• [H16] Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
• [JdV12] V. Joseph and G. de Veciana. Jointly optimizing multi-user rate adaptation for video transport over wireless systems: Mean-fairness-variability tradeoffs. In 2012 Proceedings IEEE INFOCOM, pages 567–575, March 2012.
• [KG14] Seung-Jun Kim and Georgios B Giannakis. Real-time electricity pricing for demand response using online convex optimization. In Innovative Smart Grid Technologies Conference (ISGT), 2014 IEEE PES, pages 1–5. IEEE, 2014.
• [KYTM15] Taehwan Kim, Yisong Yue, Sarah Taylor, and Iain Matthews. A decision tree framework for spatiotemporal sequence prediction. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 577–586. ACM, 2015.
• [LLWA12] M. Lin, Z. Liu, A. Wierman, and L. L. H. Andrew. Online algorithms for geographical load balancing. In 2012 International Green Computing Conference (IGCC), pages 1–10, June 2012.
• [LQL18] Yingying Li, Guannan Qu, and Na Li. Using predictions in online optimization with switching costs: A fast algorithm and a fundamental limit. In 2018 Annual American Control Conference (ACC), pages 3008–3013. IEEE, 2018.
• [LWAT11] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska. Dynamic right-sizing for power-proportional data centers. In 2011 Proceedings IEEE INFOCOM, pages 1098–1106, April 2011.
• [MMS90] Mark S Manasse, Lyle A McGeoch, and Daniel D Sleator. Competitive algorithms for server problems. Journal of Algorithms, 11(2):208–230, 1990.
• [Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.