 # Understand Dynamic Regret with Switching Cost for Online Decision Making

As a metric for measuring the performance of an online method, dynamic regret with switching cost has drawn much attention in online decision making. Although sublinear regret bounds have been established in much previous work, little is known about the relation between the dynamic regret and the switching cost. In this paper, we investigate this relation for two classic online settings: Online Algorithms (OA) and Online Convex Optimization (OCO). We provide a new theoretical analysis framework, which yields an interesting observation: the relation between the switching cost and the dynamic regret differs between the OA and OCO settings. Specifically, the switching cost has a significant impact on the dynamic regret in the OA setting, but it has no impact on the dynamic regret in the OCO setting. Furthermore, we provide a lower bound on the regret for the OCO setting, which is the same as the lower bound in the case of no switching cost. This shows that the switching cost does not change the difficulty of online decision making problems in the OCO setting.


## 1. Introduction

Online Algorithms (OA) (Chen et al., 2018, 2016; Renault and Rosén, 2012), which some of the literature refers to as ‘smoothed online convex optimization’, and Online Convex Optimization (OCO) (Bubeck, 2011; Hazan, 2016; Shalev-Shwartz, 2012) are two important settings of online decision making. Methods in both settings make a decision at every round and then use that decision as a response to the environment. Their major difference is outlined as follows.

• For every round, methods in the setting of OA are able to know a loss function first, and then play a decision as the response to the environment.

• However, for every round, methods in the setting of OCO have to play a decision before knowing the loss function. Thus, the environment may be adversarial to decisions of those methods.

Both settings cover a large number of practical scenarios. For example, both the $k$-server problem (Lee, 2018; Bansal et al., 2010) and the Metrical Task Systems (MTS) problem (Abernethy et al., 2010; Bubeck et al., 2019; Bansal et al., 2010) are usually studied in the setting of OA. Other problems, including online learning (Sun et al., 2016; Yang et al., 2013; Wang et al., 2019; Li et al., 2018), online recommendation (Wang et al., 2016), online classification (Crammer et al., 2004; Bernstein et al., 2010), online portfolio selection (Li et al., 2013), and model predictive control (Morari and Lee, 1999), are usually studied in the setting of OCO.

Many recent studies have begun to investigate the performance of online methods in both OA and OCO settings by using dynamic regret with switching cost (Chen et al., 2018; Li et al., 2018). It measures the difference between the cost yielded by real-time decisions and the cost yielded by the optimal decisions. Compared with the classic static regret (Bubeck, 2011), it has two major differences.

• First, it allows optimal decisions to change within a threshold over time, which is necessary in a dynamic environment (generally, a dynamic environment means that the distribution of the data stream may change over time).

• Second, the cost yielded by a decision consists of two parts: the operating cost and the switching cost, while the classic static regret only contains the operating cost.

The switching cost measures the difference between two successive decisions, which is needed in many practical scenarios such as service management in electric power networks (Mookherjee et al., 2008) and dynamic resource management in data centers (Lin et al., 2011; Lu et al., 2013; Wang et al., 2014). However, little is known about the relation between the dynamic regret and the switching cost. In this paper, we are motivated by the following fundamental questions.

• Does the switching cost impact the dynamic regret of an online method?

• Does the problem of online decision making become more difficult due to the switching cost?

To answer these challenging questions, we investigate online mirror descent in the settings of OA and OCO, and provide a new theoretical analysis framework. Our analysis leads to an interesting observation: the switching cost does impact the dynamic regret in the OA setting, but it has no impact on the dynamic regret in the OCO setting. Specifically, when the switching cost is measured by $\|x_{t+1}-x_t\|^{\sigma}$, the dynamic regret of an OA method depends on the exponent $\sigma$ as well as on $T$ and $D$ (see Theorem 1), where $T$ is the maximal number of rounds and $D$ is the given budget of dynamics. In contrast, the dynamic regret of an OCO method is $O(\sqrt{TD}+\sqrt{T})$, which is the same as in the case of no switching cost (György and Szepesvári, 2016; Zhao et al., 2018; Zinkevich, 2003; Hall and Willett, 2013). Furthermore, we provide a lower bound on the dynamic regret, namely $\Omega(\sqrt{TD}+\sqrt{T})$, for the OCO setting. Since this lower bound is still the same as in the case of no switching cost (Zhao et al., 2018), it implies that the switching cost does not change the difficulty of the online decision making problem in the OCO setting. Our new analysis is also more general than previous results: we define a new dynamic regret with a generalized switching cost and provide new regret bounds. Providing a tight regret bound in the dynamic environment is novel, since previous analyses do not apply directly to the generalized dynamic regret. In a nutshell, our main contributions are summarized as follows.

• We propose a new general formulation of the dynamic regret with switching cost, and then develop a new analysis framework based on it.

• We provide a regret bound that depends on the exponent $\sigma$ for the setting of OA, and an $O(\sqrt{TD}+\sqrt{T})$ regret bound for the setting of OCO, by using online mirror descent.

• We provide an $\Omega(\sqrt{TD}+\sqrt{T})$ lower bound on the regret for the setting of OCO, which matches the upper bound.

The paper is organized as follows. Section 2 reviews the related literature. Section 3 presents the preliminaries. Section 4 presents our new formulation of the dynamic regret with switching cost. Section 5 presents a new analysis framework and the main results. Section 6 presents extensive empirical studies. Section 7 concludes the paper and discusses future work.

## 2. Related work

In this section, we briefly review the related literature.

### 2.1. Competitive ratio and regret

Although the competitive ratio is usually used to analyze OA methods and the regret is used to analyze OCO methods, recent research aims to develop unified frameworks to analyze the performance of an online method in both settings (Blum and Burch, 2000; Abernethy et al., 2010; Antoniadis et al., 2018; Buchbinder et al., 2012; Bubeck et al., 2018; Andrew et al., 2013; Chen et al., 2015). (Blum and Burch, 2000) provides an analysis framework that achieves sublinear regret for OA methods and a constant competitive ratio for OCO methods. (Abernethy et al., 2010; Buchbinder et al., 2012; Bubeck et al., 2018) use a general OCO method, namely online mirror descent, in the OA setting, and improve the existing competitive ratio analysis for the $k$-server and MTS problems. Different from them, we extend the existing regret analysis framework to handle a general switching cost, and focus on investigating the relation between regret and switching cost. (Antoniadis et al., 2018) provides a lower bound for the OCO problem in the competitive ratio analysis framework, whereas we provide the lower bound in the regret analysis framework. (Andrew et al., 2013; Chen et al., 2015) study the regret with switching cost in the OA setting, but the relation between them is not studied. Compared with (Andrew et al., 2013; Chen et al., 2015), we extend their analysis and present a more general bound on the dynamic regret (see Theorem 1).

### 2.2. Dynamic regret and switching cost

Regret is widely used as a metric to measure the performance of OCO methods. When the environment is static, e.g., the distribution of the data stream does not change over time, online mirror descent yields $O(\sqrt{T})$ regret for convex functions and $O(\log T)$ regret for strongly convex functions (Bubeck, 2011; Hazan, 2016; Shalev-Shwartz, 2012). When the distribution of the data stream changes over time, online mirror descent yields $O(\sqrt{TD}+\sqrt{T})$ regret for convex functions (György and Szepesvári, 2016), where $D$ is the given budget of dynamics. Additionally, (Zinkevich, 2003) first investigates online gradient descent in the dynamic environment, and obtains $O(\sqrt{TD}+\sqrt{T})$ regret (under a suitable parameter setting) for convex $f_t$. Note that the dynamic regret used in (Zinkevich, 2003) does not contain the switching cost. (Hall and Willett, 2013, 2015) use similar but more general definitions of dynamic regret, and still achieve $O(\sqrt{TD}+\sqrt{T})$ regret. Furthermore, (Zhao et al., 2018) shows that the lower bound of the dynamic regret is $\Omega(\sqrt{TD}+\sqrt{T})$. Many other previous studies investigate the regret under different definitions of dynamics such as parameter variation (Mokhtari et al., 2016; Yang et al., 2016; Gao et al., 2018; Zhang et al., 2017a), functional variation (Jenatton et al., 2016; Besbes et al., 2015; Zhang et al., 2018b), gradient variation (Chiang et al., 2012), and mixed regularity (Jadbabaie et al., 2015; Chen et al., 2017). Note that the dynamic regret in those previous studies does not contain the switching cost, which is significantly different from our work. Our new analysis shows that this bound is achieved and optimal when the regret contains the switching cost (see Theorems 2 and 3). The proposed analysis framework thus shows how the switching cost impacts the dynamic regret in the settings of OA and OCO, which leads to new insights into online decision making problems.

## 3. Preliminaries

In this section, we present the preliminaries of online algorithms and online convex optimization, and highlight their differences. Then, we present the dynamic regret with switching cost, which is used to measure the performance of both OA methods and OCO methods.

### 3.1. Online algorithms and online convex optimization

Compared with the setting of OCO (Shalev-Shwartz, 2012; Hazan, 2016; Bubeck, 2011), OA has the following major differences.

• OA assumes that the loss function, e.g., $f_t$, is known before making the decision at every round. In contrast, OCO assumes that the loss function, e.g., $f_t$, is given after making the decision at every round.

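• The performance of an OA method is measured by the competitive ratio (Chen et al., 2018), which is defined by

$$\frac{\sum_{t=1}^{T}\left(f_t(x_t)+\|x_t-x_{t-1}\|\right)}{\sum_{t=1}^{T}\left(f_t(x_t^*)+\|x_t^*-x_{t-1}^*\|\right)}.$$

Here, $\{x_t^*\}_{t=1}^{T}$ is defined by

$$\{x_t^*\}_{t=1}^{T}=\operatorname*{argmin}_{\{z_t\}_{t=1}^{T}\in L_D^T}\sum_{t=1}^{T}\left(f_t(z_t)+\|z_t-z_{t-1}\|\right),$$

where $L_D^T=\left\{\{z_t\}_{t=1}^{T}:\sum_{t=1}^{T-1}\|z_{t+1}-z_t\|\le D\right\}$, and $D$ is the given budget of dynamics. It is the best offline strategy, which is obtained by knowing all the requests beforehand (Chen et al., 2018). Note that $\|x_t-x_{t-1}\|$ is the switching cost incurred by the method at the $t$-th round. In contrast, OCO is usually measured by the regret, which is defined by

$$\sum_{t=1}^{T}f_t(x_t)-\min_{\{z_t\}_{t=1}^{T}\in L_D^T}\sum_{t=1}^{T}f_t(z_t),$$

where $L_D^T=\left\{\{z_t\}_{t=1}^{T}:\sum_{t=1}^{T-1}\|z_{t+1}-z_t\|\le D\right\}$, and $D$ is also the given budget of dynamics. Note that the regret in classic OCO does not contain the switching cost.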
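The two metrics can be compared on a toy instance. The following is a minimal sketch, assuming that the offline or comparator sequence is supplied by the caller rather than solved for, and that an initial decision `x0` is available; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def cost_with_switching(losses, decisions, x0):
    """Return sum_t [ f_t(x_t) + ||x_t - x_{t-1}|| ], starting from the given x_0."""
    prev = np.atleast_1d(np.asarray(x0, dtype=float))
    total = 0.0
    for f, x in zip(losses, decisions):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        total += f(x) + np.linalg.norm(x - prev)
        prev = x
    return float(total)

def competitive_ratio(losses, online, offline, x0):
    """Ratio of the online cost to the cost of a supplied offline sequence."""
    return cost_with_switching(losses, online, x0) / cost_with_switching(losses, offline, x0)

def regret_without_switching(losses, online, comparator):
    """Classic OCO regret against a supplied comparator (operating cost only)."""
    def operating(seq):
        return sum(f(np.atleast_1d(np.asarray(x, dtype=float))) for f, x in zip(losses, seq))
    return float(operating(online) - operating(comparator))
```

For example, with two rounds of the loss $(x-1)^2$, online decisions $0.5, 1$, offline decisions $1, 1$, and $x_0=0$, the ratio is $1.25$ while the regret without switching cost is $0.25$.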

To make it clear, we use Table 1 to highlight their differences.

### 3.2. Dynamic regret with switching cost

Although the analysis frameworks of OA and OCO are different, the dynamic regret with switching cost is a popular metric to measure the performance of both OA and OCO (Chen et al., 2018; Li et al., 2018). Formally, for an algorithm $A$, its dynamic regret with switching cost is defined by

$$\sum_{t=1}^{T}\left(f_t(x_t)+\|x_t-x_{t-1}\|\right)-\min_{\{z_t\}_{t=1}^{T}\in L_D^T}\sum_{t=1}^{T}\left(f_t(z_t)+\|z_t-z_{t-1}\|\right), \tag{1}$$

where $L_D^T=\left\{\{z_t\}_{t=1}^{T}:\sum_{t=1}^{T-1}\|z_{t+1}-z_t\|\le D\right\}$. Here, $\|x_t-x_{t-1}\|$ represents the switching cost at the $t$-th round, and $D$ is the given budget of dynamics in the dynamic environment. When $D=0$, all optimal decisions are the same. As $D$ increases, the optimal decisions are allowed to change to follow the dynamics in the environment. This is necessary when the distribution of the data stream changes over time.

### 3.3. Notations and Assumptions.

We use the following notations in the paper.

• Bold lower-case letters represent vectors, and normal letters represent scalars.

• $\|\cdot\|$ represents a general norm of a vector.

• A superscript $T$ on a set represents the $T$-fold Cartesian product of the set with itself, e.g., $\mathcal{F}^T=\mathcal{F}\times\cdots\times\mathcal{F}$ ($T$ times).

• The Bregman divergence is defined by $B_\Phi(x,y)=\Phi(x)-\Phi(y)-\langle\nabla\Phi(y),x-y\rangle$.

• $A$ represents a specific online method, chosen from the set of all possible online methods.

• $\lesssim$ represents ‘less than or equal to, up to a constant factor’.

• $\mathbb{E}$ represents the mathematical expectation operator.
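The Bregman divergence in the list above can be evaluated directly from its definition. The sketch below is illustrative only: the two distance functions (squared Euclidean norm and negative entropy) are standard textbook choices assumed here for the example, not choices made by the paper.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """B_Phi(x, y) = Phi(x) - Phi(y) - <grad Phi(y), x - y>."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(phi(x) - phi(y) - np.dot(grad_phi(y), x - y))

# Assumed example 1: Phi(x) = 0.5 * ||x||_2^2, which gives B = 0.5 * ||x - y||_2^2.
def half_sq_norm(x):
    return 0.5 * float(np.dot(x, x))

def half_sq_norm_grad(x):
    return x

# Assumed example 2: negative entropy, which gives the KL divergence
# when x and y are probability vectors with positive entries.
def neg_entropy(x):
    return float(np.sum(x * np.log(x)))

def neg_entropy_grad(x):
    return np.log(x) + 1.0
```

With the squared Euclidean norm, the divergence is symmetric; with the negative entropy it is not, which is one reason the choice of $\Phi$ matters for mirror descent.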

Our assumptions are presented as follows. They are widely used in the previous literature (Li et al., 2018; Chen et al., 2018; Shalev-Shwartz, 2012; Hazan, 2016; Bubeck, 2011).

###### Assumption 1.

The following basic assumptions are used throughout the paper.

• For any $t$, we assume that $f_t$ is convex, and has $L$-Lipschitz gradient.

• The function $\Phi$ is $\mu$-strongly convex, that is, for any $x$ and $y$, $B_\Phi(x,y)\ge\frac{\mu}{2}\|x-y\|^2$.

• For any $x$ and $y$, there exists a positive constant $R$ such that

$$\max\{B_\Phi(x,y),\|x-y\|^2\}\le R^2.$$
• For any , there exists a positive constant such that

## 4. Dynamic regret with generalized switching cost

In this section, we propose a new formulation of dynamic regret, which contains a generalized switching cost. Then, we highlight the novelty of this formulation, and present the online mirror descent method for the settings of OA and OCO.

### 4.1. Formulation

An algorithm $A$ incurs a cost at the end of every round, which consists of two parts: the operating cost and the switching cost. At the $t$-th round, the operating cost is $f_t(x_t)$, and the switching cost is $\|x_{t+1}-x_t\|^{\sigma}$ for a given exponent $\sigma$. The optimal decisions are denoted by $\{y_t^*\}_{t=1}^{T}$, which are defined by

$$\{y_t^*\}_{t=1}^{T}=\operatorname*{argmin}_{\{y_t\}_{t=1}^{T}\in L_D^T}\sum_{t=1}^{T}f_t(y_t)+\sum_{t=1}^{T-1}\|y_{t+1}-y_t\|^{\sigma}.$$

Here, $L_D^T$ is defined by

$$L_D^T=\left\{\{y_t\}_{t=1}^{T}:\sum_{t=1}^{T-1}\|y_{t+1}-y_t\|\le D\right\}.$$

$D$ is a given budget of dynamics, which measures how much the optimal decision $y_t^*$ can change over $t$. As $D$ increases, the optimal decisions can change over time to follow the dynamics in the environment effectively.

Denote by $A^*$ an optimal method, which yields the optimal sequence of decisions $\{y_t^*\}_{t=1}^{T}$. Its total cost is denoted by

$$\mathrm{cost}(A^*)=\sum_{t=1}^{T}f_t(y_t^*)+\sum_{t=1}^{T-1}\|y_{t+1}^*-y_t^*\|^{\sigma}.$$

Similarly, the total cost of an algorithm $A$ is denoted by

$$\mathrm{cost}(A)=\sum_{t=1}^{T}f_t(x_t)+\sum_{t=1}^{T-1}\|x_{t+1}-x_t\|^{\sigma}.$$
###### Definition 1.

For any algorithm $A$, its dynamic regret with switching cost is defined by

$$R_D^{A}:=\mathrm{cost}(A)-\mathrm{cost}(A^*). \tag{2}$$

Our new formulation of the dynamic regret makes a balance between the operating cost and the switching cost, which is different from the previous definition of the dynamic regret in (Zinkevich, 2003; György and Szepesvári, 2016; Hall and Willett, 2013).

Note that the freedom to choose $\sigma$ allows our new dynamic regret to measure the performance of online methods for a large number of problems. Some problems, such as dynamic control of data centers (Lin et al., 2012) and stock portfolio management (Li and Hoi, 2014), require the switching cost to be sensitive to small changes between successive decisions, and the switching cost in these problems is usually bounded by $\|x_{t+1}-x_t\|$. However, many problems, such as dynamic placement of cloud services (Zhang et al., 2012), need to bound large changes between successive decisions effectively, and the switching cost in these problems is usually bounded by $\|x_{t+1}-x_t\|^{2}$.
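The total cost and the resulting regret can be computed directly from these definitions. The sketch below makes two assumptions that are not part of the paper: the comparator sequence is supplied by the caller (it is not obtained by solving the argmin), and the Euclidean norm is used as the norm. All function names are hypothetical.

```python
import numpy as np

def _as_matrix(seq):
    """Stack a sequence of T decisions into a (T, d) array."""
    return np.asarray(seq, dtype=float).reshape(len(seq), -1)

def path_length(seq):
    """sum_{t=1}^{T-1} ||y_{t+1} - y_t||, the quantity bounded by the budget D."""
    s = _as_matrix(seq)
    return float(np.sum(np.linalg.norm(s[1:] - s[:-1], axis=1)))

def total_cost(losses, seq, sigma):
    """sum_t f_t(x_t) + sum_{t=1}^{T-1} ||x_{t+1} - x_t||^sigma."""
    s = _as_matrix(seq)
    operating = sum(f(x) for f, x in zip(losses, s))
    steps = np.linalg.norm(s[1:] - s[:-1], axis=1)
    return float(operating + np.sum(steps ** sigma))

def dynamic_regret(losses, online, comparator, sigma, budget):
    """cost(A) minus the cost of a supplied comparator inside the budget set."""
    if path_length(comparator) > budget + 1e-12:
        raise ValueError("comparator sequence exceeds the budget of dynamics")
    return total_cost(losses, online, sigma) - total_cost(losses, comparator, sigma)
```

The exponent changes how a single move is charged: a step of length $0.1$ costs $0.1$ with $\sigma=1$ but only $0.01$ with $\sigma=2$, whereas a step of length $10$ costs $10$ and $100$, respectively.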

### 4.2. Novelty of the new formulation

Our new formulation of the dynamic regret is more general than previous formulations (Chen et al., 2018; Li et al., 2018), which are presented as follows.

• Support more general switching cost. (Chen et al., 2018) defines the dynamic regret with switching cost by (1). It is a special case of our new formulation (2) by setting $\sigma=1$. The sequence of optimal decisions is dominated by and , and does not change over . is thus impacted by for the given and . Generally, $\|x_{t+1}-x_t\|$ is more sensitive to a slight change between $x_{t+1}$ and $x_t$ than $\|x_{t+1}-x_t\|^{2}$. However, for some problems such as the dynamic placement of cloud services (Zhang et al., 2012), the switching cost at the $t$-th round is usually measured by $\|x_{t+1}-x_t\|^{2}$ instead of $\|x_{t+1}-x_t\|$. The previous formulation in (Chen et al., 2018) is not suitable for bounding the switching cost in those problems. Benefiting from the exponent $\sigma$, (2) supports a more general switching cost than previous work.

• Support more general convex $f_t$. (Li et al., 2018) defines the dynamic regret with switching cost by

and they use to bound the regret. Here, . It implicitly assumes that the difference between and are bounded. It is reasonable for a strongly convex function , but may not be guaranteed for a general convex function . Additionally, (Li et al., 2018) uses to bound the switching cost, which is more sensitive to the significant change than . But, it is less effective to bound the slight change between them, which is not suitable for many problems such as dynamic control of data centers (Lin et al., 2012).

### 4.3. Algorithm

We use mirror descent (Beck and Teboulle, 2003) in the online setting, and present the algorithm MD-OA for the OA setting and the algorithm MD-OCO for the OCO setting, respectively.

As illustrated in Algorithms 1 and 2, both MD-OA and MD-OCO are performed iteratively. In every round, MD-OA first observes the loss function $f_t$ and then makes the decision $x_t$ at the $t$-th round, whereas MD-OCO first makes the decision $x_t$ and then observes the loss function $f_t$. Therefore, MD-OA makes the decision $x_t$ based on the observed $f_t$ for the current round, but MD-OCO has to predict a decision $x_{t+1}$ for the next round based on the received $f_t$.

Note that both MD-OA and MD-OCO need to solve a convex optimization problem to update the decision. The complexity is dominated by the domain and the distance function $\Phi$. Besides, both of them incur comparable computation and memory costs.
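The difference in the order of observing and deciding can be sketched as follows. This is not a reproduction of Algorithms 1 and 2, whose exact update rules are not restated here: the sketch assumes the Euclidean distance function $\Phi(x)=\frac{1}{2}\|x\|_2^2$, for which a mirror descent step reduces to a projected gradient step, and it assumes an $\ell_2$ ball as the domain. The names `md_step`, `run_md_oa`, and `run_md_oco` are hypothetical.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the l2 ball of the given radius (assumed domain)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def md_step(x, grad, eta, radius):
    """Mirror descent step for Phi(x) = 0.5*||x||^2:
    argmin_u <grad, u> + ||u - x||^2 / (2*eta) over the ball."""
    return project_ball(x - eta * grad, radius)

def run_md_oa(grad_fns, x0, eta, radius):
    """OA protocol: observe f_t first, then commit to x_t."""
    x = np.asarray(x0, dtype=float)
    played = []
    for grad_f in grad_fns:
        x = md_step(x, grad_f(x), eta, radius)  # f_t is used before playing
        played.append(x.copy())
    return np.array(played)

def run_md_oco(grad_fns, x1, eta, radius):
    """OCO protocol: commit to x_t, then observe f_t and prepare x_{t+1}."""
    x = np.asarray(x1, dtype=float)
    played = []
    for grad_f in grad_fns:
        played.append(x.copy())                 # x_t is played without seeing f_t
        x = md_step(x, grad_f(x), eta, radius)  # f_t only influences x_{t+1}
    return np.array(played)
```

Under these assumptions and with the same starting point, the OCO decisions are the OA decisions delayed by one round, which is exactly the information gap between the two settings.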

## 5. Theoretical analysis

In this section, we present our main analysis results about the proposed dynamic regret for both MD-OA and MD-OCO, and discuss the difference between them.

### 5.1. New bounds for dynamic regret with switching cost

The upper bound of dynamic regret for MD-OA is presented as follows.

###### Theorem 1.

Choose in Algorithm 1. Under Assumption 1, we have

That is, Algorithm 1 yields dynamic regret with switching cost.

###### Remark 1.

When , MD-OA yields dynamic regret, which achieves the state-of-the-art result in (Chen et al., 2018). When , MD-OA yields dynamic regret, which is a new result as far as we know.

However, we find a different result for MD-OCO: the switching cost does not have an impact on the dynamic regret.

###### Theorem 2.

Choose in Algorithm 2. Under Assumption 1, we have

$$\sup_{\{f_t\}_{t=1}^{T}\in\mathcal{F}^T}R_D^{\text{MD-OCO}}\lesssim\sqrt{TD}+\sqrt{T}.$$

That is, Algorithm 2 yields $O(\sqrt{TD}+\sqrt{T})$ dynamic regret with switching cost.

###### Remark 2.

MD-OCO still yields $O(\sqrt{TD}+\sqrt{T})$ dynamic regret (György and Szepesvári, 2016) when there is no switching cost. This shows that the switching cost does not have an impact on the dynamic regret.

Before presenting the discussion, we show that MD-OCO is optimal for the dynamic regret, because the lower bound of the problem matches the upper bound yielded by MD-OCO.

###### Theorem 3.

Under Assumption 1, the lower bound of the dynamic regret for the OCO problem is $\Omega(\sqrt{TD}+\sqrt{T})$.

###### Remark 3.

When there is no switching cost, the lower bound of the dynamic regret for OCO is $\Omega(\sqrt{TD}+\sqrt{T})$ (Zhao et al., 2018). Theorem 3 attains the same bound in the case of switching cost. This implies that the switching cost does not make online decision making in the OCO setting more difficult.

### 5.2. Insights

Switching cost has a significant impact on the dynamic regret for the setting of OA. According to Theorem 1, the switching cost has a significant impact on the dynamic regret of MD-OA. Given a constant , a small leads to a strong dependence on , and a large leads to a weak dependence on . The reason is that a large leads to a large learning rate, which is more effective to follow the dynamics in the environment than a small learning rate.

Switching cost does not have an impact on the dynamic regret for the setting of OCO. According to Theorem 2 and Theorem 3, the dynamic regret yielded by MD-OCO is tight, and MD-OCO is optimal for the problem. Even when the switching cost is present, the dynamic regret yielded by MD-OCO remains the same.

As we can see, there is a significant difference between the OA setting and the OCO setting. The reasons are presented as follows.

• MD-OA makes decisions after observing the loss function. It has known the potential operating cost and switching cost for any decision. Thus, it can make decisions to achieve a good tradeoff between the operating cost and switching cost.

• MD-OCO makes decisions before observing the loss function. It only knows the historical information and the potential switching cost, and does not know the potential operating cost of any decision at the current round. In the worst case, if the environment provides an adversarial loss function that maximizes the operating cost based on the decision played by MD-OCO, MD-OCO incurs $O(\sqrt{TD}+\sqrt{T})$ regret even in the case of no switching cost (György and Szepesvári, 2016). Although the potential switching cost is known, MD-OCO cannot make a better decision to reduce the regret because the operating cost is unknown.

## 6. Empirical studies

In this section, we evaluate the total regret and the regret caused by the switching cost in the settings of both OA and OCO by running online mirror descent. Our experiments show the importance of knowing the loss function before making a decision.

### 6.1. Experimental settings

We conduct binary classification by using the logistic regression model. Given an instance $a$ and its label $y$, the loss function is

$$f(x)=\log\left(1+\exp(-y\,a^{\top}x)\right).$$

In experiments, we let .
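The logistic loss and its gradient, which is what a first-order method such as mirror descent needs in each round, can be written as follows. This is a minimal sketch assuming labels in $\{-1,+1\}$; the function names are hypothetical.

```python
import numpy as np

def logistic_loss(x, a, y):
    """f(x) = log(1 + exp(-y * a^T x)), evaluated in a numerically stable way."""
    margin = y * float(np.dot(a, x))
    return float(np.logaddexp(0.0, -margin))

def logistic_grad(x, a, y):
    """Gradient of f at x: -y * a / (1 + exp(y * a^T x))."""
    margin = y * float(np.dot(a, x))
    weight = 0.5 * (1.0 - np.tanh(0.5 * margin))  # equals 1 / (1 + exp(margin))
    return -y * weight * np.asarray(a, dtype=float)
```

At $x=0$ the loss equals $\log 2$ for any instance, and the gradient equals $-\frac{y}{2}a$, which gives a quick sanity check.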

We test four methods: MD-OA (Algorithm 1), MD-OCO (Algorithm 2), online balanced descent (Chen et al., 2018), denoted by BD-OA in the experiments, and multiple online gradient descent (Zhang et al., 2017b), denoted by MGD-OCO in the experiments. MD-OA and BD-OA are two variants of online algorithms, and similarly MD-OCO and MGD-OCO are two variants of online convex optimization. We test those methods on three real datasets, including usenet1 and usenet2. The distributions of the data streams change over time for those datasets, which is exactly the dynamic environment discussed above. More details about those datasets and their dynamics are presented at: http://mlkd.csd.auth.gr/concept_drift.html.

We use the average loss to test the regret, because all methods have the same optimal reference points $\{y_t^*\}_{t=1}^{T}$. For the $t$-th round, the average loss is defined by

$$\underbrace{\frac{1}{t}\sum_{l=1}^{t}\log\left(1+\exp(-y_l A_l^{\top}x_l)\right)}_{\text{average loss caused by operating cost}}+\underbrace{\frac{1}{t}\sum_{l=0}^{t-1}\|x_{l+1}-x_l\|}_{\text{average loss caused by switching cost}},$$

where $A_l$ is the instance at the $l$-th round, and $y_l$ is its label. Besides, we evaluate the average loss caused by the operating cost separately, and denote it by OL. Similarly, SL represents the average loss caused by the switching cost.
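The two parts of the average loss can be computed from a recorded run as sketched below. The sketch assumes the Euclidean norm for the switching term and an explicitly supplied initial decision `x0`; the function name and argument layout are hypothetical.

```python
import numpy as np

def average_losses(instances, labels, decisions, x0):
    """Return (OL, SL, OL + SL) after t rounds.

    instances: (t, d) array whose rows are A_1..A_t
    labels:    (t,) array of labels y_1..y_t
    decisions: (t, d) array whose rows are x_1..x_t
    x0:        (d,) initial decision x_0 (an assumption of this sketch)
    """
    A = np.asarray(instances, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.asarray(decisions, dtype=float)
    t = len(y)
    margins = y * np.sum(A * X, axis=1)                  # y_l * A_l^T x_l
    ol = float(np.mean(np.logaddexp(0.0, -margins)))     # operating part
    path = np.vstack([np.asarray(x0, dtype=float)[None, :], X])  # x_0, ..., x_t
    sl = float(np.sum(np.linalg.norm(path[1:] - path[:-1], axis=1)) / t)
    return ol, sl, ol + sl
```

Tracking OL and SL separately makes it possible to see whether a method lowers the total average loss by predicting better or by moving less.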

In experiment, we set . Since , , and

are usually not known in practical scenarios, the learning rate is set by the following heuristic rules. We choose the learning rate

for the -th iteration, where is a given constants by the following rules. First, we set a large value . Then, we iteratively adjust the value of by when cannot let the average loss converge. If the first appropriate can let the average loss converge, it is finally chosen as the optimal learning rate. We use the similar heuristic method to determine other parameters, e.g., the number of inner iterations in MGD-OCO. Finally, the mirror map function is for BD-OA.

### 6.2. Numerical results

As shown in Figure 1, both MD-OA and BD-OA are much more effective than MD-OCO and MGD-OCO at decreasing the average loss during the first few rounds. The OA methods yield a much smaller average loss than the OCO methods. The reason is that OA knows the loss function $f_t$ before making the decision $x_t$, whereas OCO has to make the decision $x_t$ before knowing the loss function $f_t$. Benefiting from knowing the loss function $f_t$, OA reduces the average loss more effectively than OCO. This matches our theoretical analysis. That is, Algorithm 1 leads to the regret bound in Theorem 1, but Algorithm 2 leads to $O(\sqrt{TD}+\sqrt{T})$ regret. When , OA tends to lead to smaller regret than OCO. The reason is that OA knows the potential loss before playing a decision in every round, whereas OCO works in an adversarial environment and has to play a decision before knowing the potential loss. Thus, OA is able to play a better decision than OCO to decrease the loss. Additionally, we observe that both MD-OA and BD-OA reduce the average loss much more than MD-OCO and MGD-OCO for a large , which validates our theoretical results. It means that OA is more effective at reducing the switching cost than OCO for a large . Specifically, as shown in Figure 2, the average loss caused by the switching cost of OA methods, i.e., MD-OA(SL), changes insignificantly, but that of OCO methods, i.e., MD-OCO(SL), increases remarkably for a large .

When handling the whole dataset, the final difference of switching cost between MD-OA and MD-OCO is shown in Figure 3. Here, the difference of switching cost is measured by using average loss caused by switching cost of MD-OCO minus corresponding average loss caused by switching cost of MD-OA. As we can see, it highlights that OA is more effective to decrease the switching cost. The superiority becomes significant for a large , which verifies our theoretical results nicely again.

## 7. Conclusion and future work

We have proposed a new dynamic regret with switching cost and a new analysis framework for both online algorithms and online convex optimization. We find that the switching cost significantly impacts the regret yielded by OA methods, but does not have an impact on the regret yielded by OCO methods. Empirical studies have validated our theoretical results.

Moreover, the switching cost in this paper is measured by the norm of the difference between two successive decisions, that is, $\|x_{t+1}-x_t\|^{\sigma}$. It is interesting to investigate whether the work can be extended to a more general distance measure such as the Bregman divergence or the Mahalanobis distance. Specifically, if the Bregman divergence (see details in https://en.wikipedia.org/wiki/Bregman_divergence) is used, the switching cost is $B_\Phi(x_{t+1},x_t)$, where $\Phi$ is a differentiable distance function. If the Mahalanobis distance (see details in https://en.wikipedia.org/wiki/Mahalanobis_distance) is used, the switching cost is $\sqrt{(x_{t+1}-x_t)^{\top}S^{-1}(x_{t+1}-x_t)}$, where $S$ is the given covariance matrix. We leave these potential extensions as future work.
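To make the candidate extension concrete, the norm-based switching cost and a Mahalanobis switching cost can be placed side by side. This is only an illustrative sketch of the quantities themselves, assuming the Euclidean norm and a positive definite covariance matrix; it says nothing about whether the regret analysis carries over. The function names are hypothetical.

```python
import numpy as np

def norm_switching_cost(x_next, x_prev, sigma=1.0):
    """||x_{t+1} - x_t||^sigma with the Euclidean norm (assumed)."""
    diff = np.asarray(x_next, dtype=float) - np.asarray(x_prev, dtype=float)
    return float(np.linalg.norm(diff) ** sigma)

def mahalanobis_switching_cost(x_next, x_prev, cov):
    """sqrt((x_{t+1} - x_t)^T cov^{-1} (x_{t+1} - x_t)) for a positive definite cov."""
    diff = np.asarray(x_next, dtype=float) - np.asarray(x_prev, dtype=float)
    # Solve cov * z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(np.dot(diff, z)))
```

With an identity covariance matrix, the Mahalanobis cost coincides with the Euclidean norm of the move; a non-identity covariance rescales each direction, so moves along high-variance directions are charged less.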

Besides, our analysis provides a regret bound for any given budget of dynamics $D$. A good direction is to extend the work to the parameter-free setting, where the analysis is adaptive to the dynamics of the environment. Some previous work such as (Zhang et al., 2018a) has proposed an adaptive online method and analysis framework. However, (Zhang et al., 2018a) works in the expert setting, not in a general setting of online convex optimization. It is still unknown whether their method can be used to extend our analysis.

## 8. Acknowledgments

This work was supported by the National Key R & D Program of China 2018YFB1003203 and the National Natural Science Foundation of China (Grant No. 61672528, 61773392, and 61671463).

## References

• Abernethy et al. (2010) Jacob Abernethy, Peter L. Bartlett, Niv Buchbinder, and Isabelle Stanton. 2010. A Regularization Approach to Metrical Task Systems. In Proceedings of the 21st International Conference on Algorithmic Learning Theory (ALT). Springer-Verlag, Berlin, Heidelberg, 270–284.
• Andrew et al. (2013) Lachlan Andrew, Siddharth Barman, Katrina Ligett, Minghong Lin, Adam Meyerson, Alan Roytman, and Adam Wierman. 2013. A Tale of Two Metrics: Simultaneous Bounds on Competitiveness and Regret. In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems. 329–330.
• Antoniadis et al. (2018) Antonios Antoniadis, Kevin Schewior, and Rudolf Fleischer. 2018. A Tight Lower Bound for Online Convex Optimization with Switching Costs. In Approximation and Online Algorithms. Springer International Publishing, Cham, 164–175.
• Bansal et al. (2010) Nikhil Bansal, Niv Buchbinder, and Joseph Naor. 2010. Metrical Task Systems and the K-server Problem on HSTs. In Proceedings of the 37th International Colloquium Conference on Automata, Languages and Programming.
• Beck and Teboulle (2003) Amir Beck and Marc Teboulle. 2003. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31, 3 (2003), 167 – 175.
• Bernstein et al. (2010) Andrey Bernstein, Shie Mannor, and Nahum Shimkin. 2010. Online Classification with Specificity Constraints. In Proceedings of Advances in Neural Information Processing Systems (NIPS), J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta (Eds.). 190–198.
• Besbes et al. (2015) Omar Besbes, Yonatan Gur, and Assaf J Zeevi. 2015. Non-Stationary Stochastic Optimization. Operations Research 63, 5 (2015), 1227–1244.
• Blum and Burch (2000) Avrim Blum and Carl Burch. 2000. On-line Learning and the Metrical Task System Problem. Machine Learning 39, 1 (Apr 2000), 35–58.
• Bubeck (2011) Sébastien Bubeck. 2011. Introduction to Online Optimization.
• Bubeck et al. (2019) Sébastien Bubeck, Michael B Cohen, James R Lee, and Yin Tat Lee. 2019. Metrical task systems on trees via mirror descent and unfair gluing. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA).
• Bubeck et al. (2018) Sébastien Bubeck, Michael B. Cohen, Yin Tat Lee, James R. Lee, and Aleksander Mądry. 2018. K-server via Multiscale Entropic Regularization. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC). ACM, New York, NY, USA, 3–16.
• Buchbinder et al. (2012) Niv Buchbinder, Shahar Chen, Joseph (Seffi) Naor, and Ohad Shamir. 2012. Unified Algorithms for Online Learning and Competitive Analysis. In Proceedings of the 25th Annual Conference on Learning Theory (COLT), Shie Mannor, Nathan Srebro, and Robert C. Williamson (Eds.), Vol. 23. Edinburgh, Scotland, 5.1–5.18.
• Chen et al. (2015) Niangjun Chen, Anish Agarwal, Adam Wierman, Siddharth Barman, and Lachlan L.H. Andrew. 2015. Online Convex Optimization Using Predictions. In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems. 191–204.
• Chen et al. (2016) Niangjun Chen, Joshua Comden, Zhenhua Liu, Anshul Gandhi, and Adam Wierman. 2016. Using Predictions in Online Optimization: Looking Forward with an Eye on the Past. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science. 193–206.
• Chen et al. (2018) Niangjun Chen, Gautam Goel, and Adam Wierman. 2018. Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent. In Proceedings of the 31st Conference On Learning Theory (COLT), Vol. 75. 1574–1594.
• Chen et al. (2017) Tianyi Chen, Qing Ling, and Georgios B. Giannakis. 2017. An Online Convex Optimization Approach to Proactive Network Resource Allocation. IEEE Transactions on Signal Processing 65 (2017), 6350–6364.
• Chiang et al. (2012) Chao Kai Chiang, Tianbao Yang, Chia Jung Lee, Mehrdad Mahdavi, Chi Jen Lu, Rong Jin, and Shenghuo Zhu. 2012. Online Optimization with Gradual Variations. Journal of Machine Learning Research 23 (2012).
• Crammer et al. (2004) Koby Crammer, Jaz Kandola, and Yoram Singer. 2004. Online Classification on a Budget. In Proceedings of Advances in Neural Information Processing Systems (NIPS). 225–232.
• Gao et al. (2018) Xiang Gao, Xiaobo Li, and Shuzhong Zhang. 2018. Online Learning with Non-Convex Losses and Non-Stationary Regret. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (AISTATS), Amos Storkey and Fernando Perez-Cruz (Eds.), Vol. 84. 235–243.
• György and Szepesvári (2016) András György and Csaba Szepesvári. 2016. Shifting Regret, Mirror Descent, and Matrices. In Proceedings of the 33rd International Conference on Machine Learning (ICML). JMLR.org, 2943–2951.
• Hall and Willett (2013) Eric C Hall and Rebecca Willett. 2013. Dynamical Models and tracking regret in online convex programming. In Proceedings of International Conference on Machine Learning (ICML).
• Hall and Willett (2015) Eric C Hall and Rebecca M Willett. 2015. Online Convex Optimization in Dynamic Environments. IEEE Journal of Selected Topics in Signal Processing 9, 4 (2015), 647–662.
• Hazan (2016) Elad Hazan. 2016. Introduction to Online Convex Optimization. Foundations and Trends in Optimization 2, 3-4 (2016), 157–325.
• Jadbabaie et al. (2015) Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, and Karthik Sridharan. 2015. Online Optimization: Competing with Dynamic Comparators. In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS). 398–406.
• Jenatton et al. (2016) Rodolphe Jenatton, Jim Huang, and Cedric Archambeau. 2016. Adaptive Algorithms for Online Convex Optimization with Long-term Constraints. In Proceedings of The 33rd International Conference on Machine Learning (ICML), Vol. 48. 402–411.
• Lee (2018) James R Lee. 2018. Fusible HSTs and the randomized k-server conjecture. In Proceedings of the IEEE 59th Annual Symposium on Foundations of Computer Science.
• Li and Hoi (2014) Bin Li and Steven C. H. Hoi. 2014. Online Portfolio Selection: A Survey. Comput. Surveys 46, 3 (2014), 35:1–35:36.
• Li et al. (2013) Bin Li, Steven C. H. Hoi, Peilin Zhao, and Vivekanand Gopalkrishnan. 2013. Confidence Weighted Mean Reversion Strategy for Online Portfolio Selection. ACM Transactions on Knowledge Discovery from Data (TKDD) 7, 1 (March 2013), 4:1–4:38.
• Li et al. (2018) C. Li, P. Zhou, L. Xiong, Q. Wang, and T. Wang. 2018. Differentially Private Distributed Online Learning. IEEE Transactions on Knowledge and Data Engineering (TKDE) 30, 8 (Aug 2018), 1440–1453.
• Li et al. (2018) Yingying Li, Guannan Qu, and Na Li. 2018. Online Optimization with Predictions and Switching Costs: Fast Algorithms and the Fundamental Limit. arXiv.org (Jan. 2018). arXiv:math.OC/1801.07780v3
• Lin et al. (2011) M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska. 2011. Dynamic right-sizing for power-proportional data centers. In Proceedings of IEEE International Conference on Computer Communications (INFOCOM). 1098–1106.
• Lin et al. (2012) Minghong Lin, Adam Wierman, Alan Roytman, Adam Meyerson, and Lachlan L.H. Andrew. 2012. Online Optimization with Switching Cost. SIGMETRICS Performance Evaluation Review 40, 3 (2012), 98–100.
• Lu et al. (2013) T. Lu, M. Chen, and L. L. H. Andrew. 2013. Simple and Effective Dynamic Provisioning for Power-Proportional Data Centers. IEEE Transactions on Parallel and Distributed Systems (TPDS) 24, 6 (June 2013), 1161–1171.
• Mokhtari et al. (2016) Aryan Mokhtari, Shahin Shahrampour, Ali Jadbabaie, and Alejandro Ribeiro. 2016. Online optimization in dynamic environments: Improved regret rates for strongly convex problems. In Proceedings of IEEE Conference on Decision and Control (CDC). IEEE, 7195–7201.
• Mookherjee et al. (2008) Reetabrata Mookherjee, Benjamin F. Hobbs, Terry Lee Friesz, and Matthew A. Rigdon. 2008. Dynamic oligopolistic competition on an electric power network with ramping costs and joint sales constraints. Journal of Industrial and Management Optimization 4, 3 (11 2008), 425–452.
• Morari and Lee (1999) Manfred Morari and Jay H. Lee. 1999. Model predictive control: past, present and future. Computers & Chemical Engineering 23, 4 (1999), 667–682.
• Renault and Rosén (2012) Marc P. Renault and Adi Rosén. 2012. On Online Algorithms with Advice for the k-Server Problem. In Approximation and Online Algorithms, Roberto Solis-Oba and Giuseppe Persiano (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 198–210.
• Shalev-Shwartz (2012) Shai Shalev-Shwartz. 2012. Online Learning and Online Convex Optimization. Foundations and Trends® in Machine Learning 4, 2 (2012), 107–194.
• Sun et al. (2016) Y. Sun, K. Tang, L. L. Minku, S. Wang, and X. Yao. 2016. Online Ensemble Learning of Data Streams with Gradually Evolved Classes. IEEE Transactions on Knowledge and Data Engineering (TKDE) 28, 6 (June 2016), 1532–1545.
• Wang et al. (2014) Hao Wang, Jianwei Huang, Xiaojun Lin, and Hamed Mohsenian-Rad. 2014. Exploring Smart Grid and Data Center Interactions for Electric Power Load Balancing. SIGMETRICS Performance Evaluation Review 41, 3 (Jan. 2014), 89–94.
• Wang et al. (2016) Liang Wang, Kuang-chih Lee, and Quan Lu. 2016. Improving Advertisement Recommendation by Enriching User Browser Cookie Attributes. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM). 2401–2404.
• Wang et al. (2019) M. Wang, C. Xu, X. Chen, H. Hao, L. Zhong, and S. Yu. 2019. Differential Privacy Oriented Distributed Online Learning for Mobile Social Video Prefetching. IEEE Transactions on Multimedia 21, 3 (March 2019), 636–651.
• Yang et al. (2013) Haiqin Yang, Michael R. Lyu, and Irwin King. 2013. Efficient Online Learning for Multitask Feature Selection. ACM Transactions on Knowledge Discovery from Data (TKDD) 7, 2 (Aug. 2013), 6:1–6:27.
• Yang et al. (2016) Tianbao Yang, Lijun Zhang, Rong Jin, and Jinfeng Yi. 2016. Tracking Slowly Moving Clairvoyant - Optimal Dynamic Regret of Online Learning with True and Noisy Gradient. In Proceedings of the 34th International Conference on Machine Learning (ICML).
• Zhang et al. (2018a) Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. 2018a. Adaptive Online Learning in Dynamic Environments. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). 1323–1333.
• Zhang et al. (2018b) Lijun Zhang, Tianbao Yang, Rong Jin, and Zhi-Hua Zhou. 2018b. Dynamic Regret of Strongly Adaptive Methods. In Proceedings of the 35th International Conference on Machine Learning (ICML). 5882–5891.
• Zhang et al. (2017a) Lijun Zhang, Tianbao Yang, Jinfeng Yi, Rong Jin, and Zhi-Hua Zhou. 2017a. Improved Dynamic Regret for Non-degenerate Functions. In Proceedings of Neural Information Processing Systems (NIPS).
• Zhang et al. (2017b) Lijun Zhang, Tianbao Yang, Jinfeng Yi, Rong Jin, and Zhi-Hua Zhou. 2017b. Improved Dynamic Regret for Non-degenerate Functions. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 732–741.
• Zhang et al. (2012) Q. Zhang, Q. Zhu, M. F. Zhani, and R. Boutaba. 2012. Dynamic Service Placement in Geographically Distributed Clouds. In Proceedings of the IEEE 32nd International Conference on Distributed Computing Systems (ICDCS). 526–535.
• Zhao et al. (2018) Yawei Zhao, Shuang Qiu, and Ji Liu. 2018. Proximal Online Gradient is Optimum for Dynamic Regret. CoRR cs.LG (2018).
• Zinkevich (2003) Martin Zinkevich. 2003. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In Proceedings of International Conference on Machine Learning (ICML). 928–935.

## Proofs

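###### Lemma 1.

Given any vectors $g$, $u_t \in \mathcal{X}$, $u^* \in \mathcal{X}$, and a constant scalar $\lambda > 0$, if

$$u_{t+1} = \operatorname*{argmin}_{u \in \mathcal{X}} \; \langle g, u - u_t \rangle + \frac{1}{\lambda} B_\Phi(u, u_t),$$

we have

$$\langle g, u_{t+1} - u^* \rangle \le \frac{1}{\lambda} \left( B_\Phi(u^*, u_t) - B_\Phi(u^*, u_{t+1}) - B_\Phi(u_{t+1}, u_t) \right).$$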
###### Proof.

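Denote $h(u) = \langle g, u - u_t \rangle + \frac{1}{\lambda} B_\Phi(u, u_t)$, and $u_\tau = u_{t+1} + \tau (u^* - u_{t+1})$. According to the optimality of $u_{t+1}$, we have

$$0 \le h(u_\tau) - h(u_{t+1}) = \langle g, u_\tau - u_{t+1} \rangle + \frac{1}{\lambda} \left( B_\Phi(u_\tau, u_t) - B_\Phi(u_{t+1}, u_t) \right) = \cdots \le \cdots = \cdots$$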

Thus, we have

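$$\langle g, u_{t+1} - u^* \rangle \le \frac{1}{\lambda} \langle \nabla\Phi(u_t) - \nabla\Phi(u_{t+1}), u_{t+1} - u^* \rangle = \frac{1}{\lambda} \left( B_\Phi(u^*, u_t) - B_\Phi(u^*, u_{t+1}) - B_\Phi(u_{t+1}, u_t) \right).$$

This completes the proof. ∎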
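As a sanity check of Lemma 1, the inequality can be tested numerically in the simplest special case. The sketch below assumes $\Phi(u) = \frac{1}{2}\|u\|^2$ (so that $B_\Phi(u, v) = \frac{1}{2}\|u - v\|^2$) and takes the feasible set to be the box $[-1, 1]^d$, where the update reduces to a clipped gradient step. Both choices, as well as the helper names `bregman_sq`, `mirror_step_box`, and `lemma1_gap`, are illustrative assumptions and are not part of the paper.

```python
import random


def bregman_sq(u, v):
    """Bregman divergence of Phi(u) = 0.5 * ||u||^2, which equals 0.5 * ||u - v||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(u, v))


def mirror_step_box(u_t, g, lam, lo=-1.0, hi=1.0):
    """Minimizer over the box [lo, hi]^d of <g, u - u_t> + B(u, u_t) / lam.

    For the squared Euclidean divergence the problem is separable, so the
    minimizer is the coordinate-wise clipping of u_t - lam * g.
    """
    return [min(hi, max(lo, a - lam * b)) for a, b in zip(u_t, g)]


def lemma1_gap(g, u_t, u_star, lam):
    """Right-hand side minus left-hand side of Lemma 1 (non-negative if the lemma holds)."""
    u_next = mirror_step_box(u_t, g, lam)
    lhs = sum(gi * (a - b) for gi, a, b in zip(g, u_next, u_star))
    rhs = (bregman_sq(u_star, u_t)
           - bregman_sq(u_star, u_next)
           - bregman_sq(u_next, u_t)) / lam
    return rhs - lhs


# Random trials: g is arbitrary, while u_t and u_star are feasible points.
rng = random.Random(0)
gaps = []
for _ in range(1000):
    d = 3
    g = [rng.uniform(-5.0, 5.0) for _ in range(d)]
    u_t = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    u_star = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    lam = rng.uniform(0.01, 2.0)
    gaps.append(lemma1_gap(g, u_t, u_star, lam))

# A small tolerance absorbs floating-point rounding.
print("Lemma 1 holds on all trials:", min(gaps) >= -1e-9)
```

When the clipped step stays in the interior of the box, the gap is zero, which matches the fact that the inequality in the lemma comes only from the constraint $u \in \mathcal{X}$.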

###### Lemma 2.

For any $x \in \mathcal{X}$, we have

$$B_\Phi(y^*_{t+1}, x) - B_\Phi(y^*_t, x) \le 2G \left\| y^*_{t+1} - y^*_t \right\|. \tag{3}$$
###### Proof.

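According to the three-point identity of the Bregman divergence, we have

$$\begin{aligned}
B_\Phi(y^*_{t+1}, x) - B_\Phi(y^*_t, x) &= \langle \nabla\Phi(y^*_{t+1}) - \nabla\Phi(x), y^*_{t+1} - y^*_t \rangle - B_\Phi(y^*_t, y^*_{t+1}) \\
&\overset{\textcircled{1}}{\le} \langle \nabla\Phi(y^*_{t+1}) - \nabla\Phi(x), y^*_{t+1} - y^*_t \rangle \\
&\le \left\| \nabla\Phi(y^*_{t+1}) - \nabla\Phi(x) \right\| \left\| y^*_{t+1} - y^*_t \right\| \\
&\le \left( \left\| \nabla\Phi(y^*_{t+1}) \right\| + \left\| \nabla\Phi(x) \right\| \right) \left\| y^*_{t+1} - y^*_t \right\| \\
&\le 2G \left\| y^*_{t+1} - y^*_t \right\|.
\end{aligned} \tag{4}$$

$\textcircled{1}$ holds because $B_\Phi(u, v) \ge 0$ holds for any vectors $u$ and $v$. This completes the proof. ∎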
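The bound of Lemma 2 can also be checked numerically in a simple instance. The sketch below assumes $\Phi(x) = \frac{1}{2}\|x\|^2$ on the box $[-1, 1]^d$, so that $\nabla\Phi(x) = x$ and $G = \sqrt{d}$ bounds $\|\nabla\Phi(x)\|$ over the box. These choices and the helper names `bregman_sq`, `norm`, and `lemma2_slack` are illustrative assumptions only.

```python
import math
import random


def bregman_sq(u, v):
    """Bregman divergence of Phi(u) = 0.5 * ||u||^2, which equals 0.5 * ||u - v||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in u))


def lemma2_slack(y_next, y_cur, x, G):
    """2G * ||y_next - y_cur|| minus (B(y_next, x) - B(y_cur, x)); non-negative if Lemma 2 holds."""
    lhs = bregman_sq(y_next, x) - bregman_sq(y_cur, x)
    rhs = 2.0 * G * norm([a - b for a, b in zip(y_next, y_cur)])
    return rhs - lhs


rng = random.Random(1)
d = 4
G = math.sqrt(d)  # largest value of ||grad Phi(x)|| = ||x|| over the box [-1, 1]^d
slacks = []
for _ in range(1000):
    y_cur = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    y_next = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    slacks.append(lemma2_slack(y_next, y_cur, x, G))

# A small tolerance absorbs floating-point rounding.
print("Lemma 2 holds on all trials:", min(slacks) >= -1e-9)
```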

###### Lemma 3.

Given and , if , we have

$$\left\| x_t - x_{t-1} \right\| \le \frac{2G\gamma}{\mu}.$$
###### Proof.

$\textcircled{1}$ holds because $\Phi$ is $\mu$-strongly convex, and $\textcircled{2}$ holds due to the optimality of $x_t$. Thus,

That is,

$$\left\| x_t - x_{t-1} \right\| \le \frac{2G\gamma}{\mu}.$$

This completes the proof. ∎
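The step-length bound of Lemma 3 can be illustrated with a small numerical sketch. It assumes that the update is the mirror step $x_t = \operatorname{argmin}_{x \in \mathcal{X}} \langle \hat{g}_t, x - x_{t-1} \rangle + \frac{1}{\gamma} B_\Phi(x, x_{t-1})$ with $\|\hat{g}_t\| \le G$, specialized to $\Phi(x) = \frac{1}{2}\|x\|^2$ (hence $\mu = 1$) on the box $[-1, 1]^d$. The update form, the choice of $\Phi$, and the helper names `clipped_step` and `random_vector_in_ball` are assumptions made for this example.

```python
import math
import random


def clipped_step(x_prev, g, gamma, lo=-1.0, hi=1.0):
    """Mirror step for Phi = 0.5 * ||x||^2 on the box: clip x_prev - gamma * g coordinate-wise."""
    return [min(hi, max(lo, a - gamma * b)) for a, b in zip(x_prev, g)]


def random_vector_in_ball(rng, d, radius):
    """Random vector with Euclidean norm at most `radius` (random direction, random length)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(a * a for a in v)) or 1.0
    scale = radius * rng.random() / n
    return [a * scale for a in v]


rng = random.Random(2)
d, G, mu = 5, 3.0, 1.0  # mu = 1 because Phi = 0.5 * ||x||^2 is 1-strongly convex
worst_ratio = 0.0
for _ in range(1000):
    x_prev = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    g = random_vector_in_ball(rng, d, G)
    gamma = rng.uniform(0.01, 1.0)
    x_t = clipped_step(x_prev, g, gamma)
    move = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_t, x_prev)))
    # Ratio of the actual step length to the bound 2 * G * gamma / mu.
    worst_ratio = max(worst_ratio, move / (2.0 * G * gamma / mu))

print("largest ratio of step length to the Lemma 3 bound:", worst_ratio)
```

In this Euclidean instance clipping never lengthens the step, so the ratio stays at or below one half. The factor of 2 in the lemma is therefore not tight here.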

Proof of Theorem 1:

###### Proof.
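$$\begin{aligned}
f_t(x_t) - f_t(y^*_t) &= f_t(x_t) - f_t(x_{t-1}) + f_t(x_{t-1}) - f_t(y^*_t) \\
&\le f_t(x_t) - f_t(x_{t-1}) + \langle \hat{g}_t, x_{t-1} - y^*_t \rangle \\
&= f_t(x_t) - f_t(x_{t-1}) - \langle \hat{g}_t, x_t - x_{t-1} \rangle + \langle \hat{g}_t, x_t - y^*_t \rangle \\
&\overset{\textcircled{1}}{\le} \frac{L}{2} \left\| x_{t-1} - x_t \right\|^2 + \langle \hat{g}_t, x_t - y^*_t \rangle \\
&\overset{\textcircled{2}}{\le} \frac{L}{2} \left\| x_{t-1} - x_t \right\|^2 + \frac{1}{\gamma} \left( B_\Phi(y^*_t, x_{t-1}) - B_\Phi(y^*_t, x_t) - B_\Phi(x_t, x_{t-1}) \right) \\
&\overset{\textcircled{3}}{\le} \frac{L\gamma - \mu}{2\gamma} \left\| x_{t-1} - x_t \right\|^2 + \frac{1}{\gamma} \left( B_\Phi(y^*_t, x_{t-1}) - B_\Phi(y^*_t, x_t) \right) \\
&\overset{\textcircled{4}}{\le} \frac{1}{\gamma} \left( B_\Phi(y^*_t, x_{t-1}) - B_\Phi(y^*_t, x_t) \right).
\end{aligned} \tag{5}$$

$\textcircled{1}$ holds because $f_t$ has $L$-Lipschitz gradient. $\textcircled{2}$ holds due to Lemma 1 by setting $g = \hat{g}_t$, $u_t = x_{t-1}$, $u_{t+1} = x_t$, $u^* = y^*_t$, and $\lambda = \gamma$. $\textcircled{3}$ holds because $\Phi$ is $\mu$-strongly convex, that is, $B_\Phi(x_t, x_{t-1}) \ge \frac{\mu}{2} \left\| x_t - x_{t-1} \right\|^2$. $\textcircled{4}$ holds due to $\gamma \le \frac{\mu}{L}$.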

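Thus, we have

$$\begin{aligned}
& \sum_{t=1}^{T} \left( f_t(x_t) - f_t(y^*_t) + \left\| x_t - x_{t-1} \right\|^\sigma \right) - \sum_{t=1}^{T} \left\| y^*_t - y^*_{t-1} \right\|^\sigma \\
\le{} & \sum_{t=1}^{T} \left( f_t(x_t) - f_t(y^*_t) + \left\| x_t - x_{t-1} \right\|^\sigma \right) \\
\overset{\textcircled{1}}{\le}{} & \sum_{t=1}^{T} \left\| x_t - x_{t-1} \right\|^\sigma + \frac{1}{\gamma} \sum_{t=1}^{T} \left( B_\Phi(y^*_t, x_{t-1}) - B_\Phi(y^*_t, x_t) \right) \\
={} & \sum_{t=1}^{T} \left\| x_t - x_{t-1} \right\|^\sigma + \frac{1}{\gamma} \sum_{t=1}^{T-1} \left( B_\Phi(y^*_{t+1}, x_t) - B_\Phi(y^*_t, x_t) \right) + \frac{1}{\gamma} \left( B_\Phi(y^*_1, x_0) - B_\Phi(y^*_T, x_T) \right) \\
\overset{\textcircled{2}}{\le}{} & \sum_{t=1}^{T} \left\| x_t - x_{t-1} \right\|^\sigma + \frac{2G}{\gamma} \sum_{t=1}^{T-1} \left\| y^*_{t+1} - y^*_t \right\| + \frac{1}{\gamma} \left( B_\Phi(y^*_1, x_0) - B_\Phi(y^*_T, x_T) \right) \\
\le{} & \sum_{t=1}^{T} \left\| x_t - x_{t-1} \right\|^\sigma + \frac{2G}{\gamma} \sum_{t=1}^{T-1} \left\| y^*_{t+1} - y^*_t \right\| + \frac{1}{\gamma} B_\Phi(y^*_1, x_0) \\
\le{} & \sum_{t=1}^{T} \left\| x_t - x_{t-1} \right\|^\sigma + \frac{2GD}{\gamma} + \frac{R^2}{\gamma} \\
\overset{\textcircled{3}}{\le}{} & \left( \frac{2G}{\mu} \right)^{\sigma} \gamma^{\sigma} T + \frac{2GD + R^2}{\gamma}.
\end{aligned}$$

$\textcircled{1}$ holds due to (5). $\textcircled{2}$ holds due to

$$B_\Phi(y^*_{t+1}, x_t) - B_\Phi(y^*_t, x_t) \le 2G \left\| y^*_{t+1} - y^*_t \right\|$$

according to Lemma 2. $\textcircled{3}$ holds due to Lemma 3.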

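Choose $\gamma = \min\left\{ \frac{\mu}{L}, \; T^{-\frac{1}{\sigma+1}} D^{\frac{1}{\sigma+1}} \right\}$. We have

$$\begin{aligned}
& \sum_{t=1}^{T} \left( f_t(x_t) - f_t(y^*_t) + \left\| x_t - x_{t-1} \right\|^\sigma \right) - \sum_{t=1}^{T} \left\| y^*_t - y^*_{t-1} \right\|^\sigma \\
\le{} & \left( \frac{2G}{\mu} \right)^{\sigma} T^{\frac{1}{\sigma+1}} D^{\frac{\sigma}{\sigma+1}} + \max\left\{ \frac{L(2GD + R^2)}{\mu}, \; T^{\frac{1}{\sigma+1}} \left( 2G D^{\frac{\sigma}{\sigma+1}} + R^2 D^{-\frac{1}{\sigma+1}} \right) \right\} \\
\lesssim{} & T^{\frac{1}{\sigma+1}} D^{\frac{\sigma}{\sigma+1}} + T^{\frac{1}{\sigma+1}} D^{-\frac{1}{\sigma+1}}.
\end{aligned}$$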
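The effect of the step-size choice can be checked numerically. The sketch below assumes the step size $\gamma = \min\{\mu / L, \; T^{-\frac{1}{\sigma+1}} D^{\frac{1}{\sigma+1}}\}$, which is the choice consistent with both terms of the bound above. It compares the $\gamma$-dependent bound $(2G/\mu)^{\sigma} \gamma^{\sigma} T + (2GD + R^2)/\gamma$ with the closed form obtained after tuning. The function names and the parameter values are hypothetical and serve only as an illustration.

```python
def step_size(mu, L, T, D, sigma):
    """Assumed step size: min(mu / L, T^(-1/(sigma+1)) * D^(1/(sigma+1)))."""
    a = 1.0 / (sigma + 1.0)
    return min(mu / L, T ** (-a) * D ** a)


def bound_before_tuning(gamma, G, mu, T, D, R, sigma):
    """Bound that still depends on gamma: (2G/mu)^sigma * gamma^sigma * T + (2GD + R^2) / gamma."""
    return (2.0 * G / mu) ** sigma * gamma ** sigma * T + (2.0 * G * D + R ** 2) / gamma


def bound_after_tuning(G, mu, L, T, D, R, sigma):
    """Closed-form bound after substituting the assumed step size (requires D > 0)."""
    a = 1.0 / (sigma + 1.0)
    first = (2.0 * G / mu) ** sigma * T ** a * D ** (sigma * a)
    second = max(L * (2.0 * G * D + R ** 2) / mu,
                 T ** a * (2.0 * G * D ** (sigma * a) + R ** 2 * D ** (-a)))
    return first + second


# Hypothetical problem constants, chosen only for illustration.
G, mu, L, R, sigma = 1.0, 1.0, 2.0, 1.0, 1.0
for T, D in [(10_000, 10.0), (4, 100.0), (1_000_000, 50.0)]:
    gamma = step_size(mu, L, T, D, sigma)
    before = bound_before_tuning(gamma, G, mu, T, D, R, sigma)
    after = bound_after_tuning(G, mu, L, T, D, R, sigma)
    print(f"T={T}, D={D}: gamma={gamma:.4g}, before={before:.4g}, after={after:.4g}")
```

For large $T$ the second argument of the minimum is active and the two quantities coincide up to rounding. For small $T$ the cap $\gamma \le \mu / L$ is active and the closed form is a strict upper bound.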

Since it holds for any sequence $\{y^*_t\}_{t=1}^{T}$, we finally obtain