The Interplay Between Stability and Regret in Online Learning

This paper considers the stability of online learning algorithms and its implications for learnability (bounded regret). We introduce a novel quantity called forward regret that intuitively measures how good an online learning algorithm is if it is allowed a one-step look-ahead into the future. We show that given stability, bounded forward regret is equivalent to bounded regret. We also show that the existence of an algorithm with bounded regret implies the existence of a stable algorithm with bounded regret and bounded forward regret. The equivalence results apply to general, possibly non-convex problems. To the best of our knowledge, our analysis provides the first general connection between stability and regret in the online setting that is not restricted to a particular class of algorithms. Our stability-regret connection provides a simple recipe for analyzing the regret incurred by any online learning algorithm. Using our framework, we analyze several existing online learning algorithms; our proofs are simpler than existing analyses for the respective algorithms, show a clear trade-off between stability and forward regret, and provide tighter regret bounds in some cases. Furthermore, using our recipe, we analyze "approximate" versions of algorithms such as regularized dual averaging (RDA) and follow-the-regularized-leader (FTRL) that require solving an optimization problem at each step.


1 Introduction

The fundamental role of stability in determining the generalization ability of learning algorithms in the setting of iid data is now well recognized. Moreover, our knowledge of the connection between stability and generalization is beginning to achieve a fair degree of maturity (see, for instance, [4, 13, 18, 22]). However, the same cannot be said regarding our understanding of the role of stability in online adversarial learning.

Recently, several results have shown connections between learnability of a concept class and stability of its empirical risk minimizer (ERM). Apart from theoretical interest, such insights into stability and learnability, can potentially help in designing more practical algorithms. For example, [13] show that under certain settings, stability is a more general characterization than VC-dimension; good generalization performance can be guaranteed for concept classes with stable ERM, even if its VC-dimension is infinite.

However, most of the existing implications of stability are in the batch or i.i.d. learning setting, with only a few results in the online adversarial setting. Online learning can be modeled as a sequential two-player game between a player (learner) and an adversary where, at each step, the player takes an action from a set and the adversary plays a loss function. The player's loss is evaluated by applying the adversary's move to the player's action, and the key quantity to control is the regret of the player in hindsight. Understanding stability in the online learning setting is not only a challenging theoretical problem but is also important from the point of view of applications. For instance, stability allows us to derive guarantees that apply to dependent (non-iid) data [2] and is critical in areas such as privacy [11].

There is a fundamental challenge in extending the connection between stability and learnability from the iid to the online case. In the iid setting, empirical risk minimization (ERM) serves as a canonical learning algorithm [23]. Thus, given any hypothesis class, it is sufficient to just analyze the stability of ERM over the class to characterize its learnability in the batch setting. Unfortunately, no such canonical scheme is known for online learning, making it significantly more involved to forge connections between online learnability and stability.

In this paper, we circumvent the above mentioned issue by studying connections between stability and regret of learning algorithms in a generic sense, rather than online learnability of individual concept classes. To this end, we first define stability for online learning algorithms. Our definition is essentially "leave last one out" stability, also considered by [20]. We also define a uniform version of this stability measure. However, stability alone cannot guarantee bounded regret. For example, an algorithm that always plays one fixed move is clearly the most stable any algorithm can be, but its regret can hardly be bounded. Hence, an additional condition is required that forces the algorithm to make progress. To this end, we introduce a novel measure called forward regret: the excess loss incurred with a look-ahead of one time step (i.e., when the player makes its move after seeing the adversary's move). We show fundamental results relating the three conditions, namely online stability, bounded forward regret and bounded regret. First, assuming stability, bounded regret and bounded forward regret are equivalent. Second, given an algorithm with bounded regret, we can always obtain a stable algorithm with bounded regret and bounded forward regret. We stress that these general results do not rely on convexity assumptions and are not restricted to a particular family of learning algorithms. In contrast, [20] provides an equivalence of stability and regret only for certain families of algorithms and concept classes.

We illustrate the usefulness of our general framework by considering several popular online learning algorithms like Follow-The-Leader (FTL) [10, 7], Follow-The-Regularized-Leader (FTRL) [17, 1], Implicit Online Learning (IOL) [12], Regularized Dual Averaging (RDA) [24] and Composite Objective Mirror Descent (COMiD) [8]. We obtain regret bounds for all of them using the fundamental connections between forward regret and stability thereby demonstrating that our framework is not restricted to a particular class of algorithms. Our regret analysis is arguably simpler than existing ones and, in some cases such as IOL, provides tighter guarantees as well.

Finally, we consider "approximate" versions of the RDA, IOL, and FTRL algorithms, where the optimization problem at each step is solved only up to a small but non-zero additive error. It is important to consider such an analysis because, in practice, the optimization problems arising at each step will not be solved to infinite precision. For each of these three algorithms, we use our general stability based recipe to provide regret bounds for their approximate versions.

We present preliminaries in Section 2, introduce the online learning framework in Section 3, and review existing work, contrasting it with ours, in Section 4. We introduce our three online learning conditions and show their connections in Section 5. We provide several illustrations of the usefulness of our conditions in analyzing existing online algorithms in Section 6 and finally conclude in Section 9.

2 Bregman Divergences and Strong Convexity

Here we recall the definition of a Bregman divergence [5, 6] which finds use in online learning algorithms. We also relate it to the notion of strong convexity, a key property behind many regret bounds for online learning.

Definition 1.

Let R be a strictly convex function on a convex set C. Also, let R be differentiable on the relative interior of C, ri(C), assumed to be nonempty. The Bregman divergence generated by the function R is given by

 DR(x,y) = R(x) − R(y) − ∇R(y)⊤(x−y),

where ∇R(y) is the gradient of the function R at y.

Definition 2.

A convex function f is strongly convex with respect to a norm ∥⋅∥ if there exists a constant α > 0 such that

 Df(u,v) ≥ (α/2)∥u−v∥²  ∀ u, v ∈ Rd.

α is called the modulus of strong convexity and f is also referred to as α-strongly convex.

Now, we present a useful lemma characterizing optima of a strongly convex function.

Lemma 3.

Let f be an α-strongly convex function and let C be a convex set. Let w∗ be a minimizer of f over C, i.e., w∗ = argmin_{w∈C} f(w). Then, for any u ∈ C,

 f(u) ≥ f(w∗) + (α/2)∥u−w∗∥².

In particular, the minimizer w∗ is unique.

Lower case bold letters (e.g., w, x) denote vectors; wi denotes the i-th component of w. The Euclidean dot product between w and x is denoted by w⊤x or ⟨w, x⟩. A general norm is denoted by ∥⋅∥ and ∥⋅∥∗ refers to its dual norm. For most of this paper, we work with arbitrary norms, and we use ∥⋅∥2 to refer to the Euclidean norm. Unless specified otherwise, C ⊆ Rd is a compact convex set, and ℓt is any loss function. A function f is L-Lipschitz continuous w.r.t. a norm ∥⋅∥ if |f(x) − f(y)| ≤ L∥x−y∥ for all x, y.

3 Setup

We now describe the online learning setup that we use in this paper. Let C be a fixed set and F be a class of real-valued functions over C. Now, consider a repeated game of T rounds played between a player/learner and an adversary. At every step t,

• The player plays a point wt from the set C.

• The adversary responds with a function ℓt ∈ F.

• The player suffers loss ℓt(wt).

The quantity of interest in online learning is the regret, which measures how well the player performs compared to the best fixed move in hindsight (i.e., knowing all the moves of the adversary in advance). Regret is defined below in (6). The goal in online learning is to minimize the regret regardless of the function sequence played by the adversary. Online Convex Programming (OCP) [25] (respectively, Online Linear Programming (OLP)) is a special case of the online learning game above where the set C is a compact convex set and F is a class of convex (respectively, linear) functions defined on C.
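To make the protocol concrete, the game and the hindsight comparator can be sketched in a few lines of Python; the squared losses, the constant plays, and the grid search for the best fixed point below are illustrative assumptions, not part of the paper's setup.

```python
import numpy as np

def regret(plays, losses, candidates):
    """Regret of plays w_1..w_T against the best fixed point in hindsight,
    with the comparator searched over a finite grid of candidates."""
    incurred = sum(ell(w) for w, ell in zip(plays, losses))
    best_fixed = min(sum(ell(u) for ell in losses) for u in candidates)
    return incurred - best_fixed

# Illustrative adversary: squared losses ell_t(w) = (w - z_t)^2, z alternating.
zs = [0.0, 1.0, 0.0, 1.0]
losses = [lambda w, z=z: (w - z) ** 2 for z in zs]
plays = [0.5] * 4                      # constant play at the hindsight optimum
grid = np.linspace(0.0, 1.0, 101)
print(regret(plays, losses, grid))     # 0.0: the constant 0.5 is itself optimal
```

Note the `z=z` default argument in the lambda: it pins down each `z_t` at definition time, avoiding Python's late-binding of closure variables.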

4 Related Work

For a general introduction to online learning and descriptions of standard algorithms, see [7]. In the iid setting, stability is investigated from various points of view in [4, 13, 18, 22]. There are only a few papers dealing with stability in the online setting. Recently, [20] defined what we call Last Leave-One-Out (LLOO) stability and showed that for FTRL or MD type methods, stable online learning algorithms have bounded regret. In contrast, we distill out the "progress" requirement in the form of a forward regret condition and show a much more general connection between stability, regret and forward regret. Unlike [20], our method is extremely generic and does not need to assume any specific algorithmic form or even any specific function class (like convex functions). We also prove that most existing families of online learning algorithms are in fact stable in our sense, and using our connections we provide simple regret bound analyses for them. Another related work [16] considers an online algorithm, namely the stochastic gradient descent (SGD) algorithm, in the iid setting where each function played by the adversary is sampled in an iid fashion from some distribution. In this setting, [16] defines a new notion of online stability which is motivated by uniform stability [4]. The paper shows that SGD satisfies the new notion of stability and provides consistency guarantees as well. In contrast, our fundamental results connecting stability with regret hold for any algorithm and for any set of adversary moves, not just those sampled iid from a distribution.

A general class of online learning algorithms are referred to as Follow-The-Leader (FTL) [7] algorithms. At step t, this algorithm chooses the element of C which minimizes the sum of the functions played by the adversary up to that point:

 wt+1 = argmin_{w∈C} ∑_{i=1}^{t} fi(w) . (1)

It can be shown that this surprisingly simple algorithm achieves O(log T) regret when the adversary is restricted to playing strongly convex functions [10].

A generalization of FTL is obtained by adding a regularizer, which results in the Follow-The-Regularized-Leader (FTRL) algorithm [17, 1]. In this case the update is given by

 wt+1 = argmin_{w∈C} η ∑_{i=1}^{t} fi(w) + R(w). (2)

Typically, R is a strongly convex regularizer with respect to the appropriate norm and η is a tradeoff parameter. Another way of describing FTRL algorithms is using Bregman divergences [17]. In particular, by defining a suitable sequence of functions ϕt−1, we can write the FTRL update in an equivalent form:

 wt+1 = argmin_{w∈C} η ft(w) + Dϕt−1(w, ~wt),

where ~wt is the corresponding unconstrained minimizer.

Another class of algorithms is the proximal type algorithms, also called Mirror Descent (MD) methods [15], which typically try to find an iterate close to the previous iterate while also minimizing the current loss function, and which obtain the same rates of regret as FTRL. Similar to FTRL, such algorithms achieve O(√T) regret for general convex functions and O(log T) regret for strongly convex functions. It is interesting to note that Zinkevich's algorithm [25] is just a special case of mirror descent with the Euclidean norm and is similar to a stochastic gradient descent update [3].

While mirror descent and FTRL look like fundamentally different algorithms and were considered to be two different ends of the spectrum of online learning algorithms [21], a recent paper [14] shows equivalence between different mirror descent algorithms and corresponding FTRL counterparts. In particular, they show that the FOBOS mirror descent algorithm [9] is conceptually similar to Regularized Dual Averaging (RDA) [24], with minor differences emanating from the usage of a proximal strongly convex regularizer and the handling of arbitrary nonsmooth regularization like the ℓ1 norm. These differences result in different sparsity properties of the two algorithms.

5 Three conditions for online learning

In this section, we formally define our stability notion as well as introduce our bounded forward regret condition. We show that given stability, bounded regret and bounded forward regret are equivalent. Moreover, any algorithm with bounded regret can be converted into a stable algorithm with bounded regret and forward regret. Finally, we consider several existing OCP algorithms and illustrate that our forward regret and stability conditions provide a simple recipe for proving regret bounds. For each of the algorithms, our novel analysis simplifies existing analyses significantly and in some cases also tightens them.

We first define the following three quantities for any online learning algorithm:

• Online Stability: Intuitively, an online algorithm A is stable if the consecutive iterates generated by A are not too far away from each other. Formally, if wt is the point selected by A at the t-th step, then the (cumulative) online stability of A is given by

 SA(T) = ∑_{t=1}^{T} ∥wt−wt+1∥. (3)

Now, if SA(T) = o(T), then we say that A is online stable. This notion of stability is closely related to that of [20] (see Definition 17). Next, we define a stronger notion of stability, which we call Uniform Stability:

 USA(t) = ∥wt−wt+1∥. (4)

If USA(t) = o(1), then A is defined to be uniformly stable. Clearly, if A is uniformly stable then it is (cumulatively) stable as well. In Section 6, we show that most of the existing online learning methods are actually uniformly stable. Interestingly, for COMiD (see Section 7.4), while proving cumulative stability is relatively straightforward, one can show that uniform stability need not hold in general.

• Forward Regret: Forward regret is the hypothetical regret incurred by A if it had access to the next move that the adversary was going to make. Note that forward regret cannot actually be attained by an algorithm since it depends on seeing one step into the future. Formally,

 FRA(T) = ∑_{t=1}^{T} [ℓt(wt+1) − ℓt(w∗)], (5)

where w∗ = argmin_{w∈C} ∑_{t=1}^{T} ℓt(w). We define A to have bounded (or vanishing) forward regret if FRA(T) = o(T). Note that if the online algorithms are randomized, we can replace the three quantities with their expected counterparts and all the bounds in the paper still hold.

• Regret: Regret is a standard notion in online learning that measures how good the steps of the algorithm are compared to the best fixed point in hindsight:

 RA(T) = ∑_{t=1}^{T} [ℓt(wt) − ℓt(w∗)]. (6)

Here again, if RA(T) = o(T), then A is said to have bounded (or vanishing) regret.

These three concepts, besides being important in their own right, are also intimately related. In particular, in the next section we show that given any two of these conditions, the third condition holds.
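For intuition, the three quantities can be computed directly from a trace of iterates. The 1-D instance below (squared losses, an FTL-style trace of running means, the absolute value as the norm) is an illustrative assumption; it also numerically checks the inequality RA(T) ≤ L⋅SA(T) + FRA(T) established in Theorem 4 below.

```python
def three_quantities(ws, losses, w_star):
    """Cumulative stability S_A(T) as in (3), forward regret FR_A(T) as in (5),
    and regret R_A(T) as in (6), for a 1-D trace w_1..w_{T+1}."""
    T = len(losses)
    S = sum(abs(ws[t] - ws[t + 1]) for t in range(T))                     # (3)
    FR = sum(losses[t](ws[t + 1]) - losses[t](w_star) for t in range(T))  # (5)
    R = sum(losses[t](ws[t]) - losses[t](w_star) for t in range(T))       # (6)
    return S, FR, R

# Illustrative FTL trace on squared losses (w - z_t)^2 with z = 0, 1, 0, 1;
# each iterate is the running mean of the z's seen so far, and w* = 0.5.
zs = [0.0, 1.0, 0.0, 1.0]
losses = [lambda w, z=z: (w - z) ** 2 for z in zs]
ws = [0.0, 0.0, 0.5, 1/3, 0.5]
S, FR, R = three_quantities(ws, losses, w_star=0.5)
print(R <= 2 * S + FR)   # Theorem 4 with L = 2 (Lipschitz bound on [0, 1]): True
```

On this trace the forward regret is negative (the one-step look-ahead helps), while the ordinary regret is positive, matching the discussion above.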

5.1 Connections between the three conditions

In this section, we show that the three conditions (i.e., bounded stability, bounded forward regret and bounded regret) defined in the previous section are closely related, in the sense that given any two of the conditions, the third condition follows directly. For our claim, we first show that, assuming stability,

 bounded forward regret ⟺ bounded regret.

We then prove that any algorithm with bounded regret can be converted into a stable one, albeit with worse rates of regret. Our claims are formalized in the following theorems.

Theorem 4.

Assume an online algorithm A satisfies the condition of online stability (3), where the function ℓt played by the adversary at each step is L-Lipschitz. Then, we have

 RA(T) ≤ L⋅SA(T) + FRA(T), (7)
 FRA(T) ≤ L⋅SA(T) + RA(T).

Therefore, assuming online stability of , bounded forward regret and bounded regret are equivalent conditions.

Proof.

We first assume that A has online stability and bounded forward regret. We have

 ∑_{t=1}^{T} [ℓt(wt) − ℓt(w∗)] = ∑_{t=1}^{T} [ℓt(wt) − ℓt(wt+1)] + ∑_{t=1}^{T} [ℓt(wt+1) − ℓt(w∗)] ≤ ∑_{t=1}^{T} L∥wt−wt+1∥ + FRA(T) ≤ L⋅SA(T) + FRA(T) = o(T),

where the first inequality follows by Lipschitz continuity of ℓt and the last equality holds as both SA(T) = o(T) and FRA(T) = o(T). Hence, A has bounded regret. The proof in the reverse direction follows identically. ∎

To complete the picture regarding the connections between the three conditions, we now prove the following theorem.

Theorem 5.

Let C be a fixed set of bounded diameter D from which a learner selects a point at each step of online learning. Let F be a class of L-Lipschitz functions from which the adversary plays a function at each step. Also, let A be an algorithm with bounded regret. Then, there exists a stable algorithm A′ with bounded regret and forward regret.

Proof.

Intuitively, our proof proceeds by constructing an alternative stable algorithm A′ that averages a batch of loss functions and feeds the average into the "unstable" but bounded-regret algorithm A. We then show bounded regret and forward regret of this new algorithm. Note that our proof strategy is inspired by the proof of Lemma 20 in [22], which shows stability to be a necessary condition for learnability in the batch setting.

Formally, given the algorithm A, we construct a new algorithm A′ in the following way. We divide the T time steps into batches of size B; A′ repeats the same point for an entire batch. At the end of the batch, it feeds the average of the functions in the batch to A to get its next move. It then sticks to this new point for the next B time steps before repeating the process all over. In pictures,

 A′ sees: ℓ1, ℓ2, …, ℓB, ℓB+1, …, ℓ2B, …  while  A sees: g1 = (1/B)∑_{t=1}^{B} ℓt, g2 = (1/B)∑_{t=B+1}^{2B} ℓt, …

Note that the function gi, being an average of L-Lipschitz functions, is itself L-Lipschitz. Denote the elements generated by A′ as w′t and those by A as wt. Note that there are only ⌊T/B⌋ distinct elements in the sequence w′1, …, w′T: viz. the elements generated by A in response to g1, g2, …. The stability analysis of A′ now proceeds as follows:

 ∑_{t=1}^{T} ∥w′t − w′t+1∥ = ∑_{t=1}^{⌊T/B⌋} ∥w′(t−1)B+1 − w′tB+1∥ = ∑_{t=1}^{⌊T/B⌋} ∥wt − wt+1∥ ≤ (T/B)⋅D = o(T),

for the choice B = √T in particular. This proves that A′ is stable.

In order to show that A′ has bounded regret, we consider

 ∑_{t=1}^{T} (ℓt(w′t) − ℓt(w∗)) ≤ ∑_{t=1}^{B⌊T/B⌋} (ℓt(w′t) − ℓt(w∗)) + L⋅D⋅B = ∑_{i=1}^{⌊T/B⌋} ∑_{t=(i−1)B+1}^{iB} (ℓt(w′t) − ℓt(w∗)) + L⋅D⋅B = B ∑_{i=1}^{⌊T/B⌋} (gi(wi) − gi(w∗)) + L⋅D⋅B ≤ B⋅RA(⌊T/B⌋) + L⋅D⋅B,

where RA(⌊T/B⌋) = o(⌊T/B⌋) as A has bounded regret. The last term in the first inequality is an upper bound on the regret due to the last batch of functions (maximally B in number). Selecting B = √T, we get B⋅RA(⌊T/B⌋) + L⋅D⋅B = √T⋅o(√T) + L⋅D⋅√T = o(T), i.e., A′ has bounded regret. ∎

Thus, given any algorithm A with bounded regret, we can convert it into an online stable algorithm A′ with bounded regret, which also implies bounded forward regret using Theorem 4.
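The batching construction in the proof can be sketched directly in code; the grid-search FTL base algorithm and the squared losses below are hypothetical stand-ins for the abstract bounded-regret algorithm A.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)

def ftl_update(seen):
    """A toy base algorithm A: FTL by grid search over the functions fed so far."""
    return float(grid[np.argmin([sum(g(u) for g in seen) for u in grid])])

def batched_iterates(first_move, update, losses, B):
    """Theorem 5 construction: A' repeats each move of A for a whole batch of B
    rounds, then feeds A the batch-averaged loss g_i to obtain its next move."""
    ws_prime, w, seen = [], first_move, []
    for i in range(0, len(losses), B):
        batch = losses[i:i + B]
        ws_prime.extend([w] * len(batch))                       # A' repeats w
        seen.append(lambda u, b=batch: sum(ell(u) for ell in b) / len(b))
        w = update(seen)                                        # A sees g_1..g_i
    return ws_prime

zs = [0.0, 1.0, 1.0, 0.0]
losses = [lambda w, z=z: (w - z) ** 2 for z in zs]
print(batched_iterates(0.0, ftl_update, losses, B=2))   # [0.0, 0.0, 0.5, 0.5]
```

Because A′ changes its point only once per batch, its cumulative stability is at most (T/B) times the diameter, exactly as in the proof.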

6 Unified analysis of online algorithms

In this section we present examples where existing online learning algorithms can be analyzed through our stability and forward regret conditions, leading directly to regret bounds (see Theorem 4). These examples illustrate that the stability and forward regret conditions are central to regret analysis and in fact provide a fairly straightforward recipe for the regret analysis of online learning algorithms. Note that, unlike the general results of Section 5, here we will make convexity assumptions on the set C and the loss functions ℓt. One of the major contributions of this paper is that our analysis significantly simplifies as well as tightens the analysis for existing methods like IOL [12].

Before delving into the technical details, we provide a brief generic sketch of the regret analysis of all the algorithms.

For each of the regret analyses, we initially bound the stability in terms of the learning rate ηt and the Lipschitz coefficient L of the losses ℓt. The bounds on stability are generally obtained by exploiting the optimality of wt+1 at iteration t, the Lipschitz continuity of ℓt and the strong convexity of the regularizer (for the algorithms involving regularization). For the case of IOL, the per-step movement is at most 2Lηt, which makes the stability bounded by 2L∑_{t} ηt.

For FTL, forward regret is non-positive by definition of the FTL updates. For all the other algorithms, the bounds on the forward regret follow by again using the optimality of wt+1 at iteration t and comparing the corresponding objective at the final minimizer w∗. This generally results in a telescoping sum, upper bounding the forward regret in terms of the regularizer R (or the Bregman divergence DR) evaluated at the extreme iterates, with all the other terms canceling out by appropriately choosing ηt. In particular, for the case of IOL, the forward regret is bounded in terms of R(w∗) and the step sizes.

Finally, bounds on the regret are obtained by using equation (7), while the optimal dependence on T is obtained by trading off the step sizes ηt in the corresponding inequality. Summation over appropriate ηt gives O(log T) rates of regret for strongly convex losses and O(√T) rates of regret for general convex Lipschitz losses, as is common in the literature.

7 Examples

7.1 Follow The Leader (FTL)

Follow The Leader (FTL) is a popular method for OCP when the provided functions are strongly convex. At the t-th step, FTL chooses wt+1 to be the element that minimizes the total loss up to that step, i.e.,

 FTL: wt+1 = argmin_{w∈C} ∑_{τ=1}^{t} ℓτ(w). (8)

The FTL method was analyzed in [7] and [21] for the case when each loss function is at least α-strongly convex. Here, using our forward regret and stability conditions, we provide a significantly simpler analysis with similar regret bounds. It should be noted that our analysis is a generalization of the analysis in [7, Section 3.2] from functions strongly convex w.r.t. the Euclidean norm to functions strongly convex w.r.t. an arbitrary norm.

Theorem 6.

Let each loss function ℓt be α-strongly convex and L-Lipschitz continuous. Then, the regret incurred by the FTL algorithm (see (8)) is bounded by:

 RFTL(T) ≤ (2L²/α)(1+ln T).
Proof.

Our proof follows the simple recipe of computing stability as well as forward regret bound.

Stability: Using strong convexity, Lemma 3 and the fact that wt+1 is the optimum of (8) at step t,

 ∑_{τ=1}^{t} ℓτ(wt) ≥ ∑_{τ=1}^{t} ℓτ(wt+1) + (tα/2)∥wt−wt+1∥². (9)

Similarly, using optimality of wt for the (t−1)-th step:

 ∑_{τ=1}^{t−1} ℓτ(wt+1) ≥ ∑_{τ=1}^{t−1} ℓτ(wt) + ((t−1)α/2)∥wt−wt+1∥². (10)

Adding (9) and (10), and using Lipschitz continuity of ℓt, we get:

 ℓt(wt) − ℓt(wt+1) ≥ (t−1/2)α∥wt−wt+1∥²  ⟹  2L/((2t−1)α) ≥ ∥wt−wt+1∥. (11)

Using (11), we get:

 ∑_{t=1}^{T} ∥wt−wt+1∥ ≤ ∑_{t=1}^{T} 2L/((2t−1)α) ≤ (2L/α)(1+ln T). (12)

Hence,

 SFTL(T) ≤ (2L/α)(1+ln T). (13)

Forward Regret: Using optimality of wT+1 for the T-th step:

 ∑_{t=1}^{T} ℓt(w∗) ≥ ∑_{t=1}^{T} ℓt(wT+1). (14)

Next, using (10) for t = T together with (14),

 ∑_{t=1}^{T} ℓt(w∗) ≥ ℓT(wT+1) + ∑_{τ=1}^{T−1} ℓτ(wT). (15)

Similarly, applying (10) recursively to (15) for t = T−1, …, 2,

 ∑_{t=1}^{T} ℓt(w∗) ≥ ∑_{t=1}^{T} ℓt(wt+1). (16)

Hence,

 FRFTL(T) ≤ 0. (17)

Hence, using Theorem 4, (13), and (17),

 RFTL(T) ≤ (2L²/α)(1+ln T). (18) ∎
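As a numerical sanity check of Theorem 6 (not a proof), FTL on squared losses admits a closed form, since the minimizer of the cumulative loss is the running mean; the random instance below, with α = 2 and L = 2 on C = [0, 1], is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
zs = rng.uniform(0.0, 1.0, size=200)                 # adversary's targets
losses = [lambda w, z=z: (w - z) ** 2 for z in zs]   # 2-strongly convex

# FTL in closed form: argmin_w sum_{tau<=t} (w - z_tau)^2 is the mean of z_1..z_t.
ws = [0.5] + [zs[:t].mean() for t in range(1, len(zs))]
w_star = zs.mean()                                   # best fixed point in hindsight
R = sum(ell(w) - ell(w_star) for w, ell in zip(ws, losses))
T, L, alpha = len(zs), 2.0, 2.0
print(R <= (2 * L**2 / alpha) * (1 + np.log(T)))     # bound of Theorem 6: True
```

Increasing T here leaves the regret growing only logarithmically, in line with the O(log T) rate for strongly convex losses.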

7.2 Follow The Regularized Leader (FTRL)

While FTL is an intuitive algorithm, unfortunately, for non-strongly convex functions it need not have bounded regret. However, several recent results show that by adding strongly convex regularization, FTL can be used to obtain bounded regret. Specifically,

 FTRL: wt+1 = argmin_{w∈C} ∑_{τ=1}^{t} ℓτ(w) + (1/η)R(w), (19)

where R is (generally) a strongly convex function with respect to an appropriate norm. Note that the intuition behind adding regularization is to make the algorithm stable. Our analysis of FTRL explicitly captures this intuition by establishing the stability condition, while forward regret follows easily from the forward regret analysis of FTL given above.

Theorem 7.

Let each loss function ℓt be L-Lipschitz continuous, let the diameter (as measured in ∥⋅∥) of the set C be D, and let R be a 1-strongly convex regularization function. Then, the regret incurred by the Follow The Regularized Leader (FTRL) algorithm (see (19)) is bounded by:

 RFTRL(T) ≤ 2L√(∥∇R∥∗ D) √T,

where ∥∇R∥∗ = max_{w∈C} ∥∇R(w)∥∗.

Proof.

As for FTL, we again prove regret by first proving stability and forward regret.
Stability: Similar to (9) and (10), using strong convexity and optimality conditions for -th and -th step, we get the following relations:

 t∑τ=1ℓτ(wt)+1ηR(wt) ≥t∑τ=1ℓτ(wt+1)+1ηR(wt+1)+12η∥wt−wt+1∥2. (20) t−1∑τ=1ℓτ(wt+1)+1ηR(wt+1) ≥t−1∑τ=1ℓτ(wt)+1ηR(wt)+12η∥wt−wt+1∥2. (21)

Combining (20) and (21) and using Lipschitz continuity of ℓt:

 Lη ≥ ∥wt−wt+1∥. (22)

Hence,

 SFTRL(T) = ∑_{t=1}^{T} ∥wt−wt+1∥ ≤ LηT. (23)

Choosing η = o(1) thus satisfies the online stability condition for FTRL.

Forward Regret: Assuming ℓ0(w) = (1/η)R(w) and w1 = argmin_{w∈C} R(w), FTRL is the same as FTL with an additional 0-th step loss function (1/η)R(w). Hence, using (16), we obtain:

 ∑_{t=1}^{T} ℓt(w∗) + (1/η)(R(w∗) − R(w1)) ≥ ∑_{t=1}^{T} ℓt(wt+1). (24)

Hence,

 FRFTRL(T) = (1/η)(R(w∗) − R(w1)) ≤ (1/η)∇R(w∗)⊤(w∗−w1) ≤ ∥∇R∥∗D/η, (25)

where the first inequality follows using the convexity of R and the last one follows using the Cauchy-Schwarz inequality. Again, η = o(1) provides vanishing forward regret for FTRL. Hence, using Theorem 4,

 RFTRL(T) ≤ ∥∇R∥∗D/η + L²ηT ≤ 2L√(∥∇R∥∗D)√T, (26)

by choosing η = √(∥∇R∥∗D)/(L√T). ∎
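The FTRL update (19) with R(w) = w²/2 on C = [−1, 1] and linear losses has a closed form (a clipped weighted sum of the gradients), which gives a quick numerical check of Theorem 7; the random linear losses and the specific constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
gs = rng.uniform(-1.0, 1.0, size=400)     # linear losses ell_t(w) = g_t * w, L = 1
T = len(gs)
eta = 1.0 / np.sqrt(T)

def ftrl(t):
    # argmin_{|w|<=1} sum_{tau<=t} g_tau*w + w^2/(2*eta) = clip(-eta*sum, -1, 1)
    return float(np.clip(-eta * gs[:t].sum(), -1.0, 1.0))

ws = [ftrl(t) for t in range(T)]          # ws[0] = argmin R = 0 is the first move
w_star = -1.0 if gs.sum() > 0 else 1.0    # best fixed point for linear losses
R_T = float(np.dot(gs, ws) - gs.sum() * w_star)
print(R_T <= 2.0 / eta + eta * T)         # bound (26) with ||grad R||* = 1, D = 2
```

With η = 1/√T the right-hand side is 3√T, exhibiting the O(√T) trade-off between the stability term L²ηT and the forward regret term ∥∇R∥∗D/η.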

7.3 Regularized Dual Averaging (RDA)

Regularized Dual Averaging (RDA) [24] is a popular online learning method for OCP scenarios where each loss function is regularized by the same regularization function, i.e., the function at each step is of the form ℓt(w) = ft(w) + r(w), where r is a regularization function. RDA computes the iterates using the following rule:

 wt+1 = argmin_{w∈C} (1/t)∑_{τ=1}^{t} gτ⊤w + r(w) + (βt/t)h(w), (27)

where gt = ∇ft(wt), h is a strongly convex regularizer that is separately added, and βt is the trade-off parameter. [24] shows that the above update obtains O(√T) regret for general Lipschitz continuous functions and O(log T) regret when the regularizer r is strongly convex.

Note that RDA is the same as FTRL except for the linearization of the first part of the loss function, ft. Hence, the same regret analysis as for FTRL should hold. However, the analysis of [24] shows that, by using the special structure of the update, regret can be bounded even without assuming Lipschitz continuity of the regularization function r. Below, we show that the same recipe of bounding stability and forward regret leads to a significantly simpler analysis of RDA as well. Unlike the previous cases, this analysis is slightly more tricky as we cannot assume Lipschitz continuity of r to prove stability.
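For concreteness, here is a sketch of one RDA step for a hypothetical 1-D ℓ1-regularized instance (ft(w) = gt⋅w, r(w) = λ|w|, h(w) = w²/2, βt = √t, solved by grid search); the instance and parameter values are illustrative assumptions, not an example from [24].

```python
import numpy as np

def rda_step(grads, lam, C=(-1.0, 1.0)):
    """One RDA step: minimize the averaged linear loss plus r(w) = lam*|w|
    plus (beta_t/t) * w^2/2 with beta_t = sqrt(t), by grid search over C."""
    t = len(grads)
    gbar, beta = float(np.mean(grads)), np.sqrt(t)
    grid = np.linspace(C[0], C[1], 2001)
    obj = gbar * grid + lam * np.abs(grid) + (beta / t) * grid**2 / 2.0
    return float(grid[np.argmin(obj)])

# A small average gradient plus l1 regularization keeps the iterate exactly
# at zero, the sparsity behavior RDA is known for.
print(rda_step([0.05, 0.05, 0.05, 0.05], lam=0.1))   # -> 0.0
```

Because the update depends on the *average* gradient, a small λ can zero out the iterate no matter how many rounds have passed, unlike per-step mirror descent shrinkage.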

Theorem 8.

Let each ft be L-Lipschitz continuous and let r be an α-strongly convex function. Now, using βt = 0 at each step, the regret of RDA (see (27)) is bounded by (2L²/α)(1+ln T).

Proof.

Stability: By strong convexity of r and optimality of wt+1 and wt for the t-th and (t−1)-th steps respectively,

 (1/t)∑_{τ=1}^{t} gτ⊤(wt−wt+1) + r(wt) − r(wt+1) ≥ (α/2)∥wt−wt+1∥²,
 (1/(t−1))∑_{τ=1}^{t−1} gτ⊤(wt+1−wt) + r(wt+1) − r(wt) ≥ (α/2)∥wt−wt+1∥².

Adding the two inequalities,

 α∥wt−wt+1∥² ≤ ((1/t)gt − (1/(t(t−1)))∑_{τ=1}^{t−1} gτ)⊤(wt−wt+1) ≤ (2L/t)∥wt−wt+1∥, (28)

where the second inequality follows from Lipschitz continuity of the ft (so that ∥gτ∥∗ ≤ L). After simplification and adding the above expression for all t ∈ [T],

 SRDA(T) ≤ (2L/α)(1+ln T). (29)

Note that the above stability analysis is slightly different from that of FTL, as we are able to bound the stability in terms of the Lipschitz constant of ft only, rather than that of ft + r.

Forward Regret: When βt = 0, forward regret follows easily from the forward regret of FTL, where the loss function at each step is gt⊤w + r(w). Hence,

 FRRDA(T) ≤ 0. (30)

Hence, using Theorem 4,

 ∑_{t=1}^{T} (gt⊤(wt−w∗) + r(wt) − r(w∗)) ≤ (2L²/α)(1+ln T).

The result now follows using convexity of ft, i.e., ft(wt) − ft(w∗) ≤ gt⊤(wt−w∗). ∎

Next, we bound regret incurred by RDA for general convex, Lipschitz continuous functions.

Theorem 9.

Let each ft be L-Lipschitz continuous, let h be a 1-strongly convex function, and wlog let h(w1) = 0 and h(w) ≤ D² for all w ∈ C. Now, using βt = √t at each step, the regret of RDA (see (27)) is bounded by (D² + L(2L+D))√T.

Proof.

Stability: Again, by strong convexity and optimality of wt+1 and wt for the t-th and (t−1)-th steps respectively,

 (1/t)∑_{τ=1}^{t} gτ⊤(wt−wt+1) + r(wt) − r(wt+1) + (βt/t)(h(wt) − h(wt+1)) ≥ (βt/(2t))∥wt−wt+1∥²,
 (1/(t−1))∑_{τ=1}^{t−1} gτ⊤(wt+1−wt) + r(wt+1) − r(wt) + (βt−1/(t−1))(h(wt+1) − h(wt)) ≥ (βt−1/(2(t−1)))∥wt−wt+1∥².

Adding the above two inequalities, using Lipschitz continuity of the ft and the upper bound on h,

 (1/(2√t) + 1/(2√(t−1)))∥wt−wt+1∥² − (2L/t)∥wt−wt+1∥ − (1/√(t−1) − 1/√t)D² ≤ 0. (31)

Solving this quadratic inequality for ∥wt−wt+1∥, we get

 ∥wt−wt+1∥ ≤ (2L+D)/(2√t). (32)

Hence,

 SRDA(T) ≤ (2L+D)√T. (33)

Forward Regret: Using optimality of wT+1 for the T-th step,

 ∑_{t=1}^{T} gt⊤w∗ + T r(w∗) + √T h(w∗) ≥ ∑_{t=1}^{T} gt⊤wT+1 + T r(wT+1) + √T h(wT+1). (34)

Now, using optimality of wT for the (T−1)-th step,

 ∑_{t=1}^{T−1} gt⊤wT+1 + (T−1) r(wT+1) + √(T−1) h(wT+1) ≥ ∑_{t=1}^{T−1} gt⊤wT + (T−1) r(wT) + √(T−1) h(wT). (35)

Combining (34) and (35),

 ∑_{t=1}^{T} gt⊤w∗ + T r(w∗) + √T h(w∗) ≥ gT⊤wT+1 + r(wT+1) + ∑_{t=1}^{T−1} gt⊤wT + (T−1) r(wT) + √(T−1) h(wT). (36)

Similarly, combining the optimality conditions (35) recursively with (36),

 ∑_{t=1}^{T} gt⊤w∗ + T r(w∗) + √T h(w∗) ≥ ∑_{t=1}^{T} (gt⊤wt+1 + r(wt+1)). (37)

Hence, using h(w) ≥ 0 and h(w∗) ≤ D²,

 FRRDA(T) ≤ √T D². (38)

Hence, using Theorem 4 and convexity of each ft,

 RRDA(T) ≤ (D² + L(2L+D))√T. (39) ∎

7.4 Composite Objective Mirror Descent (COMiD)

Similar to RDA, COMiD [8] is also designed to handle regularized loss functions of the form ℓt(w) = ft(w) + r(w). Just as RDA is an extension of FTRL to handle composite regularized loss functions, COMiD is an extension of IOL. Formally,

 COMiD: wt+1 = argmin_{w∈C} ηt(gt⊤w + r(w)) + DR(w, wt),

where gt = ∇ft(wt) and DR is the Bregman divergence with R being the generating function. Now, similar to RDA, the regret analysis of COMiD follows directly from the regret analysis of IOL. However, [8] presents an improved analysis that can handle non-Lipschitz-continuous regularization as well. Here, we show that using our stability/forward-regret based recipe, we can obtain similar regret bounds with significantly simpler analysis.
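In 1-D with R(w) = w²/2 (so DR(w, wt) = (w − wt)²/2) and r(w) = λ|w|, the COMiD update has the familiar soft-thresholding closed form; this instance is an illustrative assumption, not an example taken from [8].

```python
import numpy as np

def comid_step(w_t, g, eta, lam):
    """COMiD step: argmin_w eta*(g*w + lam*|w|) + (w - w_t)^2/2,
    i.e. a gradient step on the linearized loss followed by shrinkage."""
    v = w_t - eta * g                                   # unconstrained gradient step
    return float(np.sign(v) * max(abs(v) - eta * lam, 0.0))

print(comid_step(w_t=0.5, g=1.0, eta=0.25, lam=0.5))    # -> 0.125
```

Unlike RDA, the shrinkage here acts per step on the current iterate, which is one source of the differing sparsity properties noted in Section 4.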

Theorem 10.

Let each loss function be of the form ℓt(w) = ft(w) + r(w), where ft is an L-Lipschitz continuous function and r is a regularization function. Let the diameter of the set C be D, and let DR be a Bregman divergence with R being the convex generating function. Also, let r be a positive function. Then, the regret incurred by the Composite Objective Mirror Descent (COMiD) algorithm is bounded by:

 RCOMiD(T) ≤ L√(2R(w∗)) √T.

Furthermore, if each function ℓt is α-strongly convex w.r.t. R, then

 RCOMiD(T) ≤ (2L²/α)(1+ln T) + αR(w∗).
Proof.

Stability: By optimality of wt+1,

 ηt(gt⊤wt + r(wt)) ≥ DR(wt+1, wt) + ηt(gt⊤wt+1 + r(wt+1)). (40)

Adding the above inequality for t ∈ [T] and using the fact that DR(wt+1, wt) ≥ (1/2)∥wt+1−wt∥² (by strong convexity of the generating function R),

 ∑_{t=1}^{T} (1/(2Lηt))∥wt−wt+1∥² ≤ ∑_{t=1}^{T} ∥wt−wt+1∥. (41)

Using the Cauchy-Schwarz inequality,

 (∑_{t=1}^{T} (1/√(2Lηt))⋅√(2Lηt)∥wt−wt+1∥)² ≤ (∑_{t=1}^{T} (1/(2Lηt))∥wt−wt+1∥²)(∑_{t=1}^{T} 2Lηt). (42)

Using (41) and (42),

 SCOMiD(T) = ∑_{t=1}^{T} ∥wt−wt+1∥ ≤ 2L∑_{t=1}^{T} ηt. (43)

Forward Regret: Forward regret follows directly from the forward regret of IOL (53), i.e.,

 FRCOMiD(T) = ∑_{t=1}^{T} (gt⊤(wt+1−w∗) + r(wt+1) − r(w∗)). (44)

Both regret bounds follow using convexity of each ft and setting the step sizes as in IOL (see (54), (55)). ∎

7.5 Mirror Descent (MD)

Mirror descent algorithms are a generalization of Zinkevich's Generalized Infinitesimal Gradient Ascent (GIGA) algorithm [25], where the regularization can be drawn from any Bregman divergence family. Formally,

 MD: wt+1 = argmin_{w∈C} ηt gt⊤w + DR(w, wt), (46)

where DR is the Bregman divergence generated using R. Note that the MD update is the same as the COMiD update with r(w) = 0. Hence, our stability analysis as well as the regret analysis for general convex functions follows directly. However, for strongly convex functions, our approach does not yield the appropriate forward regret directly, the primary reason being the linearization of the function ℓt. Instead, we can obtain the regret bound using the standard approach (see [25]) and then obtain the forward regret bound using Theorem 4.

7.6 Implicit Online Learning (IOL)

Implicit online learning [12] is similar to typical Mirror Descent algorithms but without linearizing the loss function. Specifically, at iteration t,

 IOL: wt+1 = argmin_{w∈C} (DR(w, wt) + ηt ℓt(w)), (47)

where DR is a Bregman divergence with R being the generating function. It was shown in [12] that using any strongly convex R, the above update leads to O(√T) regret for Lipschitz continuous convex functions ℓt. This paper also shows that if R is selected to be the squared ℓ2-norm and each function ℓt is strongly convex and has a Lipschitz continuous gradient, then O(log T) regret can also be achieved. Below, using our recipe of forward regret and stability, we reproduce significantly simpler proofs of both the O(√T) and the O(log T) regret bounds. Furthermore, our proof requires only strong convexity and Lipschitz continuity, in contrast to strong convexity and Lipschitz continuity of the gradient in [12]. Also, our analysis can handle any strongly convex R, rather than just the squared ℓ2-norm regularizer.
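For intuition, with R(w) = w²/2 the implicit update (47) on a quadratic loss ℓt(w) = (w − zt)²/2 can be solved in closed form; the 1-D instance and step sizes below are illustrative assumptions.

```python
import numpy as np

def iol_step(w_t, z_t, eta_t):
    """Implicit step: argmin_w (w - w_t)^2/2 + eta_t*(w - z_t)^2/2, which is
    a weighted average of the previous iterate and the loss minimizer."""
    return (w_t + eta_t * z_t) / (1.0 + eta_t)

w = 0.0
for t in range(1, 4):
    w = iol_step(w, z_t=1.0, eta_t=1.0 / np.sqrt(t))
print(w)   # creeps toward the common minimizer z = 1 without ever overshooting
```

Because the loss is evaluated at the *new* point rather than linearized at the old one, the step is automatically damped, which is the intuition behind the stability of IOL exploited in the proof below.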

Theorem 11.

Let each loss function ℓt be L-Lipschitz continuous, let the diameter of the set C be D, and let DR be a Bregman divergence with R being the strongly convex generating function. Also, let each ℓt be a positive function. Then, the regret incurred by the Implicit Online Learning (IOL) algorithm (see (47)) is bounded by:

 RIOL(T) ≤ 2L√(2R(w∗)) √T.

Furthermore, if each function ℓt is α-strongly convex w.r.t. R, i.e., ℓt(u) ≥ ℓt(v) + ∇ℓt(v)⊤(u−v) + α DR(u,v) for all u, v ∈ C, then an O(ln T) regret bound holds as well.

Proof.

Here again, we follow the recipe of proving stability and forward regret.
Stability: Stability again follows easily by using the optimality of wt+1 and comparing the objective value at wt+1 with that at wt. Formally,

 ηt ℓt(wt) ≥ DR(wt+1, wt) + ηt ℓt(wt+1),
 ηt ℓt(wt) ≥ (1/2)∥wt+1−wt∥² + ηt ℓt(wt+1),
 2Lηt <