# Online Non-Convex Learning: Following the Perturbed Leader is Optimal

We study the problem of online learning with non-convex losses, where the learner has access to an offline optimization oracle. We show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves an optimal regret rate of O(T^{-1/2}) in this setting. This improves upon the previous best-known regret rate of O(T^{-1/3}) for FTPL. We further show that an optimistic variant of FTPL achieves better regret bounds when the sequence of losses encountered by the learner is "predictable".


## 1 Introduction

In this work we study the problem of online learning with non-convex losses, where in each iteration the learner chooses an action and observes a loss function which could potentially be non-convex. The goal of the learner is to choose a sequence of actions that minimizes the cumulative loss suffered over the course of learning. Such a setting has numerous applications in machine learning, especially in adversarial training, robust optimization, and the training of Generative Adversarial Networks (GANs).

Most of the existing work on online optimization has focused on convex loss functions (Hazan, 2016), and a number of computationally efficient approaches have been proposed for regret minimization in this setting. However, when the losses are non-convex, minimizing the regret is computationally hard. Recent works on learning with non-convex losses circumvent this computational barrier either by working with a restricted class of loss functions, such as approximately convex losses (Gao et al., 2018), or by optimizing a computationally tractable notion of regret (Hazan et al., 2017). Consequently, the techniques studied in these papers do not guarantee vanishing regret for general non-convex losses. Another class of approaches works with general non-convex losses, but assumes access to a sampling oracle (Maillard and Munos, 2010; Krichene et al., 2015) or an offline optimization oracle (Agarwal et al., 2018). Of these, assuming access to an offline optimization oracle is reasonable, given that in practice, simple heuristics such as stochastic gradient descent seem to be able to find approximate global optima reasonably fast, even for complicated tasks such as training deep neural networks.

In a recent work, Agarwal et al. (2018) take this latter approach: assuming access to an offline optimization oracle, they show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves O(T^{-1/3}) average regret for general non-convex losses which are Lipschitz continuous. In this work we improve upon this result and show that FTPL in fact achieves the optimal O(T^{-1/2}) average regret.

## 2 Problem Setup and Main Results

Let $\mathcal{X} \subseteq \mathbb{R}^d$ denote the set of all possible moves of the learner. In the online learning framework, on each round $t$, the learner makes a prediction $x_t \in \mathcal{X}$, the nature/adversary simultaneously chooses a loss function $f_t : \mathcal{X} \to \mathbb{R}$, and both players then observe each other's actions. The goal of the learner is to choose a sequence of actions $\{x_t\}_{t=1}^T$ such that the following notion of regret is small:

$$\frac{1}{T}\sum_{t=1}^T f_t(x_t) - \frac{1}{T}\inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x).$$

In this work we assume that $\mathcal{X}$ has $\ell_\infty$ diameter $D$, which is defined as $D = \sup_{x,y\in\mathcal{X}}\|x - y\|_\infty$. We moreover assume that the loss functions in the sequence $\{f_t\}_{t=1}^T$ are $L$-Lipschitz with respect to the $\ell_1$ norm, i.e., $|f_t(x) - f_t(y)| \le L\|x - y\|_1$ for all $x, y \in \mathcal{X}$.

#### Approximate Optimization Oracle.

Our results rely on an offline optimization oracle which takes as input a function $f$ and a $d$-dimensional vector $\sigma$, and returns an approximate minimizer of $f(x) - \langle\sigma, x\rangle$. An optimization oracle is called an "$\alpha$-approximate optimization oracle" if it returns $x^* \in \mathcal{X}$ such that

$$f(x^*) - \langle\sigma, x^*\rangle \le \inf_{x\in\mathcal{X}}\big[f(x) - \langle\sigma, x\rangle\big] + \alpha.$$

We denote such an optimization oracle by $\mathcal{O}_\alpha(f, \sigma)$.

#### FTPL.

Given access to an $\alpha$-approximate offline optimization oracle, we study the FTPL algorithm which is described by the following prediction rule (see Algorithm 1):

$$x_t = \mathcal{O}_\alpha\Big(\sum_{i=1}^{t-1} f_i,\ \sigma_t\Big), \tag{1}$$

where $\sigma_t \in \mathbb{R}^d$ is a random perturbation whose coordinates are drawn i.i.d. from $\mathrm{Exp}(\eta)$, the exponential distribution with parameter $\eta$.
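To make the prediction rule concrete, here is a minimal Python sketch of FTPL over a finite candidate set, with exact enumeration standing in for the offline oracle (so $\alpha = 0$). The domain, loss functions, and all identifiers below are illustrative assumptions, not objects from the paper.

```python
import random

def exp_perturbation(d, eta, rng):
    """Draw sigma with i.i.d. Exp(eta) coordinates (mean 1/eta each)."""
    return [rng.expovariate(eta) for _ in range(d)]

def oracle(points, losses, sigma):
    """Exact offline oracle: argmin over a finite set of sum_i f_i(x) - <sigma, x>."""
    def objective(x):
        return sum(f(x) for f in losses) - sum(s * xi for s, xi in zip(sigma, x))
    return min(points, key=objective)

def ftpl(points, loss_seq, eta, seed=0):
    """Run FTPL for T rounds; returns the predictions x_1, ..., x_T."""
    rng = random.Random(seed)
    d = len(points[0])
    preds = []
    for t in range(len(loss_seq)):
        sigma = exp_perturbation(d, eta, rng)
        preds.append(oracle(points, loss_seq[:t], sigma))  # only past losses are used
    return preds
```

With a finite candidate set the oracle is exact; in practice the enumeration would be replaced by whatever approximate offline solver is available.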

#### Optimistic FTPL (OFTPL).

When the sequence of losses chosen by the adversary is predictable, Rakhlin and Sridharan (2012) show that one can exploit the predictability of the losses to obtain better regret bounds. Let $g_t$ be our guess of the loss $f_t$ at the beginning of round $t$, with $g_1 = 0$. In general, $g_t$ may depend on the losses $\{f_i\}_{i=1}^{t-1}$ observed so far; to simplify the notation, in the sequel we suppress the dependence of $g_t$ on $\{f_i\}_{i=1}^{t-1}$. Given $g_t$, we predict $x_t$ as

$$x_t = \mathcal{O}_\alpha\Big(\sum_{i=1}^{t-1} f_i + g_t,\ \sigma_t\Big). \tag{2}$$

When our guess $g_t$ is close to $f_t$, we expect OFTPL to have a smaller regret. In Theorem 2 we show that the regret of OFTPL depends only on the Lipschitz constants of the differences $f_t - g_t$.
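In the same spirit, a minimal self-contained sketch of the OFTPL rule (2). The guess $g_t = f_{t-1}$ ("use the previous loss as the prediction") is one natural illustrative choice for predictable sequences, not a choice prescribed by the text.

```python
import random

def oracle(points, funcs, sigma):
    """Exact offline oracle: argmin over a finite set of sum_f f(x) - <sigma, x>."""
    return min(points, key=lambda x: sum(f(x) for f in funcs)
               - sum(s * xi for s, xi in zip(sigma, x)))

def oftpl(points, loss_seq, eta, seed=0):
    """OFTPL with the simple guess g_t = f_{t-1} (and g_1 = 0)."""
    rng = random.Random(seed)
    d = len(points[0])
    preds = []
    for t in range(len(loss_seq)):
        sigma = [rng.expovariate(eta) for _ in range(d)]
        guess = [loss_seq[t - 1]] if t > 0 else []  # g_t: last observed loss
        preds.append(oracle(points, list(loss_seq[:t]) + guess, sigma))
    return preds
```

When consecutive losses are similar, the guess shifts the oracle's objective toward the upcoming loss before it is revealed.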

### 2.1 Main Results

We present our main results for an oblivious adversary who fixes the sequence of losses ahead of the game. In this setting it suffices to work with a single random vector $\sigma$, instead of generating a new random vector $\sigma_t$ in each iteration. Following Cesa-Bianchi and Lugosi (2006), one can show that any algorithm that is guaranteed to work against oblivious adversaries also works against non-oblivious adversaries.

###### Theorem 1 (Non-Convex FTPL).

Let $D$ be the $\ell_\infty$ diameter of $\mathcal{X}$. Suppose the losses encountered by the learner are $L$-Lipschitz w.r.t. the $\ell_1$ norm. For any fixed $\eta$, the predictions of Algorithm 1 satisfy the following regret bound:

$$\mathbb{E}_\sigma\Big[\sum_{t=1}^T f_t(x_t) - \inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x)\Big] \le O\Big(\eta d^2DL^2T + \frac{dD\log d}{\eta} + \alpha T\Big).$$
###### Theorem 2 (Non-Convex OFTPL).

Let $D$ be the $\ell_\infty$ diameter of $\mathcal{X}$. Suppose our guess $g_t$ is such that $f_t - g_t$ is $L_t$-Lipschitz w.r.t. the $\ell_1$ norm, for all $t$. For any fixed $\eta$, OFTPL with access to an $\alpha$-approximate optimization oracle satisfies the following regret bound:

$$\mathbb{E}_\sigma\Big[\sum_{t=1}^T f_t(x_t) - \inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x)\Big] \le O\Big(\eta d^2D\sum_{t=1}^T L_t^2 + \frac{dD\log d}{\eta} + \alpha T\Big).$$

The above results show that, for an appropriate choice of $\eta$ and an oracle accuracy of $\alpha = O(1/\sqrt{T})$, FTPL achieves $O(\sqrt{T})$ regret, that is, $O(T^{-1/2})$ average regret.
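To see where the $O(T^{-1/2})$ average-regret rate comes from, one can balance the first two terms in the bound of Theorem 1 (a tuning sketch with unoptimized constants, not a computation stated in the text):

$$\eta = \sqrt{\frac{\log d}{dL^2T}} \quad\Longrightarrow\quad \eta\, d^2DL^2T + \frac{dD\log d}{\eta} = 2\,d^{3/2}DL\sqrt{T\log d},$$

so with oracle accuracy $\alpha = O(1/\sqrt{T})$ the cumulative regret is $O\big(d^{3/2}DL\sqrt{T\log d}\big)$, and dividing by $T$ gives $O(T^{-1/2})$ average regret.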

## 3 Non-Convex FTPL

In this section we present a proof of Theorem 1. The first step in the proof involves relating the expected regret to the stability of prediction, which is a standard step in the analysis of many online learning algorithms.

###### Lemma 3.

The regret of Algorithm 1 can be upper bounded as

$$\mathbb{E}\Big[\sum_{t=1}^T f_t(x_t) - \inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x)\Big] \le L\sum_{t=1}^T \mathbb{E}\big[\|x_t - x_{t+1}\|_1\big] + \frac{(1+\log d)dD}{\eta} + \alpha T. \tag{3}$$

In the rest of the proof we focus on bounding $\sum_{t=1}^T \mathbb{E}[\|x_t - x_{t+1}\|_1]$. We use a similar proof technique as in Agarwal et al. (2018). We first show that the minimizers returned by FTPL satisfy certain monotonicity properties.

###### Lemma 4.

Let $x_t(\sigma)$ be the prediction of FTPL in iteration $t$, with random perturbation $\sigma$. Let $e_i$ denote the $i$-th standard basis vector and $x_{t,i}(\sigma)$ denote the $i$-th coordinate of $x_t(\sigma)$. Then the following monotonicity property holds for any $c > 0$:

$$x_{t,i}(\sigma + c\,e_i) \ge x_{t,i}(\sigma) - \frac{2\alpha}{c}.$$
###### Proof.

Let $\sigma' = \sigma + c\,e_i$ and $f_{1:t-1} = \sum_{j=1}^{t-1} f_j$. From the approximate optimality of $x_t(\sigma)$ we have

$$\begin{aligned}
f_{1:t-1}(x_t(\sigma)) - \langle\sigma, x_t(\sigma)\rangle
&\le f_{1:t-1}(x_t(\sigma')) - \langle\sigma, x_t(\sigma')\rangle + \alpha\\
&= f_{1:t-1}(x_t(\sigma')) - \langle\sigma', x_t(\sigma')\rangle + c\,x_{t,i}(\sigma') + \alpha\\
&\le f_{1:t-1}(x_t(\sigma)) - \langle\sigma', x_t(\sigma)\rangle + c\,x_{t,i}(\sigma') + 2\alpha\\
&= f_{1:t-1}(x_t(\sigma)) - \langle\sigma, x_t(\sigma)\rangle + c\big(x_{t,i}(\sigma') - x_{t,i}(\sigma)\big) + 2\alpha,
\end{aligned}$$

where the second inequality follows from the approximate optimality of $x_t(\sigma')$. Rearranging shows that $x_{t,i}(\sigma') \ge x_{t,i}(\sigma) - 2\alpha/c$. ∎
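The monotonicity property can also be checked numerically: with an exact oracle ($\alpha = 0$) over a finite set, increasing the $i$-th coordinate of the perturbation never decreases the $i$-th coordinate of the minimizer. A small self-contained check, where the domain and function values are arbitrary test data, not from the paper:

```python
import random

def argmin_perturbed(points, f_vals, sigma):
    """Exact oracle: argmin over a finite set of f(x) - <sigma, x>."""
    def objective(idx_point):
        idx, x = idx_point
        return f_vals[idx] - sum(s * xi for s, xi in zip(sigma, x))
    return min(enumerate(points), key=objective)[1]

rng = random.Random(0)
d, n = 3, 50
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
f_vals = [rng.random() for _ in range(n)]  # arbitrary (non-convex) cumulative losses

for trial in range(200):
    sigma = [rng.expovariate(1.0) for _ in range(d)]
    i = rng.randrange(d)
    c = rng.random() * 5 + 0.1
    x = argmin_perturbed(points, f_vals, sigma)
    sigma_prime = list(sigma)
    sigma_prime[i] += c
    x_prime = argmin_perturbed(points, f_vals, sigma_prime)
    # Lemma 4 with alpha = 0: the i-th coordinate can only move up.
    assert x_prime[i] >= x[i] - 1e-12
```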

###### Lemma 5.

Let $e_i$ denote the $i$-th standard basis vector and $x_{t,i}(\sigma)$ denote the $i$-th coordinate of $x_t(\sigma)$. Suppose $\|x_t(\sigma) - x_{t+1}(\sigma)\|_1 \le 10d\,|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|$. For $\sigma' = \sigma + 100Ld\,e_i$, we have

$$\min\big(x_{t,i}(\sigma'), x_{t+1,i}(\sigma')\big) \ge \max\big(x_{t,i}(\sigma), x_{t+1,i}(\sigma)\big) - \frac{1}{10}\big|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)\big| - \frac{3\alpha}{100Ld}.$$
###### Proof.

Let $\sigma' = \sigma + 100Ld\,e_i$. From the approximate optimality of $x_t(\sigma)$, we have

$$\begin{aligned}
f_{1:t-1}(x_t(\sigma)) - \langle\sigma, x_t(\sigma)\rangle + f_t(x_t(\sigma))
&\le f_{1:t-1}(x_{t+1}(\sigma)) - \langle\sigma, x_{t+1}(\sigma)\rangle + f_t(x_t(\sigma)) + \alpha\\
&\le f_{1:t-1}(x_{t+1}(\sigma)) - \langle\sigma, x_{t+1}(\sigma)\rangle + f_t(x_{t+1}(\sigma)) + L\|x_t(\sigma) - x_{t+1}(\sigma)\|_1 + \alpha\\
&\le f_{1:t-1}(x_{t+1}(\sigma)) - \langle\sigma, x_{t+1}(\sigma)\rangle + f_t(x_{t+1}(\sigma)) + 10Ld\,|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| + \alpha,
\end{aligned}$$

where the second inequality follows from the Lipschitz property of $f_t$ and the last inequality follows from the assumption $\|x_t(\sigma) - x_{t+1}(\sigma)\|_1 \le 10d\,|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|$. Next, from the approximate optimality of $x_{t+1}(\sigma')$, we have

$$\begin{aligned}
f_{1:t-1}(x_t(\sigma)) - \langle\sigma, x_t(\sigma)\rangle + f_t(x_t(\sigma))
&= f_{1:t-1}(x_t(\sigma)) - \langle\sigma', x_t(\sigma)\rangle + f_t(x_t(\sigma)) + \langle 100Ld\,e_i, x_t(\sigma)\rangle\\
&\ge f_{1:t-1}(x_{t+1}(\sigma')) - \langle\sigma', x_{t+1}(\sigma')\rangle + f_t(x_{t+1}(\sigma')) + 100Ld\,x_{t,i}(\sigma) - \alpha\\
&= f_{1:t-1}(x_{t+1}(\sigma')) - \langle\sigma, x_{t+1}(\sigma')\rangle + f_t(x_{t+1}(\sigma')) + 100Ld\big(x_{t,i}(\sigma) - x_{t+1,i}(\sigma')\big) - \alpha\\
&\ge f_{1:t-1}(x_{t+1}(\sigma)) - \langle\sigma, x_{t+1}(\sigma)\rangle + f_t(x_{t+1}(\sigma)) + 100Ld\big(x_{t,i}(\sigma) - x_{t+1,i}(\sigma')\big) - 2\alpha,
\end{aligned}$$

where the first inequality follows from the approximate optimality of $x_{t+1}(\sigma')$ and the last inequality follows from the approximate optimality of $x_{t+1}(\sigma)$. Combining the above two displays, we get

$$x_{t+1,i}(\sigma') - x_{t,i}(\sigma) \ge -\frac{1}{10}\big|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)\big| - \frac{3\alpha}{100Ld}.$$

A similar argument shows that

$$x_{t,i}(\sigma') - x_{t+1,i}(\sigma) \ge -\frac{1}{10}\big|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)\big| - \frac{3\alpha}{100Ld}.$$

Finally, using the monotonicity property in Lemma 4 with $c = 100Ld$, we get

$$x_{t+1,i}(\sigma') - x_{t+1,i}(\sigma) \ge -\frac{2\alpha}{100Ld}, \qquad x_{t,i}(\sigma') - x_{t,i}(\sigma) \ge -\frac{2\alpha}{100Ld}.$$

Combining the above inequalities gives us the required result. ∎

#### Proof of Theorem 1.

We now proceed to the proof of Theorem 1. First note that $\mathbb{E}[\|x_t(\sigma) - x_{t+1}(\sigma)\|_1]$ can be written as

$$\mathbb{E}\big[\|x_t(\sigma) - x_{t+1}(\sigma)\|_1\big] = \sum_{i=1}^d \mathbb{E}\big[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|\big]. \tag{4}$$

To bound $\mathbb{E}[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|]$ we derive an upper bound for the conditional expectation $\mathbb{E}_{-i}[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|]$. For any $i \in [d]$, define $\mathbb{E}_{-i}$ as

$$\mathbb{E}_{-i}\big[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|\big] \coloneqq \mathbb{E}\big[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| \,\big|\, \{\sigma_j\}_{j\ne i}\big],$$

where $\sigma_j$ is the $j$-th coordinate of $\sigma$. Let $x_{\min,i}(\sigma) = \min(x_{t,i}(\sigma), x_{t+1,i}(\sigma))$ and $x_{\max,i}(\sigma) = \max(x_{t,i}(\sigma), x_{t+1,i}(\sigma))$, so that $|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| = x_{\max,i}(\sigma) - x_{\min,i}(\sigma)$. Define the event $E$ as

$$E = \big\{\sigma : \|x_t(\sigma) - x_{t+1}(\sigma)\|_1 \le 10d\,|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|\big\}.$$

Consider the following:

$$\begin{aligned}
\mathbb{E}_{-i}[x_{\min,i}(\sigma)]
&= \mathbb{P}(\sigma_i < 100Ld)\,\mathbb{E}_{-i}\big[x_{\min,i}(\sigma)\,\big|\,\sigma_i < 100Ld\big] + \mathbb{P}(\sigma_i \ge 100Ld)\,\mathbb{E}_{-i}\big[x_{\min,i}(\sigma)\,\big|\,\sigma_i \ge 100Ld\big]\\
&\ge \big(1 - e^{-100\eta Ld}\big)\big(\mathbb{E}_{-i}[x_{\max,i}(\sigma)] - D\big) + e^{-100\eta Ld}\,\mathbb{E}_{-i}\big[x_{\min,i}(\sigma + 100Ld\,e_i)\big]\\
&\ge \big(1 - e^{-100\eta Ld}\big)\big(\mathbb{E}_{-i}[x_{\max,i}(\sigma)] - D\big) + e^{-100\eta Ld}\,\mathbb{P}_{-i}(E)\,\mathbb{E}_{-i}\big[x_{\min,i}(\sigma + 100Ld\,e_i)\,\big|\,E\big]\\
&\qquad + e^{-100\eta Ld}\,\mathbb{P}_{-i}(E^c)\,\mathbb{E}_{-i}\big[x_{\min,i}(\sigma + 100Ld\,e_i)\,\big|\,E^c\big],
\end{aligned}$$

where the first inequality uses the fact that the $\ell_\infty$ diameter of the domain is $D$ (so every coordinate lies in an interval of length $D$), together with the memorylessness of the exponential distribution (conditioned on $\sigma_i \ge 100Ld$, $\sigma$ has the same distribution as $\sigma + 100Ld\,e_i$); in the last inequality, $\mathbb{P}_{-i}(\cdot)$ is defined as $\mathbb{P}(\cdot \,|\, \{\sigma_j\}_{j\ne i})$. We now use the monotonicity properties proved in Lemmas 4 and 5 to further lower bound $\mathbb{E}_{-i}[x_{\min,i}(\sigma)]$.

$$\begin{aligned}
\mathbb{E}_{-i}[x_{\min,i}(\sigma)]
&\ge \big(1 - e^{-100\eta Ld}\big)\big(\mathbb{E}_{-i}[x_{\max,i}(\sigma)] - D\big)\\
&\qquad + e^{-100\eta Ld}\,\mathbb{P}_{-i}(E)\,\mathbb{E}_{-i}\Big[x_{\max,i}(\sigma) - \tfrac{1}{10}|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| - \tfrac{3\alpha}{100Ld}\,\Big|\,E\Big]\\
&\qquad + e^{-100\eta Ld}\,\mathbb{P}_{-i}(E^c)\,\mathbb{E}_{-i}\Big[x_{\min,i}(\sigma) - \tfrac{2\alpha}{100Ld}\,\Big|\,E^c\Big]\\
&\ge \big(1 - e^{-100\eta Ld}\big)\big(\mathbb{E}_{-i}[x_{\max,i}(\sigma)] - D\big)\\
&\qquad + e^{-100\eta Ld}\,\mathbb{P}_{-i}(E)\,\mathbb{E}_{-i}\Big[x_{\max,i}(\sigma) - \tfrac{1}{10}|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| - \tfrac{3\alpha}{100Ld}\,\Big|\,E\Big]\\
&\qquad + e^{-100\eta Ld}\,\mathbb{P}_{-i}(E^c)\,\mathbb{E}_{-i}\Big[x_{\max,i}(\sigma) - \tfrac{1}{10d}\|x_t(\sigma) - x_{t+1}(\sigma)\|_1 - \tfrac{2\alpha}{100Ld}\,\Big|\,E^c\Big]\\
&\ge \big(1 - e^{-100\eta Ld}\big)\big(\mathbb{E}_{-i}[x_{\max,i}(\sigma)] - D\big) + e^{-100\eta Ld}\,\mathbb{E}_{-i}\Big[x_{\max,i}(\sigma) - \tfrac{3\alpha}{100Ld}\Big]\\
&\qquad - e^{-100\eta Ld}\,\mathbb{E}_{-i}\Big[\tfrac{1}{10}|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| + \tfrac{1}{10d}\|x_t(\sigma) - x_{t+1}(\sigma)\|_1\Big]\\
&\ge \mathbb{E}_{-i}[x_{\max,i}(\sigma)] - 100\eta LdD - \tfrac{3\alpha}{100Ld} - \mathbb{E}_{-i}\Big[\tfrac{1}{10}|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)| + \tfrac{1}{10d}\|x_t(\sigma) - x_{t+1}(\sigma)\|_1\Big],
\end{aligned}$$

where the first inequality follows from Lemmas 4 and 5, the second inequality follows from the definition of $E$ (on $E^c$ we have $x_{\min,i}(\sigma) \ge x_{\max,i}(\sigma) - \frac{1}{10d}\|x_t(\sigma) - x_{t+1}(\sigma)\|_1$), and the last inequality uses the fact that $1 - e^{-x} \le x$. Rearranging the terms in the last inequality, and noting that $\mathbb{E}_{-i}[x_{\max,i}(\sigma) - x_{\min,i}(\sigma)] = \mathbb{E}_{-i}[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|]$, gives us

$$\mathbb{E}_{-i}\big[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|\big] \le \frac{1000}{9}\eta LdD + \frac{1}{9d}\,\mathbb{E}_{-i}\big[\|x_t(\sigma) - x_{t+1}(\sigma)\|_1\big] + \frac{\alpha}{30Ld}.$$

Since the above bound holds for any realization of $\{\sigma_j\}_{j\ne i}$, we get the following bound on the unconditional expectation:

$$\mathbb{E}\big[|x_{t,i}(\sigma) - x_{t+1,i}(\sigma)|\big] \le \frac{1000}{9}\eta LdD + \frac{1}{9d}\,\mathbb{E}\big[\|x_t(\sigma) - x_{t+1}(\sigma)\|_1\big] + \frac{\alpha}{30Ld}.$$

Summing this over $i \in [d]$ as in Equation (4), and moving the resulting $\frac{1}{9}\mathbb{E}[\|x_t(\sigma) - x_{t+1}(\sigma)\|_1]$ term to the left-hand side, gives us the following bound on the stability of the predictions of FTPL:

$$\mathbb{E}\big[\|x_t(\sigma) - x_{t+1}(\sigma)\|_1\big] \le 125\,\eta Ld^2D + \frac{\alpha}{20L}.$$

Plugging the above bound in Equation (3) gives us the required bound on regret.

## 4 Non-Convex OFTPL

In this section we present a proof of Theorem 2. We first relate the expected regret of OFTPL to the stability of its predictions. Unlike Lemma 3, the upper bound we obtain for OFTPL depends on the Lipschitz constants of the differences $f_t - g_t$.

###### Lemma 6.

Let $\bar{x}_{t+1}$ be any minimizer of $\sum_{i=1}^{t} f_i(x) - \langle\sigma, x\rangle$ over $\mathcal{X}$. The regret of OFTPL can be upper bounded as

$$\mathbb{E}\Big[\sum_{t=1}^T f_t(x_t) - \inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x)\Big] \le \sum_{t=1}^T L_t\,\mathbb{E}\big[\|x_t - \bar{x}_{t+1}\|_1\big] + \frac{(1+\log d)dD}{\eta} + \alpha T. \tag{5}$$

The rest of the proof of Theorem 2 involves bounding $\mathbb{E}[\|x_t - \bar{x}_{t+1}\|_1]$ and uses the same arguments as the proof of Theorem 1.

## Appendix A Proof of Lemma 3

For any $x^* \in \mathcal{X}$ we have

$$\begin{aligned}
\sum_{t=1}^T \big[f_t(x_t) - f_t(x^*)\big]
&= \sum_{t=1}^T \big[f_t(x_t) - f_t(x_{t+1})\big] + \sum_{t=1}^T \big[f_t(x_{t+1}) - f_t(x^*)\big]\\
&\le \sum_{t=1}^T L\|x_t - x_{t+1}\|_1 + \sum_{t=1}^T \big[f_t(x_{t+1}) - f_t(x^*)\big].
\end{aligned}$$

We now use induction to show that, for all $x^* \in \mathcal{X}$, $\sum_{t=1}^T f_t(x_{t+1}) \le \sum_{t=1}^T f_t(x^*) + \langle\sigma, x_2 - x^*\rangle + \alpha T$.

#### Base Case (T=1).

Since $x_2$ is an $\alpha$-approximate minimizer of $f_1(x) - \langle\sigma, x\rangle$, we have

$$f_1(x_2) - \langle\sigma, x_2\rangle \le \min_{x\in\mathcal{X}}\big[f_1(x) - \langle\sigma, x\rangle\big] + \alpha \le f_1(x^*) - \langle\sigma, x^*\rangle + \alpha,$$

where the last inequality holds for any $x^* \in \mathcal{X}$. This shows that $f_1(x_2) \le f_1(x^*) + \langle\sigma, x_2 - x^*\rangle + \alpha$.

#### Induction Step.

Suppose the claim holds for all $T < T_0$. We now show that it also holds for $T = T_0$.

$$\begin{aligned}
\sum_{t=1}^{T_0} f_t(x_{t+1})
&\le \Big[\sum_{t=1}^{T_0-1} f_t(x_{T_0+1}) + \langle\sigma, x_2 - x_{T_0+1}\rangle + \alpha(T_0-1)\Big] + f_{T_0}(x_{T_0+1})\\
&= \Big[\sum_{t=1}^{T_0} f_t(x_{T_0+1}) - \langle\sigma, x_{T_0+1}\rangle\Big] + \langle\sigma, x_2\rangle + \alpha(T_0-1)\\
&\le \sum_{t=1}^{T_0} f_t(x^*) + \langle\sigma, x_2 - x^*\rangle + \alpha T_0, \qquad \forall x^* \in \mathcal{X},
\end{aligned}$$

where the first inequality follows since the claim holds for $T = T_0 - 1$ (applied with $x^* = x_{T_0+1}$), and the last inequality follows from the approximate optimality of $x_{T_0+1}$.

Using this result, together with the bound $\langle\sigma, x_2 - x^*\rangle \le \|\sigma\|_\infty\|x_2 - x^*\|_1 \le dD\|\sigma\|_\infty$, we get the following upper bound on the expected regret of FTPL:

$$\mathbb{E}\Big[\sum_{t=1}^T f_t(x_t) - \inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x)\Big] \le L\sum_{t=1}^T \mathbb{E}\big[\|x_t - x_{t+1}\|_1\big] + dD\,\mathbb{E}\big[\|\sigma\|_\infty\big] + \alpha T.$$

The proof of the lemma now follows from the following property of the exponential distribution:

$$\mathbb{E}\big[\|\sigma\|_\infty\big] \le \frac{1+\log d}{\eta}.$$
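For completeness, here is a short derivation of this bound (a standard fact about exponential order statistics, stated under the convention that $\mathrm{Exp}(\eta)$ has mean $1/\eta$). Since the coordinates of $\sigma$ are nonnegative, $\|\sigma\|_\infty = \max_i \sigma_i$; the minimum of $k$ i.i.d. $\mathrm{Exp}(\eta)$ variables is $\mathrm{Exp}(k\eta)$, and by memorylessness the successive spacings of the order statistics are independent exponentials, so

$$\mathbb{E}\big[\|\sigma\|_\infty\big] = \mathbb{E}\Big[\max_{1\le i\le d}\sigma_i\Big] = \frac{1}{\eta}\sum_{k=1}^d \frac{1}{k} \le \frac{1+\log d}{\eta}.$$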

## Appendix B Proof of Lemma 6

The proof uses similar arguments as in the proof of Rakhlin and Sridharan (2012) for Optimistic FTRL. Let $\Delta_t = f_t - g_t$. Then for any $x^* \in \mathcal{X}$ we have

$$\begin{aligned}
\sum_{t=1}^T \big[f_t(x_t) - f_t(x^*)\big]
&= \sum_{t=1}^T \big[\Delta_t(x_t) - \Delta_t(\bar{x}_{t+1})\big] + \sum_{t=1}^T \big[g_t(x_t) - g_t(\bar{x}_{t+1})\big] + \sum_{t=1}^T \big[f_t(\bar{x}_{t+1}) - f_t(x^*)\big]\\
&\le \sum_{t=1}^T L_t\|x_t - \bar{x}_{t+1}\|_1 + \sum_{t=1}^T \big[g_t(x_t) - g_t(\bar{x}_{t+1})\big] + \sum_{t=1}^T \big[f_t(\bar{x}_{t+1}) - f_t(x^*)\big].
\end{aligned}$$

We use induction to show that the following holds for any $x^* \in \mathcal{X}$:

$$\sum_{t=1}^T \big[g_t(x_t) - g_t(\bar{x}_{t+1})\big] + \sum_{t=1}^T f_t(\bar{x}_{t+1}) \le \sum_{t=1}^T f_t(x^*) + \langle\sigma, \bar{x}_2 - x^*\rangle + \alpha(T-1).$$

#### Base Case (T=1).

First note that $g_1 = 0$, so $g_1(x_1) - g_1(\bar{x}_2) = 0$. Since $\bar{x}_2$ is a minimizer of $f_1(x) - \langle\sigma, x\rangle$, we have

$$f_1(\bar{x}_2) - \langle\sigma, \bar{x}_2\rangle \le f_1(x^*) - \langle\sigma, x^*\rangle, \qquad \forall x^* \in \mathcal{X}.$$

This shows that $g_1(x_1) - g_1(\bar{x}_2) + f_1(\bar{x}_2) \le f_1(x^*) + \langle\sigma, \bar{x}_2 - x^*\rangle$.

#### Induction Step.

Suppose the claim holds for all $T < T_0$. We now show that it also holds for $T = T_0$.

$$\begin{aligned}
\sum_{t=1}^{T_0}\big[g_t(x_t) - g_t(\bar{x}_{t+1})\big] + \sum_{t=1}^{T_0} f_t(\bar{x}_{t+1})
&\le \Big[\sum_{t=1}^{T_0-1} f_t(x_{T_0}) + \langle\sigma, \bar{x}_2 - x_{T_0}\rangle + \alpha(T_0-2)\Big] + \big[g_{T_0}(x_{T_0}) - g_{T_0}(\bar{x}_{T_0+1}) + f_{T_0}(\bar{x}_{T_0+1})\big]\\
&= \Big[\sum_{t=1}^{T_0-1} f_t(x_{T_0}) + g_{T_0}(x_{T_0}) - \langle\sigma, x_{T_0}\rangle\Big] + \big[\langle\sigma, \bar{x}_2\rangle - g_{T_0}(\bar{x}_{T_0+1}) + f_{T_0}(\bar{x}_{T_0+1})\big] + \alpha(T_0-2)\\
&\le \Big[\sum_{t=1}^{T_0-1} f_t(\bar{x}_{T_0+1}) + g_{T_0}(\bar{x}_{T_0+1}) - \langle\sigma, \bar{x}_{T_0+1}\rangle\Big] + \big[\langle\sigma, \bar{x}_2\rangle - g_{T_0}(\bar{x}_{T_0+1}) + f_{T_0}(\bar{x}_{T_0+1})\big] + \alpha(T_0-1)\\
&= \Big[\sum_{t=1}^{T_0} f_t(\bar{x}_{T_0+1}) - \langle\sigma, \bar{x}_{T_0+1}\rangle\Big] + \langle\sigma, \bar{x}_2\rangle + \alpha(T_0-1)\\
&\le \sum_{t=1}^{T_0} f_t(x^*) + \langle\sigma, \bar{x}_2 - x^*\rangle + \alpha(T_0-1),
\end{aligned}$$

where the first inequality follows since the claim holds for $T = T_0 - 1$ (applied with $x^* = x_{T_0}$), the second inequality follows from the approximate optimality of $x_{T_0}$, and the last inequality follows from the optimality of $\bar{x}_{T_0+1}$.

Combining this with the earlier display, taking expectations, and using $\langle\sigma, \bar{x}_2 - x^*\rangle \le dD\|\sigma\|_\infty$ together with $\mathbb{E}[\|\sigma\|_\infty] \le (1+\log d)/\eta$, gives the following upper bound on the regret of OFTPL:

$$\mathbb{E}\Big[\sum_{t=1}^T f_t(x_t) - \inf_{x\in\mathcal{X}}\sum_{t=1}^T f_t(x)\Big] \le \sum_{t=1}^T L_t\,\mathbb{E}\big[\|x_t - \bar{x}_{t+1}\|_1\big] + \frac{(1+\log d)dD}{\eta} + \alpha T.$$