Online Non-Convex Learning: Following the Perturbed Leader is Optimal

We study the problem of online learning with non-convex losses, where the learner has access to an offline optimization oracle. We show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves an optimal regret rate of $O(T^{-1/2})$ in this setting. This improves upon the previous best-known regret rate of $O(T^{-1/3})$ for FTPL. We further show that an optimistic variant of FTPL achieves better regret bounds when the sequence of losses encountered by the learner is 'predictable'.


1 Introduction

In this work we study the problem of online learning with non-convex losses, where in each iteration the learner chooses an action and observes a loss function which could potentially be non-convex. The goal of the learner is to choose a sequence of actions that minimizes the cumulative loss suffered over the course of learning. Such a setting has numerous applications in machine learning, especially in adversarial training, robust optimization, and the training of Generative Adversarial Networks (GANs).

Most of the existing work on online optimization has focused on convex loss functions (Hazan, 2016), and a number of computationally efficient approaches have been proposed for regret minimization in this setting. However, when the losses are non-convex, minimizing the regret is computationally hard. Recent works on learning with non-convex losses overcome this computational barrier either by working with a restricted class of loss functions, such as approximately convex losses (Gao et al., 2018), or by optimizing a computationally tractable notion of regret (Hazan et al., 2017). Consequently, the techniques studied in these papers do not guarantee vanishing regret for general non-convex losses. Another class of approaches works with general non-convex losses, but assumes access to a sampling oracle (Maillard and Munos, 2010; Krichene et al., 2015) or an offline optimization oracle (Agarwal et al., 2018). Of these, assuming access to an offline optimization oracle is reasonable, given that in practice, simple heuristics such as stochastic gradient descent seem to be able to find approximate global optima reasonably fast, even for complicated tasks such as training deep neural networks.

In a recent work, Agarwal et al. (2018) take this latter approach: they assume access to an offline optimization oracle and show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves vanishing regret, with an average regret rate of $O(T^{-1/3})$, for general non-convex losses that are Lipschitz continuous. In this work we improve upon this result and show that FTPL in fact achieves the optimal $O(T^{-1/2})$ rate.

2 Problem Setup and Main Results

Let $\mathcal{X} \subseteq \mathbb{R}^{d}$ denote the set of all possible moves of the learner. In the online learning framework, on each round $t$, the learner makes a prediction $x_t \in \mathcal{X}$, the nature/adversary simultaneously chooses a loss function $f_t: \mathcal{X} \to \mathbb{R}$, and both players observe each other's actions. The goal of the learner is to choose a sequence of actions $\{x_t\}_{t=1}^{T}$ such that the following notion of regret is small:

$$\text{Regret} \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \inf_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x).$$

In this work we assume that $\mathcal{X}$ has $\ell_\infty$ diameter at most $D$, which is defined as $D = \sup_{x, y \in \mathcal{X}} \|x - y\|_{\infty}$. We moreover assume that the sequence of loss functions $\{f_t\}_{t=1}^{T}$ is $L$-Lipschitz with respect to the $\ell_1$ norm, i.e., $|f_t(x) - f_t(y)| \le L \|x - y\|_{1}$ for all $x, y \in \mathcal{X}$ and all $t$.
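As a purely illustrative example (not from the paper) of a loss satisfying these assumptions, one can take

$$f_t(x) \;=\; \big|\sin\!\big(\langle a_t, x\rangle\big)\big| \quad\text{with } \|a_t\|_{\infty} \le L,$$

which is non-convex but $L$-Lipschitz w.r.t. the $\ell_1$ norm, since $|f_t(x) - f_t(y)| \le |\langle a_t, x - y\rangle| \le \|a_t\|_{\infty}\,\|x - y\|_{1}$.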

Approximate Optimization Oracle.

Our results rely on an offline optimization oracle which takes as input a function $f: \mathcal{X} \to \mathbb{R}$ and a $d$-dimensional vector $\sigma$, and returns an approximate minimizer of $f(x) - \langle \sigma, x \rangle$. An optimization oracle is called an "$\epsilon$-approximate optimization oracle" if it returns $x^{*} \in \mathcal{X}$ such that

$$f(x^{*}) - \langle \sigma, x^{*} \rangle \;\le\; \inf_{x \in \mathcal{X}} \left[ f(x) - \langle \sigma, x \rangle \right] + \epsilon.$$

We denote such an optimization oracle by $\mathcal{O}_{\epsilon}(f, \sigma)$.
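The oracle is treated as a black box. As a rough illustration only, the Python sketch below implements a heuristic stand-in using random restarts of a local optimizer over a box domain; the function name approx_opt_oracle, the box domain $[\text{lower}, \text{upper}]^d$, and the use of scipy.optimize.minimize are assumptions of this sketch rather than part of the paper, and the approximation error it achieves is not certified.

import numpy as np
from scipy.optimize import minimize

def approx_opt_oracle(f, sigma, lower, upper, n_restarts=20, seed=0):
    # Heuristic stand-in for the epsilon-approximate oracle O_eps(f, sigma):
    # approximately minimize f(x) - <sigma, x> over the box [lower, upper]^d
    # by running a local optimizer from several random starting points.
    rng = np.random.default_rng(seed)
    d = len(sigma)
    objective = lambda x: f(x) - np.dot(sigma, x)
    best_x, best_val = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(lower, upper, size=d)
        res = minimize(objective, x0, bounds=[(lower, upper)] * d)
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x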

FTPL.

Given access to an $\epsilon$-approximate offline optimization oracle, we study the FTPL algorithm which is described by the following prediction rule (see Algorithm 1):

$$x_t = \mathcal{O}_{\epsilon}\left( \sum_{i=1}^{t-1} f_i, \; \sigma \right), \qquad (1)$$

where $\sigma \in \mathbb{R}^{d}$ is a random perturbation such that $\sigma_j \sim \text{Exp}(\eta)$ for each $j \in [d]$, and $\text{Exp}(\eta)$ is the exponential distribution with parameter $\eta$.

1: Input: Parameter of exponential distribution $\eta$, approximate optimization oracle $\mathcal{O}_{\epsilon}$
2: for $t = 1, \dots, T$ do
3:     Generate random vector $\sigma$ such that $\sigma_j \sim \text{Exp}(\eta)$ for all $j \in [d]$
4:     Predict $x_t$ as $x_t = \mathcal{O}_{\epsilon}\left( \sum_{i=1}^{t-1} f_i, \; \sigma \right)$
5:     Observe loss function $f_t$
6: end for
Algorithm 1 Follow the Perturbed Leader (FTPL)
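For concreteness, the following sketch runs Algorithm 1 against a fixed (oblivious) sequence of loss functions, reusing the hypothetical approx_opt_oracle and the imports from the sketch above; the box domain and the interpretation of $\eta$ as the rate parameter of the exponential distribution are assumptions of the example.

def ftpl(losses, d, eta, lower, upper, seed=0):
    # Minimal sketch of Algorithm 1 (FTPL) for an oblivious loss sequence.
    # losses: list of callables f_t mapping R^d -> R, revealed one per round.
    rng = np.random.default_rng(seed)
    past = []            # losses observed so far: f_1, ..., f_{t-1}
    predictions = []
    for f_t in losses:
        # Per-coordinate Exp(eta) perturbation, assuming eta is the rate
        # parameter (so each coordinate has mean 1/eta).
        sigma = rng.exponential(scale=1.0 / eta, size=d)
        cumulative = lambda x, fs=tuple(past): sum(f(x) for f in fs)
        x_t = approx_opt_oracle(cumulative, sigma, lower, upper)
        predictions.append(x_t)
        past.append(f_t)  # f_t is revealed only after x_t is chosen
    return predictions

Against an oblivious adversary (see Section 2.1), a single perturbation $\sigma$ drawn once and reused in every round would give the same expected regret.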

Optimistic FTPL (OFTPL).

When the sequence of losses chosen by the adversary is predictable, Rakhlin and Sridharan (2012) show that one can exploit the predictability of losses to obtain better regret bounds. Let $g_t$ be our guess of the loss $f_t$, formed at the beginning of round $t$, with $g_1 = 0$. To simplify the notation, in the sequel, we suppress the dependence of $g_t$ on the past observations. Given $g_t$, we predict $x_t$ as

$$x_t = \mathcal{O}_{\epsilon}\left( \sum_{i=1}^{t-1} f_i + g_t, \; \sigma \right). \qquad (2)$$

When our guess $g_t$ is close to $f_t$, we expect OFTPL to have a smaller regret. In Theorem 2 we show that the regret of OFTPL depends on the losses only through the differences $f_t - g_t$.
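A minimal modification of the FTPL sketch above implements this rule; the guess function guess_t (for instance, $g_t = f_{t-1}$) is an illustrative assumption.

def oftpl_prediction(past_losses, guess_t, sigma, lower, upper):
    # One round of OFTPL (Equation (2)): approximately minimize the sum of the
    # observed losses plus the current guess, perturbed by -<sigma, x>.
    objective = lambda x: sum(f(x) for f in past_losses) + guess_t(x)
    return approx_opt_oracle(objective, sigma, lower, upper)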

2.1 Main Results

We present our main results for an oblivious adversary who fixes the sequence of losses ahead of the game. In this setting it suffices to work with a single random vector $\sigma$, instead of generating a new random vector in each iteration. Following Cesa-Bianchi and Lugosi (2006), one can show that any algorithm that is guaranteed to work against oblivious adversaries also works against non-oblivious adversaries.

Theorem 1 (Non-Convex FTPL).

Let $D$ be the $\ell_\infty$ diameter of $\mathcal{X}$. Suppose the losses encountered by the learner are $L$-Lipschitz w.r.t. the $\ell_1$ norm. For any fixed $\eta$, the predictions of Algorithm 1 satisfy a regret bound of the form

$$\mathbb{E}\left[\text{Regret}\right] \;\le\; O\!\left( \eta\, c(d, D, L)\, T \;+\; \frac{dD}{\eta} \;+\; \epsilon T \right),$$

where $c(d, D, L)$ is a polynomial in the problem parameters.

Theorem 2 (Non-Convex OFTPL).

Let $D$ be the $\ell_\infty$ diameter of $\mathcal{X}$. Suppose our guess $g_t$ is such that $f_t - g_t$ is $L$-Lipschitz w.r.t. the $\ell_1$ norm, for all $t$. For any fixed $\eta$, OFTPL with access to an $\epsilon$-approximate optimization oracle satisfies a regret bound of the same form as in Theorem 1, with $L$ now denoting the Lipschitz constant of the differences $f_t - g_t$ rather than of the losses themselves.

The above result shows that, for an appropriate choice of $\eta$ (of order $1/\sqrt{T}$) and a sufficiently small oracle error $\epsilon$, FTPL achieves $O(\sqrt{T})$ regret, i.e., the optimal average regret rate of $O(T^{-1/2})$.
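The rate follows from a standard balancing argument; the calculation below assumes only that the bound has the generic form $a\eta T + b/\eta$ for problem-dependent constants $a, b > 0$ (ignoring the oracle error).

$$\min_{\eta > 0}\left( a\eta T + \frac{b}{\eta} \right) \;=\; 2\sqrt{ab\,T}, \qquad \text{attained at } \eta^{*} = \sqrt{\frac{b}{aT}},$$

so the total regret grows as $O(\sqrt{T})$ and the average regret decays as $O(T^{-1/2})$.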

3 Non-Convex FTPL

In this section we present a proof of Theorem 1. The first step in the proof involves relating the expected regret to the stability of prediction, which is a standard step in the analysis of many online learning algorithms.

Lemma 3.

The regret of Algorithm 1 can be upper bounded as

$$\mathbb{E}\left[\text{Regret}\right] \;\le\; \sum_{t=1}^{T} \mathbb{E}\left[ f_t(x_t) - f_t(x_{t+1}) \right] \;+\; \frac{dD}{\eta} \;+\; (T+1)\epsilon, \qquad (3)$$

where $x_{t+1}$ denotes the FTPL prediction at round $t+1$, computed with the same perturbation $\sigma$.

In the rest of the proof we focus on bounding $\mathbb{E}\left[ f_t(x_t) - f_t(x_{t+1}) \right]$. We use a similar proof technique as in Agarwal et al. (2018). We first show that the minimizers computed by FTPL satisfy certain monotonicity properties.
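Concretely, the $\ell_1$-Lipschitz assumption reduces the stability of the losses to the stability of the predictions; this step, stated here for completeness under the assumptions of Section 2, is what connects Lemma 3 to the coordinate-wise analysis below:

$$\mathbb{E}\left[ f_t(x_t) - f_t(x_{t+1}) \right] \;\le\; L\, \mathbb{E}\left[ \|x_t - x_{t+1}\|_{1} \right].$$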

Lemma 4.

Let $x_t(\sigma)$ be the prediction of FTPL in iteration $t$, with random perturbation $\sigma$. Let $e_j$ denote the $j$-th standard basis vector and $x_{t,j}(\sigma)$ denote the $j$-th coordinate of $x_t(\sigma)$. Then the following monotonicity property holds for any $c > 0$:

$$x_{t,j}(\sigma + c\, e_j) \;\ge\; x_{t,j}(\sigma) - \frac{2\epsilon}{c}.$$

Proof.

Let $x = x_t(\sigma)$, $x' = x_t(\sigma + c\, e_j)$, and write $F = \sum_{i=1}^{t-1} f_i$. From the approximate optimality of $x$ we have

$$F(x) - \langle \sigma, x \rangle \;\le\; F(x') - \langle \sigma, x' \rangle + \epsilon \;\le\; F(x) - \langle \sigma, x \rangle + c\,(x'_j - x_j) + 2\epsilon,$$

where the second inequality follows from the approximate optimality of $x'$ for the perturbed objective $F(\cdot) - \langle \sigma + c\, e_j, \cdot \rangle$. Rearranging shows that $x'_j \ge x_j - 2\epsilon/c$. ∎
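As a quick sanity check of the exact-oracle ($\epsilon = 0$) version of this monotonicity, the snippet below brute-forces the minimizer of $F(x) - \langle \sigma, x\rangle$ over a random finite domain and verifies that the selected coordinate never decreases when the corresponding coordinate of $\sigma$ is increased; the grid construction is purely illustrative.

import numpy as np

rng = np.random.default_rng(1)
grid = rng.uniform(0.0, 1.0, size=(500, 2))   # a random finite domain X in [0, 1]^2
fvals = rng.normal(size=500)                   # arbitrary (non-convex) values of F on X

def argmin_coord(sig, j=1):
    # j-th coordinate of the exact minimizer of F(x) - <sig, x> over the grid
    return grid[np.argmin(fvals - grid @ sig), j]

sigma = rng.exponential(scale=1.0, size=2)
for c in [0.1, 0.5, 1.0, 5.0]:
    shifted = sigma + c * np.array([0.0, 1.0])   # increase the second coordinate
    assert argmin_coord(shifted) >= argmin_coord(sigma) - 1e-12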

Lemma 5.

Let $e_j$ denote the $j$-th standard basis vector and $x_{t,j}(\sigma)$ denote the $j$-th coordinate of $x_t(\sigma)$, and recall that the losses are $L$-Lipschitz w.r.t. the $\ell_1$ norm. For a suitably chosen shift $c > 0$ along $e_j$, the round-$(t+1)$ prediction satisfies a monotonicity property analogous to Lemma 4: $x_{t+1,j}(\sigma + c\, e_j)$ cannot fall far below $x_{t,j}(\sigma)$.

Proof.

Let . From the approximate optimality of , we have

where the second inequality follows from the Lipschitz property of and the last inequality follows from our assumption on . Next, from the optimality of , we have

where the last inequality follows from the optimality of . Combining the above two equations, we get

A similar argument shows that

Finally, using the monotonicity property in Lemma 4, we get

Combining the above three inequalities gives us the required result. ∎

Proof of Theorem 1.

We now proceed to the proof of Theorem 1. First note that $\mathbb{E}\left[ \|x_t - x_{t+1}\|_1 \right]$ can be written as

$$\mathbb{E}\left[ \|x_t - x_{t+1}\|_1 \right] \;=\; \sum_{j=1}^{d} \mathbb{E}\left[ \,|x_{t,j} - x_{t+1,j}|\, \right]. \qquad (4)$$

To bound we derive an upper bound for . For any , define as

where is the coordinate of . Let and . Then . Define event as

Consider the following

where the first inequality follows from the fact that the diameter of the domain is $D$, and in the last inequality is defined as . We now use the monotonicity properties proved in Lemmas 4 and 5 to further lower bound .

where the first inequality follows from Lemmas 4 and 5, the second inequality follows from the definition of , and the last inequality uses the fact that . Rearranging the terms in the last inequality gives us

Since the above bound holds for any , we get the following bound on the unconditioned expectation

Plugging this in Equation (4) gives us the following bound on stability of predictions of FTPL

Plugging the above bound in Equation (3) gives us the required bound on regret.

4 Non-Convex OFTPL

In this section we present a proof of Theorem 2. We first relate the expected regret of OFTPL to the stability of its predictions. Unlike Lemma 3, the upper bound we obtain for OFTPL depends on the Lipschitz constant of the differences $f_t - g_t$.

Lemma 6.

Let $x^{*}$ be any minimizer of $\sum_{t=1}^{T} f_t(x)$ over $\mathcal{X}$. The regret of OFTPL with respect to $x^{*}$ can be upper bounded as

$$\mathbb{E}\left[\text{Regret}\right] \;\le\; \sum_{t=1}^{T} \mathbb{E}\left[ \,(f_t - g_t)(x_t) - (f_t - g_t)(x_{t+1})\, \right] \;+\; \frac{dD}{\eta} \;+\; (T+1)\epsilon, \qquad (5)$$

where $x_{t+1}$ denotes the OFTPL prediction at round $t+1$, computed with the same perturbation $\sigma$.

The rest of the proof of Theorem 2 involves bounding the stability term in Lemma 6 and uses the same arguments as the proof of Theorem 1.

References

Appendix A Proof of Lemma 3

For any we have

We now use induction to show that .

Base Case ().

Since is an approximate minimizer of , we have

where the last inequality holds for any . This shows that .

Induction Step.

Suppose the claim holds for all rounds up to $t-1$. We now show that it also holds for round $t$.

where the first inequality follows since the claim holds for any , and the last inequality follows from the approximate optimality of .

Using this result, we get the following upper bound on the expected regret of FTPL

The proof of the Lemma now follows from the following property of the exponential distribution: $\mathbb{E}\left[ \|\sigma\|_1 \right] = d/\eta$.
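For completeness, and assuming $\eta$ denotes the rate parameter of $\text{Exp}(\eta)$, the relevant computation is

$$\mathbb{E}[\sigma_j] \;=\; \int_{0}^{\infty} s\, \eta\, e^{-\eta s}\, ds \;=\; \frac{1}{\eta}, \qquad \text{so} \qquad \mathbb{E}\left[ \|\sigma\|_1 \right] \;=\; \sum_{j=1}^{d} \mathbb{E}[\sigma_j] \;=\; \frac{d}{\eta}.$$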

Appendix B Proof of Lemma 6

The proof uses similar arguments as in the proof of Rakhlin and Sridharan [2012] for Optimistic FTRL. Let . Then for any we have

We use induction to show that the following holds for any

Base Case ().

First note that . Since is a minimizer of , we have

This shows that .

Induction Step.

Suppose the claim holds for all rounds up to $t-1$. We now show that it also holds for round $t$.

where the first inequality follows since the claim holds for any , the second inequality follows from the approximate optimality of and the last inequality follows from the optimality of .

This gives the following upper bound on the regret of OFTPL