1 Introduction
In this work we study the problem of online learning with nonconvex losses, where in each iteration the learner chooses an action and observes a loss which could potentially be nonconvex. The goal of the learner is to choose a sequence of actions which minimize the cumulative loss suffered over the course of learning. Such a setting has numerous applications in machine learning, especially in adversarial training, robust optimization and training of Generative Adversarial Networks (GANs).
Most of the existing works on online optimization have focused on convex loss functions
(Hazan, 2016). A number of computationally efficient approaches have been proposed for regret minimization in this setting. However, when the losses are nonconvex, minimizing the regret is computationally hard. Recent works on learning with nonconvex losses get over this computational barrier by either working with a restricted class of loss functions such as approximately convex losses (Gao et al., 2018) or by optimizing a computationally tractable notion of regret (Hazan et al., 2017). Consequently, the techniques studied in these papers do not guarantee vanishing regret for general nonconvex losses. Another class of approaches work with general nonconvex losses, but assume access to a sampling oracle (Maillard and Munos, 2010; Krichene et al., 2015) or an offline optimization oracle (Agarwal et al., 2018). Of these, assuming access to an offline optimization oracle is reasonable, given that in practice, simple heuristics such as stochastic gradient descent seem to be able find approximate global optima reasonably fast even for complicated tasks such as training deep neural networks.
In a recent work Agarwal et al. (2018) take this later approach, where they assume access to an offline optimization oracle, and show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves regret for general nonconvex losses which are Lipschitz continuous. In this work we improve upon this result and show that FTPL in fact achieves optimal regret.
2 Problem Setup and Main Results
Let denote the set of all possible moves of the learner. In the online learning framework, on each round , the learner makes a prediction and the nature/adversary simultaneously chooses a loss function and observe each others actions. The goal of the learner is to choose a sequence of actions such that the following notion of regret is small
In this work we assume that has diameter of , which is defined as . We moreover assume that the sequence of loss functions are LLipschitz with respect to norm, i.e.,
Approximate Optimization Oracle.
Our results rely on an offline optimization oracle which takes as input a function and a
dimensional vector
and returns an approximate minimizer of . An optimization oracle is called “approximate optimization oracle” if it returns such thatWe denote such an optimization oracle with .
Ftpl.
Given access to an approximate offline optimization oracle, we study the FTPL algorithm which is described by the following prediction rule (see Algorithm 1).
(1) 
where is a random perturbation such that and
is the exponential distribution with parameter
.Optimistic FTPL (OFTPL).
When the sequence of losses chosen by the adversary are predictable, Rakhlin and Sridharan (2012) show that one can exploit the predictability of losses to obtain better regret bounds. Let be our guess of the loss at the beginning of round , with . To simplify the notation, in the sequel, we suppress the dependence of on . Given , we predict as
(2) 
When our guess is close to we expect OFTPL to have a smaller regret. In Theorem 2 we show that the regret of OFTPL depends only on .
2.1 Main Results
We present our main results for an oblivious adversary who fixes the sequence of losses ahead of the game. In this setting it suffices to work with a single random vector , instead of generating a new random vector in each iteration. Following CesaBianchi and Lugosi (2006), one can show that any algorithm that is guaranteed to work against oblivious adversaries also works for nonoblivious adversaries.
Theorem 1 (NonConvex FTPL).
Let be the diameter of . Suppose the losses encountered by the learner are Lipschitz w.r.t norm. For any fixed , the predictions of Algorithm 1 satisfy the following regret bound
Theorem 2 (NonConvex OFTPL).
Let be the diameter of . Suppose our guess is such that is Lipschitz w.r.t norm, for all . For any fixed , OFTPL with access to an approximate optimization oracle satisfies the following regret bound
The above result shows that for appropriate choice of , FTPL achieves regret.
3 NonConvex FTPL
In this section we present a proof of Theorem 1. The first step in the proof involves relating the expected regret to the stability of prediction, which is a standard step in the analysis of many online learning algorithms.
Lemma 3.
The regret of Algorithm 1 can be upper bounded as
(3) 
In the rest of the proof we focus on bounding . We use a similar proof technique as in Agarwal et al. (2018). We first show that the minimizers of FTPL satisfy certain monotonicity properties.
Lemma 4.
Let be the prediction of FTPL in iteration , with random perturbation . Let denote the standard basis vector and denote the coordinate of . Then the following monotonicity property holds for any
Proof.
Let and . From the approximate optimality of we have
where the second inequality follows from the approximate optimality of . This shows that . ∎
Lemma 5.
Let denote the standard basis vector and denote the coordinate of . Suppose . For , we have
Proof.
Let . From the approximate optimality of , we have
where the second inequality follows from the Lipschitz property of and the last inequality follows from our assumption on . Next, from the optimality of , we have
where the last inequality follows from the optimality of . Combining the above two equations, we get
A similar argument shows that
Finally, using the monotonicity property in Lemma 4, we get
Combining the above three inequalities gives us the required result. ∎
Proof of Theorem 1.
We now proceed to the proof of Theorem 1. First note that can be written as
(4) 
To bound we derive an upper bound for . For any , define as
where is the coordinate of . Let and . Then . Define event as
Consider the following
where the first inequality follows from the fact that the diameter of the domain is and in the last inequality is defined as We now use the monotonicity properties proved in Lemmas 4, 5 to further lower bound .
where the first inequality follows from Lemmas 4, 5, the second inequality follows from the definition of , and the last inequality uses the the fact that . Rearranging the terms in the last inequality gives us
Since the above bound holds for any , we get the following bound on the unconditioned expectation
Plugging this in Equation (4) gives us the following bound on stability of predictions of FTPL
Plugging the above bound in Equation (3) gives us the required bound on regret.
4 NonConvex OFTPL
In this section we present a proof of Theorem 2. We first relate the expected regret of OFTPL to the stability of its prediction. Unlike Lemma 3, the upper bound we obtain for OFTPL depends on the Lipschitz constant of .
Lemma 6.
Let be any minimizer of . The regret of OFTPL can be upper bounded as
(5) 
References
 Hazan (2016) Elad Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(34):157–325, 2016.

Gao et al. (2018)
Xiand Gao, Xiaobo Li, and Shuzhong Zhang.
Online learning with nonconvex losses and nonstationary regret.
In
International Conference on Artificial Intelligence and Statistics
, pages 235–243, 2018.  Hazan et al. (2017) Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in nonconvex games. arXiv preprint arXiv:1708.00075, 2017.
 Maillard and Munos (2010) OdalricAmbrym Maillard and Rémi Munos. Online learning in adversarial lipschitz environments. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 305–320. Springer, 2010.
 Krichene et al. (2015) Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The hedge algorithm on a continuum. In International Conference on Machine Learning, pages 824–832, 2015.
 Agarwal et al. (2018) Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in nonconvex games with an optimization oracle. arXiv preprint arXiv:1810.07362, 2018.
 Rakhlin and Sridharan (2012) Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. arXiv preprint arXiv:1208.3728, 2012.
 CesaBianchi and Lugosi (2006) Nicolo CesaBianchi and Gabor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.
Appendix A Proof of Lemma 3
For any we have
We now use induction to show that .
Base Case ().
Since is an approximate minimizer of , we have
where the last inequality holds for any . This shows that .
Induction Step.
Suppose the claim holds for all . We now show that it also holds for .
where the first inequality follows since the claim holds for any , and the last inequality follows from the approximate optimality of .
Using this result, we get the following upper bound on the expected regret of FTPL
The proof of the Lemma now follows from the following property of exponential distribution
Appendix B Proof of Lemma 6
The proof uses similar arguments as in the proof of Rakhlin and Sridharan [2012] for Optimistic FTRL. Let . Then for any we have
We use induction to show that the following holds for any
Base Case ().
First note that . Since is a minimizer of , we have
This shows that .
Induction Step.
Suppose the claim holds for all . We now show that it also holds for .
where the first inequality follows since the claim holds for any , the second inequality follows from the approximate optimality of and the last inequality follows from the optimality of .
This gives the following upper bound on the regret of OFTPL