1 Introduction
We consider the problem of bandit convex optimization with two-point feedback [1]. This problem can be defined as a repeated game between a learner and an adversary as follows: At each round $t$, the adversary picks a convex function $f_t$ on $\mathbb{R}^d$, which is not revealed to the learner. The learner then chooses a point $w_t$ from some known and closed convex set $\mathcal{W}\subseteq\mathbb{R}^d$, and suffers a loss $f_t(w_t)$. As feedback, the learner may choose two points $w_t',w_t''$ and receive $f_t(w_t'),f_t(w_t'')$.¹ The learner's goal is to minimize the average regret, defined as

$$\frac{1}{T}\sum_{t=1}^{T}f_t(w_t)\;-\;\min_{w\in\mathcal{W}}\frac{1}{T}\sum_{t=1}^{T}f_t(w).$$

¹ This is slightly different than the model of [1], where the learner only chooses $w_t',w_t''$ and the loss is $\frac{1}{2}\bigl(f_t(w_t')+f_t(w_t'')\bigr)$. However, our results and analysis can be easily translated to their setting, and the model we discuss translates more directly to the zero-order stochastic optimization setting considered later.
In this note, we focus on obtaining bounds on the expected average regret (with respect to the learner’s randomness).
A closely-related and easier setting is zero-order stochastic convex optimization. In this setting, our goal is to approximately solve $\min_{w\in\mathcal{W}}F(w)$, where $F(w)=\mathbb{E}_\xi\bigl[f(w;\xi)\bigr]$, given limited access to the functions $f(\cdot;\xi_1),\ldots,f(\cdot;\xi_T)$, where $\xi_1,\ldots,\xi_T$ are i.i.d. instantiations of $\xi$. Specifically, we assume that each $f(\cdot;\xi_t)$ is not directly observed, but rather can be queried at two points. This models situations where computing gradients directly is complicated or infeasible. It is well-known [3] that given an algorithm with expected average regret $R_T$ in the bandit optimization setting above, if we feed it with the functions $f_1=f(\cdot;\xi_1),\ldots,f_T=f(\cdot;\xi_T)$, then the average $\bar{w}_T=\frac{1}{T}\sum_{t=1}^{T}w_t$ of the points generated satisfies the following bound on the expected optimization error:

$$\mathbb{E}\bigl[F(\bar{w}_T)\bigr]-\min_{w\in\mathcal{W}}F(w)\;\le\;R_T.$$
Thus, an algorithm for bandit optimization can be converted to an algorithm for zeroorder stochastic optimization with similar guarantees.
The bandit optimization setting with two-point feedback was proposed and studied in [1]. Independently, [8] considered two-point methods for stochastic optimization. Both papers are based on randomized gradient estimates, which are then fed into standard first-order algorithms (e.g. gradient descent, or more generally mirror descent). However, the regret/error guarantees in both papers were suboptimal in terms of the dependence on the dimension $d$. Recently, [4] considered a similar approach for the stochastic optimization setting, attaining an optimal error guarantee when each $f(\cdot;\xi)$ is a smooth function (differentiable and with Lipschitz-continuous gradients). Related results in the smooth case were also obtained by [6]. However, to tackle the general case, where $f(\cdot;\xi)$ may be non-smooth, [4] resorted to a non-trivial smoothing scheme and a significantly more involved analysis. The resulting bounds have additional factors (logarithmic in the dimension) compared to the guarantees in the smooth case. Moreover, an analysis is only provided for Euclidean problems (where the domain and the Lipschitz parameter of $f(\cdot;\xi)$ scale with the Euclidean norm).
In this note, we present and analyze a simple algorithm with the following properties:

- For Euclidean problems, it is optimal up to constants for both smooth and non-smooth functions. This closes the gap between the smooth and non-smooth Euclidean problems in this setting.

- The algorithm and analysis are readily applicable to non-Euclidean problems. We give an example for the $\ell_1$ norm, with the resulting bound optimal up to a logarithmic factor.

- The algorithm and analysis are simpler than those proposed in [4]. They apply equally to the bandit and zero-order optimization settings, and can be readily extended using standard techniques (e.g. to strongly-convex functions, to regret/error bounds holding with high probability rather than just in expectation, and to improved bounds if the learner is allowed $k$ observations per round instead of just two).
Like previous algorithms, our algorithm is based on a random gradient estimator, which, given a function $f$ and a point $w$, queries $f$ at two random locations close to $w$, and computes a random vector whose expectation is the gradient of a smoothed version of $f$. The papers [8, 4, 6] essentially use the estimator which queries $f$ at $w$ and $w+\delta u$ (where $u$ is a random unit vector and $\delta$ is a small parameter), and returns

$$g \;=\; \frac{d}{\delta}\bigl(f(w+\delta u)-f(w)\bigr)\,u. \qquad (1)$$

The intuition is readily seen in the one-dimensional ($d=1$) case, where the expectation of this expression equals

$$\frac{f(w+\delta)-f(w-\delta)}{2\delta}, \qquad (2)$$

which indeed approximates the derivative of $f$ at $w$ (assuming $f$ is differentiable there), if $\delta$ is small enough.
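In one dimension the random direction $u$ is just a uniform random sign, so the expectation above can be written out exactly. The following sketch (with an arbitrary quadratic test function, chosen only for illustration) checks that averaging estimator (1) over the two signs recovers the central difference in Eq. (2):

```python
# In d = 1, u is a uniform random sign, so E[g] is the average of two terms.
def one_sided_estimate(f, w, delta, u):
    """Estimator (1) in one dimension: query f at w and w + delta*u."""
    return (1.0 / delta) * (f(w + delta * u) - f(w)) * u

f = lambda x: x ** 2          # arbitrary smooth test function
w, delta = 1.3, 0.1
expectation = 0.5 * (one_sided_estimate(f, w, delta, +1)
                     + one_sided_estimate(f, w, delta, -1))
central_difference = (f(w + delta) - f(w - delta)) / (2 * delta)
print(expectation, central_difference)  # both approximately 2.6 = f'(w) here
```

For a quadratic the central difference is exact, which is why both quantities coincide with the true derivative $f'(w)=2w$ in this example.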
In contrast, our algorithm uses a slightly different estimator (also used in [1]), which queries $f$ at $w+\delta u$ and $w-\delta u$, and returns

$$g \;=\; \frac{d}{2\delta}\bigl(f(w+\delta u)-f(w-\delta u)\bigr)\,u. \qquad (3)$$

Again, the intuition is readily seen in the case $d=1$, where the expectation of this expression also equals Eq. (2).
When $\delta$ is sufficiently small and $f$ is differentiable at $w$, both estimators compute a good approximation of the true gradient $\nabla f(w)$. However, when $f$ is not differentiable at $w$, the variance of the estimator in Eq. (1) can be quadratic in the dimension $d$, as pointed out by [4]: For example, for $f(w)=\|w\|_2$ and $w=0$, the estimator in Eq. (1) equals $d\,u$, so its second moment equals

$$\mathbb{E}\bigl[\|d\,u\|_2^2\bigr]=d^2.$$

Since the performance of the algorithm crucially depends on the second moment of the gradient estimate, this leads to a highly suboptimal guarantee. In [4], this was handled by adding an additional random perturbation and using a more involved analysis. Surprisingly, it turns out that the slightly different estimator in Eq. (3) does not suffer from this problem, and its second moment is essentially linear in the dimension $d$.
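To make the contrast concrete, the following sketch (the test function, dimension, and sample size are illustrative choices, not taken from the analysis) estimates the second moment of both estimators for the non-smooth function $f(w)=\|w\|_2$ at its kink $w=0$:

```python
import numpy as np

def random_unit_vector(d, rng):
    """Uniform sample from the Euclidean unit sphere in R^d."""
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)

def estimator_one_sided(f, w, delta, u):
    """Estimator (1): queries f at w and w + delta*u."""
    d = len(w)
    return (d / delta) * (f(w + delta * u) - f(w)) * u

def estimator_two_sided(f, w, delta, u):
    """Estimator (3): queries f at w + delta*u and w - delta*u."""
    d = len(w)
    return (d / (2 * delta)) * (f(w + delta * u) - f(w - delta * u)) * u

f = lambda w: np.linalg.norm(w)   # 1-Lipschitz, non-smooth at the origin
d, delta, n = 50, 0.01, 10_000
rng = np.random.default_rng(0)
w = np.zeros(d)                   # the kink of f

m1 = np.mean([np.sum(estimator_one_sided(f, w, delta,
              random_unit_vector(d, rng)) ** 2) for _ in range(n)])
m2 = np.mean([np.sum(estimator_two_sided(f, w, delta,
              random_unit_vector(d, rng)) ** 2) for _ in range(n)])
print(m1, m2)  # m1 = d^2 = 2500 exactly; m2 = 0, since f(w+du) = f(w-du) here
```

At this particular point the two-sided estimator (3) happens to vanish identically (both queries return $\delta$); the general claim, established in Lemma 5 below, is that its second moment is $O(d)$ at arbitrary points.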
2 Algorithm and Main Results
We consider the algorithm described in Figure 1, which performs standard mirror descent using a randomized gradient estimate $\hat{g}_t$ of a (smoothed) version of $f_t$ at the point $w_t$. We make the assumption that one can indeed query each $f_t$ at any point as specified in the algorithm.²

² This may require querying $f_t$ at points lying at distance up to $\delta$ outside $\mathcal{W}$. If we must query within $\mathcal{W}$, then one can simply run the algorithm on a slightly smaller set $\mathcal{W}'\subseteq\mathcal{W}$, chosen so that $w+\delta u\in\mathcal{W}$ for all $w\in\mathcal{W}'$ and all unit vectors $u$, ensuring that we always query at points in $\mathcal{W}$. Since the formal guarantee in Thm. 1 holds for arbitrarily small $\delta$, and each $f_t$ is Lipschitz, we can always take $\mathcal{W}'$ and $\delta$ such that the additional regret/error incurred is negligible.
The analysis of the algorithm is presented in the following theorem:
Theorem 1.
Assume the following conditions hold:

1. $\mathcal{R}(\cdot)$ is $1$-strongly convex with respect to a norm $\|\cdot\|$, and satisfies $\sup_{w\in\mathcal{W}}\mathcal{R}(w)\le\mathcal{R}^2$ for some $\mathcal{R}>0$.

2. Each $f_t$ is convex and $L$-Lipschitz with respect to the Euclidean norm $\|\cdot\|_2$.

3. The dual norm $\|\cdot\|_\star$ of $\|\cdot\|$ is such that $\sqrt{\mathbb{E}_u\bigl[\|u\|_\star^4\bigr]}\le \frac{p}{d}$ for some $p>0$, where $u$ is uniformly distributed on the Euclidean unit sphere in $\mathbb{R}^d$.

If $\eta_t=\eta=\frac{\mathcal{R}}{L\sqrt{pT}}$, and $\delta$ is chosen such that $\delta\le\frac{\mathcal{R}\sqrt{p}}{\sqrt{T}}$, then the sequence $w_1,\ldots,w_T$ generated by the algorithm satisfies the following for any $T\ge 1$ and any $w^\star\in\mathcal{W}$:

$$\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[f_t(w_t)]\;-\;\frac{1}{T}\sum_{t=1}^{T}f_t(w^\star)\;\le\;\bar{c}\,\mathcal{R}L\sqrt{\frac{p}{T}},$$

where $\bar{c}$ is some numerical constant.
We note that condition 1 is standard in the analysis of the mirror descent method (see the specific corollaries below), whereas conditions 2 and 3 are needed to ensure that the variance of our gradient estimator is controlled.
As mentioned earlier, the bound on the average regret which appears in Thm. 1 immediately implies a similar bound on the error in a stochastic optimization setting, for the average point $\bar{w}_T=\frac{1}{T}\sum_{t=1}^{T}w_t$. We note that the result is robust to the choice of $\eta$, and is the same up to constants as long as $\eta=\Theta\bigl(\mathcal{R}/(L\sqrt{pT})\bigr)$. Also, the constant $\bar{c}$, while always bounded away from zero, shrinks as $\delta\to 0$ (see the proof for details).
As a first application, let us consider the case where $\|\cdot\|$ is the Euclidean norm $\|\cdot\|_2$. In this case, we can take $\mathcal{R}(w)=\frac{1}{2}\|w\|_2^2$, and the algorithm reduces to a standard variant of online gradient descent, defined as $w_1=0$ and $w_{t+1}=\Pi_{\mathcal{W}}\bigl(w_t-\eta\,\hat{g}_t\bigr)$, where $\Pi_{\mathcal{W}}$ denotes Euclidean projection onto $\mathcal{W}$. In this case, we get the following corollary:
Corollary 1.
Suppose that $\sup_{w\in\mathcal{W}}\|w\|_2\le B$, and that for all $t$, $f_t$ is $L$-Lipschitz with respect to the Euclidean norm. Then using $\mathcal{R}(w)=\frac{1}{2}\|w\|_2^2$ and $\eta=\frac{B}{L\sqrt{dT}}$, it holds for some constant $c$ and any $w^\star\in\mathcal{W}$ that

$$\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[f_t(w_t)]\;-\;\frac{1}{T}\sum_{t=1}^{T}f_t(w^\star)\;\le\;c\,BL\sqrt{\frac{d}{T}}.$$
The proof is immediately obtained from Thm. 1, noting that $\|u\|_\star=\|u\|_2=1$, and hence $p=d$, in our case. This bound matches (up to constants) the lower bound in [4], hence closing the gap between upper and lower bounds in this setting.
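The Euclidean instantiation just described can be sketched in a few lines. The following is only an illustration consistent with the update rule above, not the paper's pseudocode: the test function, domain, and parameter values are ad hoc rather than the tuned choices of Corollary 1.

```python
import numpy as np

def project_ball(w, B):
    """Euclidean projection onto the ball {w : ||w||_2 <= B}."""
    n = np.linalg.norm(w)
    return w if n <= B else (B / n) * w

def two_point_ogd(f, d, B, T, eta, delta, rng):
    """Online gradient descent driven by the two-point estimator (3)."""
    w = np.zeros(d)
    iterates = []
    for _ in range(T):
        g = rng.standard_normal(d)
        u = g / np.linalg.norm(g)                      # uniform unit vector
        ghat = (d / (2 * delta)) * (f(w + delta * u) - f(w - delta * u)) * u
        w = project_ball(w - eta * ghat, B)
        iterates.append(w)
    return np.mean(iterates, axis=0)                   # averaged point

# Illustrative target: minimize f(w) = ||w - w_star||^2 over a ball.
d, B, T = 5, 2.0, 4000
w_star = 0.5 * np.ones(d)
f = lambda w: np.sum((w - w_star) ** 2)
w_bar = two_point_ogd(f, d, B, T, eta=0.01, delta=0.01,
                      rng=np.random.default_rng(0))
print(f(w_bar))  # close to the optimal value 0
```

Returning the averaged point mirrors the online-to-batch conversion from the introduction: the regret guarantee for the iterates translates into an optimization-error guarantee for $\bar{w}_T$.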
As a second application, let us consider the case where $\|\cdot\|$ is the $\ell_1$ norm, whose dual $\|\cdot\|_\star$ is the $\ell_\infty$ norm; the domain $\mathcal{W}$ is the simplex in $\mathbb{R}^d$, where $d\ge 3$ (although our result easily extends to any subset of the $\ell_1$ unit ball); and we use a standard entropic regularizer: $\mathcal{R}(w)=\sum_{i=1}^{d}w_i\log(d\,w_i)$.
Corollary 2.
Suppose that for all $t$, $f_t$ is $L$-Lipschitz with respect to the $\ell_1$ norm. Then using the entropic regularizer above and $\eta=\frac{1}{L\sqrt{dT}}$, it holds for some constant $c$ and any $w^\star\in\mathcal{W}$ that

$$\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[f_t(w_t)]\;-\;\frac{1}{T}\sum_{t=1}^{T}f_t(w^\star)\;\le\;c\,L\sqrt{\frac{d\log^2(d)}{T}}.$$
This bound matches (this time up to a logarithmic factor) the lower bound in [4] for this setting.
Proof.
The function $\mathcal{R}(w)=\sum_{i=1}^{d}w_i\log(d\,w_i)$ is $1$-strongly convex with respect to the $\ell_1$ norm (see for instance [9], Example 2.5), and has value at most $\log(d)$ on the simplex. Also, if $f_t$ is $L$-Lipschitz with respect to the $\ell_1$ norm, then it must be $\sqrt{d}L$-Lipschitz with respect to the Euclidean norm, since $\|x\|_1\le\sqrt{d}\,\|x\|_2$ for any $x\in\mathbb{R}^d$. Finally, to satisfy condition 3 in Thm. 1, we upper bound $\sqrt{\mathbb{E}[\|u\|_\infty^4]}$ using the following lemma, whose proof is given in the appendix:
Lemma 1.
If $u$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$, $d\ge 3$, then $\sqrt{\mathbb{E}\bigl[\|u\|_\infty^4\bigr]}\le\frac{c\log(d)}{d}$, where $c$ is a positive numerical constant independent of $d$.

Plugging these observations into Thm. 1 leads to the desired result. ∎
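As a quick numerical sanity check of Lemma 1 (an illustration only, not part of the proof; the dimension and sample size are arbitrary), one can sample uniform points on the sphere and compare the relevant moment of $\|u\|_\infty$ against $\log(d)/d$:

```python
import numpy as np

def sphere_sample(n, d, rng):
    """n i.i.d. uniform samples from the Euclidean unit sphere in R^d."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

d, n = 200, 20_000
rng = np.random.default_rng(0)
u = sphere_sample(n, d, rng)
inf_norms = np.max(np.abs(u), axis=1)

fourth_moment_root = np.sqrt(np.mean(inf_norms ** 4))
print(fourth_moment_root, np.log(d) / d)
# the root of the fourth moment stays within a small constant factor of log(d)/d
```

This reflects the familiar fact that the largest coordinate of a random unit vector behaves like $\sqrt{\log(d)/d}$, which is exactly what drives the improved dependence on $d$ in Corollary 2.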
3 Proof of Theorem 1
As discussed in the introduction, the key to getting improved results compared to previous papers is the use of a slightly different random gradient estimator, which turns out to have significantly less variance. The formal proof relies on a few simple lemmas listed below. The key lemma is Lemma 5, which establishes the improved variance behavior.
Lemma 2.
For any $w^\star\in\mathcal{W}$, it holds that

$$\sum_{t=1}^{T}\hat{g}_t^\top(w_t-w^\star)\;\le\;\frac{\mathcal{R}^2}{\eta}+\frac{\eta}{2}\sum_{t=1}^{T}\|\hat{g}_t\|_\star^2.$$
This lemma is the canonical result on the convergence of online mirror descent, and the proof is standard (see e.g. [9]).
Lemma 3.
Define the function

$$\hat{f}_t(w)\;=\;\mathbb{E}_u\bigl[f_t(w+\delta u)\bigr]$$

over $\mathbb{R}^d$, where $u$ is a vector picked uniformly at random from the Euclidean unit sphere. Then the function $\hat{f}_t$ is convex, Lipschitz with constant $L$, satisfies

$$\sup_{w}\bigl|\hat{f}_t(w)-f_t(w)\bigr|\;\le\;\delta L,$$

and is differentiable with the following gradient:

$$\nabla\hat{f}_t(w)\;=\;\frac{d}{\delta}\,\mathbb{E}_u\bigl[f_t(w+\delta u)\,u\bigr].$$
Proof.
The fact that the function is convex and $L$-Lipschitz is immediate from its definition and the assumptions in the theorem. The inequality $\sup_w|\hat{f}_t(w)-f_t(w)|\le\delta L$ follows from $u$ being a unit vector and from $f_t$ being assumed Lipschitz with respect to the $\ell_2$ norm. The differentiability property follows from Lemma 2.1 in [5]. ∎
Lemma 4.
For any function $h$ which is $L_h$-Lipschitz with respect to the $\ell_2$ norm, it holds that if $u$ is uniformly distributed on the Euclidean unit sphere in $\mathbb{R}^d$, then

$$\mathbb{E}\Bigl[\bigl(h(u)-\mathbb{E}[h(u)]\bigr)^4\Bigr]\;\le\;\frac{c\,L_h^4}{d^2}$$

for some numerical constant $c$.
Proof.
A standard result on the concentration of Lipschitz functions on the Euclidean unit sphere implies that

$$\Pr\bigl(\bigl|h(u)-\mathbb{E}[h(u)]\bigr|\ge t\bigr)\;\le\;2\exp\left(-\frac{c'\,d\,t^2}{L_h^2}\right)$$

for some numerical constant $c'$ (see the proof of Proposition 2.10 and Corollary 2.6 in [7]). Therefore,

$$\mathbb{E}\Bigl[\bigl(h(u)-\mathbb{E}[h(u)]\bigr)^4\Bigr]\;=\;\int_0^\infty\Pr\Bigl(\bigl(h(u)-\mathbb{E}[h(u)]\bigr)^4\ge s\Bigr)\,ds\;\le\;\int_0^\infty 2\exp\left(-\frac{c'\,d\,\sqrt{s}}{L_h^2}\right)ds,$$

which equals $\frac{4L_h^4}{(c'd)^2}=\frac{c\,L_h^4}{d^2}$ for some numerical constant $c$. ∎
Lemma 5.
It holds that $\mathbb{E}_{u_t}[\hat{g}_t]=\nabla\hat{f}_t(w_t)$ (where $\hat{f}_t$ is as defined in Lemma 3), and that $\mathbb{E}_{u_t}\bigl[\|\hat{g}_t\|_\star^2\bigr]\le c\,p\,L^2$ for some numerical constant $c$.
Proof.
For simplicity of notation, we drop the $t$ subscript. Since $u$ has a symmetric distribution around the origin, $\mathbb{E}[f(w-\delta u)\,u]=-\mathbb{E}[f(w+\delta u)\,u]$, and therefore

$$\mathbb{E}[\hat{g}]\;=\;\frac{d}{2\delta}\,\mathbb{E}\bigl[f(w+\delta u)\,u\bigr]-\frac{d}{2\delta}\,\mathbb{E}\bigl[f(w-\delta u)\,u\bigr]\;=\;\frac{d}{\delta}\,\mathbb{E}\bigl[f(w+\delta u)\,u\bigr],$$

which equals $\nabla\hat{f}(w)$ by Lemma 3.

As to the second part of the lemma, we have the following, where $\mu$ is an arbitrary scalar parameter and where we use the elementary inequality $(a-b)^2\le 2a^2+2b^2$:

$$\mathbb{E}\bigl[\|\hat{g}\|_\star^2\bigr]\;=\;\frac{d^2}{4\delta^2}\,\mathbb{E}\Bigl[\bigl(f(w+\delta u)-f(w-\delta u)\bigr)^2\,\|u\|_\star^2\Bigr]\;\le\;\frac{d^2}{2\delta^2}\Bigl(\mathbb{E}\bigl[(f(w+\delta u)-\mu)^2\,\|u\|_\star^2\bigr]+\mathbb{E}\bigl[(f(w-\delta u)-\mu)^2\,\|u\|_\star^2\bigr]\Bigr).$$

Again using the symmetric distribution of $u$, this equals

$$\frac{d^2}{\delta^2}\,\mathbb{E}\bigl[(f(w+\delta u)-\mu)^2\,\|u\|_\star^2\bigr].$$

Applying Cauchy–Schwarz and using the condition $\sqrt{\mathbb{E}[\|u\|_\star^4]}\le p/d$ stated in the theorem, we get the upper bound

$$\frac{d^2}{\delta^2}\,\sqrt{\mathbb{E}\bigl[(f(w+\delta u)-\mu)^4\bigr]}\cdot\frac{p}{d}.$$

In particular, taking $\mu=\mathbb{E}[f(w+\delta u)]$ and using Lemma 4 (noting that $u\mapsto f(w+\delta u)$ is $\delta L$-Lipschitz in terms of the $\ell_2$ norm), this is at most

$$\frac{d^2}{\delta^2}\cdot\frac{\sqrt{c}\,(\delta L)^2}{d}\cdot\frac{p}{d}\;=\;\sqrt{c}\,p\,L^2,$$

as required. ∎
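The second-moment behavior established in Lemma 5 can also be observed numerically. The sketch below (an ad-hoc illustration for the Euclidean case, where $\|\cdot\|_\star=\|\cdot\|_2$ and $p=d$, with an arbitrary non-smooth $1$-Lipschitz test function) estimates $\mathbb{E}\|\hat{g}\|_2^2$ for the estimator in Eq. (3) across dimensions:

```python
import numpy as np

def second_moment(f, w, delta, n, rng):
    """Monte Carlo estimate of E||g_hat||_2^2 for the two-point estimator (3)."""
    d = len(w)
    g = rng.standard_normal((n, d))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    diffs = np.array([f(w + delta * ui) - f(w - delta * ui) for ui in u])
    return np.mean((d / (2 * delta)) ** 2 * diffs ** 2)  # ||u||_2 = 1

f = lambda w: np.linalg.norm(w)       # 1-Lipschitz, non-smooth
rng = np.random.default_rng(0)
for d in (10, 100, 1000):
    w = np.ones(d) / np.sqrt(d)       # an arbitrary point with ||w||_2 = 1
    m = second_moment(f, w, delta=1e-3, n=5000, rng=rng)
    print(d, m)   # grows roughly linearly with d, as Lemma 5 predicts (p = d, L = 1)
```

The crude bound $|f(w+\delta u)-f(w-\delta u)|\le 2\delta L$ would only give $\|\hat{g}\|_2\le dL$; the linear (rather than quadratic) growth seen here is exactly the gain from the concentration argument in Lemma 4.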
We are now ready to prove the theorem. Taking expectations on both sides of the inequality in Lemma 2 (with respect to the algorithm's randomness), we have

$$\mathbb{E}\left[\sum_{t=1}^{T}\hat{g}_t^\top(w_t-w^\star)\right]\;\le\;\frac{\mathcal{R}^2}{\eta}+\frac{\eta}{2}\sum_{t=1}^{T}\mathbb{E}\bigl[\|\hat{g}_t\|_\star^2\bigr]. \qquad (4)$$

Using Lemma 5, the right hand side is at most

$$\frac{\mathcal{R}^2}{\eta}+\frac{c\,\eta\,p\,L^2\,T}{2}.$$

The left hand side of Eq. (4), by Lemma 5 and convexity of $\hat{f}_t$, equals

$$\sum_{t=1}^{T}\mathbb{E}\bigl[\nabla\hat{f}_t(w_t)^\top(w_t-w^\star)\bigr]\;\ge\;\sum_{t=1}^{T}\mathbb{E}\bigl[\hat{f}_t(w_t)-\hat{f}_t(w^\star)\bigr].$$

By Lemma 3, this is at least

$$\sum_{t=1}^{T}\Bigl(\mathbb{E}[f_t(w_t)]-f_t(w^\star)\Bigr)-2\delta L T.$$

Combining these inequalities and plugging back into Eq. (4), we get

$$\sum_{t=1}^{T}\mathbb{E}[f_t(w_t)]-\sum_{t=1}^{T}f_t(w^\star)\;\le\;\frac{\mathcal{R}^2}{\eta}+\frac{c\,\eta\,p\,L^2\,T}{2}+2\delta L T.$$

Choosing $\eta=\frac{\mathcal{R}}{L\sqrt{pT}}$, and any $\delta\le\frac{\mathcal{R}\sqrt{p}}{\sqrt{T}}$, we get

$$\sum_{t=1}^{T}\mathbb{E}[f_t(w_t)]-\sum_{t=1}^{T}f_t(w^\star)\;\le\;\bar{c}\,\mathcal{R}L\sqrt{pT}$$

for some numerical constant $\bar{c}$. Dividing both sides by $T$, the result follows. ∎
References
 [1] A. Agarwal, O. Dekel, and L. Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.
 [2] A. Barvinok. Measure concentration lecture notes. http://www.math.lsa.umich.edu/~barvinok/total710.pdf, 2005.
 [3] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004.
 [4] J. Duchi, M. Jordan, M. Wainwright, and A. Wibisono. Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, May 2015.
 [5] A. Flaxman, A. Kalai, and B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In SODA, 2005.
 [6] S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
 [7] M. Ledoux. The concentration of measure phenomenon, volume 89. American Mathematical Soc., 2005.
 [8] Y. Nesterov. Random gradient-free minimization of convex functions. Technical Report 2011/16, ECORE, 2011.

 [9] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2), 2012.
Appendix A Proof of Lemma 1
We note that the distribution of $u$ is equivalent to that of $g/\|g\|_2$, where $g=(g_1,\ldots,g_d)$ is a standard Gaussian random vector. Moreover, by a standard concentration bound on the norm of Gaussian random vectors (e.g. Corollary 2.3 in [2], with $\epsilon=\frac{1}{2}$),

$$\Pr\left(\|g\|_2\le\frac{\sqrt{d}}{2}\right)\;\le\;\exp(-c\,d)$$

for some numerical constant $c>0$. Finally, for any value of $g$, we always have $\frac{\|g\|_\infty}{\|g\|_2}\le 1$, since the Euclidean norm is always larger than the infinity norm. Combining these observations, and using $\mathbb{1}_A$ for the indicator function of the event $A=\bigl\{\|g\|_2>\frac{\sqrt{d}}{2}\bigr\}$, we have

$$\mathbb{E}\bigl[\|u\|_\infty^4\bigr]\;=\;\mathbb{E}\left[\frac{\|g\|_\infty^4}{\|g\|_2^4}\,\mathbb{1}_A\right]+\mathbb{E}\left[\frac{\|g\|_\infty^4}{\|g\|_2^4}\,(1-\mathbb{1}_A)\right]\;\le\;\frac{16}{d^2}\,\mathbb{E}\bigl[\|g\|_\infty^4\bigr]+\exp(-c\,d). \qquad (5)$$
Thus, it remains to upper bound $\mathbb{E}\bigl[\tilde{g}^4\bigr]$, where $\tilde{g}=\|g\|_\infty=\max_i|g_i|$ and each $g_i$ is a standard Gaussian random variable. Noting that $g_1,\ldots,g_d$ are independent and identically distributed standard Gaussian random variables, we have for any scalar $\lambda\ge 0$ that

$$\Pr\bigl(\tilde{g}\ge\lambda\bigr)\;=\;1-\Bigl(1-\Pr\bigl(|g_1|\ge\lambda\bigr)\Bigr)^d\;\overset{(*)}{\le}\;d\,\Pr\bigl(|g_1|\ge\lambda\bigr)\;\overset{(**)}{\le}\;2d\exp\left(-\frac{\lambda^2}{2}\right),$$

where $(*)$ is Bernoulli's inequality, and $(**)$ is using a standard tail bound for a Gaussian random variable. In particular, the above implies that for an arbitrary positive scalar $a$,

$$\mathbb{E}\bigl[\tilde{g}^4\bigr]\;=\;\int_0^\infty\Pr\bigl(\tilde{g}^4\ge s\bigr)\,ds\;\le\;a^4+\int_{a^4}^\infty 2d\exp\left(-\frac{\sqrt{s}}{2}\right)ds\;=\;a^4+8d\,(a^2+2)\exp\left(-\frac{a^2}{2}\right).$$

In particular, plugging $a=2\sqrt{\log(d)}$ (which is larger than $1$, since we assume $d\ge 3$), we get

$$\mathbb{E}\bigl[\tilde{g}^4\bigr]\;\le\;16\log^2(d)+\frac{8\,(4\log(d)+2)}{d}.$$

Plugging this back into Eq. (5), we get that

$$\mathbb{E}\bigl[\|u\|_\infty^4\bigr]\;\le\;\frac{16}{d^2}\left(16\log^2(d)+\frac{8\,(4\log(d)+2)}{d}\right)+\exp(-c\,d),$$

which can be shown to be at most $\frac{c'\log^2(d)}{d^2}$ for all $d\ge 3$, where $c'$ is a numerical constant. In particular, this means that $\sqrt{\mathbb{E}\bigl[\|u\|_\infty^4\bigr]}\le\frac{\sqrt{c'}\log(d)}{d}$ as required.