Kernel-based methods for bandit convex optimization

07/11/2016
by Sébastien Bubeck, et al.

We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n) √(T)-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^9.5√(T))-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) T^o(1) factor in the regret. These results improve upon the Õ(n^11√(T))-regret and exp(poly(T))-time result of the first two authors, and the log(T)^poly(n)√(T)-regret and log(T)^poly(n)-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve Õ(n^1.5√(T))-regret, and moreover that this regret is unimprovable (the current best lower bound, Ω(n√(T)), is achieved with linear functions). For the simpler setting of zeroth-order stochastic convex optimization, this corresponds to the conjecture that the optimal query complexity is of order n^3/ϵ^2.
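The link between the conjectured regret and the n^3/ϵ^2 query complexity is left implicit in the abstract. A minimal sketch of one direction (regret bound implies query-complexity bound) follows, via the standard online-to-batch conversion. It assumes a regret bound of the conjectured form R_T ≤ C n^1.5 √(T) with an unspecified constant C and logarithmic factors dropped. It also assumes i.i.d. noisy zeroth-order queries ℓ(x, ξ) with f(x) = E[ℓ(x, ξ)], and takes the averaged point x̄_T as the output. The symbols C, ℓ, ξ, f, x* and x̄_T are notation introduced here, not taken from the paper.

```latex
\begin{align*}
\mathbb{E}\big[f(\bar{x}_T)\big] - f(x^\star)
  &\le \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\big[f(x_t) - f(x^\star)\big]
  && \text{(Jensen, } f \text{ convex, } \bar{x}_T = \tfrac{1}{T}\textstyle\sum_t x_t) \\
  &= \frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T} \ell(x_t,\xi_t) - \ell(x^\star,\xi_t)\Big]
  && (x_t \text{ is chosen before } \xi_t \text{ is drawn}) \\
  &\le \frac{R_T}{T} \le \frac{C\,n^{1.5}}{\sqrt{T}}, \\
\frac{C\,n^{1.5}}{\sqrt{T}} \le \epsilon
  &\iff T \ge \frac{C^2\, n^3}{\epsilon^2}.
\end{align*}
```

Under these assumptions, T of order n^3/ϵ^2 queries suffice to reach optimization error ϵ. The matching lower-bound direction is not covered by this sketch.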

11/21/2017

Regret Analysis for Continuous Dueling Bandit

The dueling bandit is a learning framework wherein the feedback informat...
10/25/2020

Geometric Exploration for Online Control

We study the control of an unknown linear dynamical system under general...
02/01/2023

Bandit Convex Optimisation Revisited: FTRL Achieves Õ(t^1/2) Regret

We show that a kernel estimator using multiple function evaluations can ...
05/18/2018

Projection-Free Bandit Convex Optimization

In this paper, we propose the first computationally efficient projection...
05/19/2022

Breaking the √(T) Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits

We prove an instance independent (poly) logarithmic regret for stochasti...
06/13/2018

Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces

We consider online convex optimization with zeroth-order feedback settin...
06/09/2021

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

We consider the following variant of contextual linear bandits motivated...
