An Optimal Algorithm for Linear Bandits

10/19/2011
by Nicolò Cesa-Bianchi, et al.

We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order √(Td ln N) on any finite class X of N actions in d dimensions, and of order d√T (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an application to a model of linear bandits with expert advice. Interestingly, these results show that bandit linear optimization with expert advice in d dimensions is no more difficult (in terms of the achievable regret) than the online d-armed bandit problem with expert advice (where EXP4 is optimal).
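The scheme the abstract describes, exponential weights over a finite action set mixed with an exploration distribution and an unbiased least-squares loss estimate from bandit feedback, can be sketched as follows. This is an illustrative Exp2-style simulation under stated assumptions, not the paper's exact algorithm: uniform exploration stands in for the optimal exploration basis built from convex geometry, and the instance (d, N, T, the loss vector) and the parameters η, γ are hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: N actions in R^d, a fixed hidden loss vector.
d, N, T = 3, 8, 2000
X = rng.standard_normal((N, d))          # finite action set, one action per row
loss_vec = 0.1 * rng.standard_normal(d)  # hidden linear loss, fixed over rounds

eta = np.sqrt(np.log(N) / (T * d))       # learning rate (illustrative tuning)
gamma = 0.1                              # exploration mixture weight
w = np.ones(N)                           # exponential weights over actions

total_loss = 0.0
for t in range(T):
    # Mix the weight distribution with uniform exploration (the paper instead
    # uses an optimal exploration basis from convex geometry).
    p = (1 - gamma) * w / w.sum() + gamma / N
    a = rng.choice(N, p=p)
    y = X[a] @ loss_vec                  # bandit feedback: only this scalar loss
    total_loss += y

    # Unbiased least-squares estimate of the loss vector from one observation:
    # lhat = P^+ x_a y, where P = E_p[x x^T] is the design matrix under p.
    P = (p[:, None, None] * np.einsum('ni,nj->nij', X, X)).sum(axis=0)
    lhat = y * np.linalg.pinv(P) @ X[a]

    w *= np.exp(-eta * (X @ lhat))       # exponential-weights update
    w /= w.max()                         # rescale for numerical stability

best_loss = T * (X @ loss_vec).min()     # best fixed action in hindsight
regret = total_loss - best_loss
```

The key invariant is that the estimator is unbiased whenever the actions span R^d: summing p(a) · y(a) · P⁺x(a) over all actions recovers the hidden loss vector exactly, which is what lets the full-information exponential-weights analysis go through under bandit feedback.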

Related research:

- Towards minimax policies for online linear optimization with bandit feedback (02/14/2012). We address the online linear optimization problem with bandit feedback. ...
- Volumetric Spanners: an Efficient Exploration Basis for Learning (12/21/2013). Numerous machine learning problems require an exploration basis - a mech...
- Linear Bandits on Uniformly Convex Sets (03/10/2021). Linear bandit algorithms yield 𝒪̃(n√(T)) pseudo-regret bounds on compact...
- Diversifying Database Activity Monitoring with Bandits (10/23/2019). Database activity monitoring (DAM) systems are commonly used by organiza...
- Exploiting Problem Geometry in Safe Linear Bandits (08/29/2023). The safe linear bandit problem is a version of the classic linear bandit...
- High-Dimensional Experimental Design and Kernel Bandits (05/12/2021). In recent years methods from optimal linear experimental design have bee...
- Diversity-Preserving K-Armed Bandits, Revisited (10/05/2020). We consider the bandit-based framework for diversity-preserving recommen...
