The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

10/14/2016
by Tor Lattimore et al.

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism principle and Thompson sampling. While prior work has mostly been in the worst-case setting, we analyse the asymptotic instance-dependent regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate, and indeed can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example in generalised linear bandits and reinforcement learning.
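The abstract's claim that optimistic policies can be far from the asymptotically optimal instance-dependent rate can be illustrated in simulation. Below is a minimal sketch, not the paper's construction: a hypothetical two-dimensional instance with an optimal arm, a nearly optimal arm, and a clearly suboptimal but informative arm, played by a generic LinUCB-style optimistic policy. The instance parameters (eps, gamma), the crude confidence radius, and the run_linucb helper are illustrative assumptions introduced here, not taken from the paper.

    import numpy as np

    # Hypothetical 2D instance (assumed for illustration): arms e1=(1,0), e2=(0,1),
    # and a near-optimal arm (1-eps, gamma*eps), with theta=(1,0). Arm e1 is optimal,
    # the third arm has a small gap eps, and e2 has gap 1 but is the arm that is most
    # informative about the second coordinate of theta.
    def run_linucb(horizon=20000, eps=0.05, gamma=0.5, noise=0.1, seed=0):
        rng = np.random.default_rng(seed)
        theta = np.array([1.0, 0.0])
        arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0 - eps, gamma * eps]])
        means = arms @ theta
        gaps = means.max() - means

        lam = 1.0                       # ridge regularisation
        V = lam * np.eye(2)             # regularised design matrix
        b = np.zeros(2)                 # sum of reward-weighted features
        regret = 0.0
        for t in range(1, horizon + 1):
            Vinv = np.linalg.inv(V)
            theta_hat = Vinv @ b
            # Crude, assumed confidence radius standing in for an OFUL-style bound.
            beta = noise * np.sqrt(2 * np.log(t + 1)) + np.sqrt(lam)
            # Optimistic index: estimated reward plus an exploration bonus per arm.
            widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
            a = int(np.argmax(arms @ theta_hat + beta * widths))
            x = arms[a]
            r = x @ theta + noise * rng.standard_normal()
            V += np.outer(x, x)
            b += r * x
            regret += gaps[a]           # pseudo-regret accumulated over the run
        return regret

    if __name__ == "__main__":
        print("pseudo-regret of an optimistic (LinUCB-style) policy:", run_linucb())

Under assumptions like these, an optimistic policy avoids the high-gap informative arm and instead keeps sampling the near-optimal arm to resolve the remaining uncertainty, which is the kind of behaviour the abstract argues can be arbitrarily far from the optimal asymptotic rate.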

Related research

02/11/2013 · Adaptive-treed bandits
We describe a novel algorithm for noisy global optimisation and continuu...

02/01/2020 · Advances in Bandits with Knapsacks
"Bandits with Knapsacks" is a general model for multi-armed bandits u...

04/24/2023 · Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory
We consider the development of adaptive, instance-dependent algorithms f...

02/03/2020 · Sample Complexity of Incentivized Exploration
We consider incentivized exploration: a version of multi-armed bandits w...

06/06/2022 · Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
Past research on interactive decision making problems (bandits, reinforc...

02/07/2023 · Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications
Partial monitoring is an expressive framework for sequential decision-ma...

05/24/2019 · Polynomial Cost of Adaptation for X-Armed Bandits
In the context of stochastic continuum-armed bandits, we present an algo...
