
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

11/18/2015
by Tor Lattimore, et al.

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably, it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement over existing algorithms with finite-time regret guarantees, such as UCB and Thompson sampling.
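To make the comparison between index policies concrete, here is a minimal sketch of running two index strategies on a Gaussian multi-armed bandit. The `ucb_index` function is the classical UCB rule for unit-variance rewards; `approx_gittins_index` is an illustrative stand-in whose exploration bonus shrinks with the remaining horizon, in the spirit of a finite-horizon Gittins index. The exact form of that bonus (and all function names here) are assumptions for demonstration, not the paper's bounds or its method of computing the index.

```python
import math
import random

def ucb_index(mean, n, t, horizon):
    """Classical UCB index for unit-variance Gaussian rewards."""
    return mean + math.sqrt(2.0 * math.log(t) / n)

def approx_gittins_index(mean, n, t, horizon):
    """Illustrative stand-in for a finite-horizon Gittins index.

    The true index has no closed form; the bonus below, which shrinks as
    the remaining horizon shortens, is an assumed approximation for
    demonstration only, not the paper's formula.
    """
    remaining = max(horizon - t, 1)
    return mean + math.sqrt(2.0 / n * math.log(max(remaining / n, math.e)))

def run_bandit(index_fn, true_means, horizon, rng):
    """Run an index policy on a Gaussian bandit; return (regret, pull counts)."""
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k
    for a in range(k):  # pull each arm once to initialise
        sums[a] += rng.gauss(true_means[a], 1.0)
        counts[a] = 1
    for t in range(k, horizon):
        scores = [index_fn(sums[a] / counts[a], counts[a], t + 1, horizon)
                  for a in range(k)]
        a = max(range(k), key=scores.__getitem__)  # play the highest index
        sums[a] += rng.gauss(true_means[a], 1.0)
        counts[a] += 1
    best = max(true_means)
    regret = sum((best - true_means[a]) * counts[a] for a in range(k))
    return regret, counts
```

With a two-armed instance such as `run_bandit(ucb_index, [0.0, 0.5], 2000, random.Random(0))`, either policy should concentrate its pulls on the better arm, and the pull counts give the realised pseudo-regret directly.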

