Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

by Tor Lattimore

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably, it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement on existing algorithms with finite-time regret guarantees, such as UCB and Thompson sampling.
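As a point of reference for the comparison drawn in the abstract, the following is a minimal sketch of the UCB baseline for a Gaussian bandit. The function name, arm means, and exploration constant are illustrative choices, not taken from the paper; the index used is the standard empirical mean plus a confidence radius of sqrt(2 sigma^2 log(t) / n_i).

```python
import math
import random

def ucb_gaussian(means, horizon, sigma=1.0, seed=0):
    """Run a UCB policy on a Gaussian bandit; return cumulative (pseudo-)regret.

    means:   true arm means (unknown to the policy, used only for rewards/regret)
    horizon: number of rounds
    sigma:   standard deviation of the Gaussian reward noise
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # sum of observed rewards per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # play each arm once to initialise
        else:
            # choose the arm maximising empirical mean + confidence radius
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * sigma**2 * math.log(t) / counts[i]))
        reward = rng.gauss(means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret
```

A finite-horizon Gittins index strategy has the same structure but replaces the confidence-radius index with the (horizon-dependent) Gittins index computed under a Gaussian prior, which is what the paper's bounds characterise.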
