Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

11/18/2015
by Tor Lattimore

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably, it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

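As a point of reference for the comparison in the abstract, the sketch below simulates the Gaussian bandit setting described there (unit-variance noise, finite horizon) and runs the standard UCB index policy as a baseline. This is not the paper's finite-horizon Gittins index computation; the function name `ucb_gaussian`, the unit-variance assumption, and the pseudo-regret bookkeeping are my own illustrative choices.

```python
# Illustrative only: a standard UCB baseline on a Gaussian bandit with
# unit-variance noise and a finite horizon. The paper's Gittins index
# strategy requires a separate (approximate) index computation not shown here.
import numpy as np

def ucb_gaussian(means, horizon, rng=None):
    """Run UCB on a Gaussian bandit; return cumulative pseudo-regret."""
    rng = np.random.default_rng(rng)
    k = len(means)
    counts = np.zeros(k, dtype=int)   # number of pulls per arm
    sums = np.zeros(k)                # sum of observed rewards per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1               # play each arm once to initialise
        else:
            # UCB index for 1-subgaussian (unit-variance Gaussian) rewards
            index = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(index))
        reward = rng.normal(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # pseudo-regret accumulates the gap
    return regret

if __name__ == "__main__":
    # Two-armed example: gap 0.2, horizon 10,000
    print(ucb_gaussian([0.0, 0.2], horizon=10_000, rng=0))
```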
research · 03/29/2016
Regret Analysis of the Anytime Optimally Confident UCB Algorithm
I introduce and analyse an anytime version of the Optimally Confident UC...

research · 03/19/2018
What Doubling Tricks Can and Can't Do for Multi-Armed Bandits
An online reinforcement learning algorithm is anytime if it does not nee...

research · 03/08/2021
Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems
Restless Multi-Armed Bandits (RMABs) have been popularly used to model l...

research · 09/03/2015
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of rank...

research · 07/19/2018
An Optimal Algorithm for Stochastic and Adversarial Bandits
We provide an algorithm that achieves the optimal (up to constants) fini...

research · 12/13/2021
Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits
We consider the upper confidence bound strategy for Gaussian multi-armed...

research · 10/01/2021
Batched Thompson Sampling
We introduce a novel anytime Batched Thompson sampling policy for multi-...
