A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

04/09/2019
by Daniel Russo, et al.

This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms. I consider a Gaussian multi-armed bandit problem with discount factor γ. The Gittins index of an arm is shown to equal the γ-quantile of the posterior distribution of the arm's mean plus an error term that vanishes as γ → 1. In this sense, for sufficiently patient agents, a Gittins index measures the highest plausible mean reward of an arm in a manner equivalent to an upper confidence bound.
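
As a rough illustration of the quantile term in the abstract, the sketch below computes the γ-quantile of a Gaussian posterior over an arm's mean and uses it as a Bayesian upper confidence bound index. It is not the Gittins index itself: the vanishing error term is omitted, and the function names, the two-arm example, and the posterior parameters are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def posterior_quantile_index(post_mean, post_std, gamma):
    """Return the gamma-quantile of a N(post_mean, post_std**2) posterior.

    This is only the leading quantile term described in the abstract;
    the error term that vanishes as gamma -> 1 is not modeled here.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if post_std <= 0.0:
        raise ValueError("post_std must be positive")
    return NormalDist(mu=post_mean, sigma=post_std).inv_cdf(gamma)


def choose_arm(posteriors, gamma):
    """Pick the arm whose posterior gamma-quantile is largest.

    `posteriors` maps a (hypothetical) arm name to (post_mean, post_std).
    """
    return max(
        posteriors,
        key=lambda arm: posterior_quantile_index(*posteriors[arm], gamma),
    )


# Hypothetical posteriors: arm "a" is well explored, arm "b" is uncertain.
example_posteriors = {"a": (0.5, 0.1), "b": (0.3, 0.5)}
patient_choice = choose_arm(example_posteriors, gamma=0.95)
median_choice = choose_arm(example_posteriors, gamma=0.5)
```

With these assumed numbers, the quantile at γ = 0.95 favors the uncertain arm, while the posterior median (γ = 0.5) favors the arm with the higher posterior mean. This matches the intuition that a more patient agent places more value on exploration.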

research
04/07/2012

UCB Algorithm for Exponential Distributions

We introduce in this paper a new algorithm for Multi-Armed Bandit (MAB) ...
research
12/27/2013

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

The paper proposes a novel upper confidence bound (UCB) procedure for id...
research
11/22/2021

Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound

This paper studies a decentralized multi-armed bandit problem in a multi...
research
12/04/2020

One-bit feedback is sufficient for upper confidence bound policies

We consider a variant of the traditional multi-armed bandit problem in w...
research
01/31/2022

Rotting infinitely many-armed bandits

We consider the infinitely many-armed bandit problem with rotting reward...
research
06/04/2020

Differentiable Linear Bandit Algorithm

Upper Confidence Bound (UCB) is arguably the most commonly used method f...
research
12/30/2021

Reversible Upper Confidence Bound Algorithm to Generate Diverse Optimized Candidates

Most algorithms for the multi-armed bandit problem in reinforcement lear...
