lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

12/27/2013
by   Kevin Jamieson, et al.

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
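
The two ideas the abstract highlights, a confidence bound shaped by the law of the iterated logarithm and a stopping rule based on pull counts rather than a union bound over arms, can be illustrated with a rough sketch. Everything below is an assumption made for illustration and is not taken from the text above: the function names `lil_bound` and `lil_ucb_sketch`, the Gaussian reward model, the constants `eps`, `beta`, and `a`, the `+ 2` inside the logarithm, and the pull budget `max_pulls`. It should be read as a toy simulation under those assumptions, not as the authors' exact procedure.

```python
import math
import random


def lil_bound(t, delta, eps=0.01, sigma=0.5):
    """LIL-style confidence radius after t pulls (illustrative constants)."""
    # The "+ 2" keeps the inner logarithm above 1 for small t (an assumption
    # of this sketch), so the outer logarithm stays positive when delta < 1.
    inner = math.log((1.0 + eps) * t + 2.0) / delta
    return (1.0 + math.sqrt(eps)) * math.sqrt(
        2.0 * sigma ** 2 * (1.0 + eps) * math.log(inner) / t
    )


def lil_ucb_sketch(means, delta=0.05, eps=0.01, beta=1.0, a=9.0,
                   sigma=0.5, max_pulls=200_000, seed=0):
    """Return (index of the chosen arm, total number of pulls used)."""
    rng = random.Random(seed)
    n = len(means)

    # Pull every arm once so each empirical mean is defined.
    pulls = [1] * n
    sums = [rng.gauss(m, sigma) for m in means]
    total = n

    while total < max_pulls:
        # Stopping rule: stop once one arm has been pulled far more often
        # than all the other arms combined.
        for i in range(n):
            if pulls[i] >= 1 + a * (total - pulls[i]):
                return i, total

        # Pull the arm with the largest upper confidence bound.
        idx = max(
            range(n),
            key=lambda i: sums[i] / pulls[i]
            + (1.0 + beta) * lil_bound(pulls[i], delta, eps, sigma),
        )
        sums[idx] += rng.gauss(means[idx], sigma)
        pulls[idx] += 1
        total += 1

    # Budget exhausted (a safeguard added for this sketch): fall back to
    # the most-pulled arm.
    return max(range(n), key=lambda i: pulls[i]), total


if __name__ == "__main__":
    best, used = lil_ucb_sketch([1.0, 0.0, 0.0])
    print("chosen arm:", best, "pulls used:", used)
```

In this sketch the stopping test looks only at how often each arm has been pulled, so no per-arm confidence intervals need to be compared at stopping time; the confidence radius shrinks roughly like sqrt(log(log t) / t), which is the iterated-logarithm rate the abstract refers to.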

research
06/17/2013

On Finding the Largest Mean Among Many

Sampling from distributions to find the one with the largest mean arises...
research
06/12/2021

Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandit

We consider the problem of finding, through adaptive sampling, which of ...
research
06/22/2023

Pure Exploration in Bandits with Linear Constraints

We address the problem of identifying the optimal policy with a fixed co...
research
02/13/2022

On the complexity of All ε-Best Arms Identification

We consider the problem introduced by <cit.> of identifying all the ε-op...
research
04/09/2019

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

This note gives a short, self-contained, proof of a sharp connection bet...
research
10/31/2020

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism

We study exploration in stochastic multi-armed bandits when we have acce...
research
02/18/2020

Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm

Online machine learning (OML) algorithms do not need any training phase ...
