A minimax and asymptotically optimal algorithm for stochastic bandits

02/23/2017
by Pierre Ménard, et al.

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research, with simple and clear proofs.
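To make the abstract concrete, the following is a minimal sketch of how an index of the kl-UCB++ type can be computed in the Bernoulli case (the simplest exponential family), assuming a known horizon T. The exploration function below follows the qualitative shape used in the paper, where the confidence budget depends on T/(K n) rather than on the current round; the exact constants, the function names (kl_bernoulli, ucb_index, choose_arm), and the bisection tolerance are illustrative choices, not the authors' code.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence kl(p, q) between two Bernoulli distributions."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def log_plus(x):
    """max(log x, 0), taken to be 0 for x <= 0."""
    return max(math.log(x), 0.0) if x > 0 else 0.0


def exploration(horizon, n_arms, n_pulls):
    """Exploration budget in the kl-UCB++ style: it shrinks as an arm is
    pulled more often relative to horizon / n_arms.  The constants here
    illustrate the shape; the paper's exact function may differ."""
    x = horizon / (n_arms * n_pulls)
    return log_plus(x * (log_plus(x) ** 2 + 1.0))


def ucb_index(mean, n_pulls, budget, tol=1e-6):
    """Largest q >= mean with n_pulls * kl(mean, q) <= budget, by bisection
    (kl(mean, q) is increasing in q on [mean, 1])."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def choose_arm(means, counts, horizon):
    """Pick the arm maximizing the kl-UCB++-style index.

    means[a]  : empirical mean reward of arm a
    counts[a] : number of pulls of arm a so far
    """
    n_arms = len(means)
    for a in range(n_arms):
        if counts[a] == 0:  # initialisation: try every arm once
            return a
    return max(
        range(n_arms),
        key=lambda a: ucb_index(
            means[a], counts[a], exploration(horizon, n_arms, counts[a])
        ),
    )
```

At each round the learner pulls the arm with the largest index. Roughly speaking, the KL-based confidence level is what gives the distribution-dependent (Lai and Robbins) optimality, while letting the exploration budget depend on T/(K n) instead of log t is what removes the extra logarithmic factor in the minimax bound.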



Related research

05/14/2018
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
In the context of K-armed stochastic bandits with distribution only assu...

06/30/2016
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
We study a generalization of the multi-armed bandit problem with multipl...

11/06/2015
Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice
We prove non-asymptotic lower bounds on the expectation of the maximum o...

03/16/2023
A bit-parallel tabu search algorithm for finding E(s^2)-optimal and minimax-optimal supersaturated designs
We prove the equivalence of two-symbol supersaturated designs (SSDs) wit...

03/19/2019
A Note on KL-UCB+ Policy for the Stochastic Bandit
A classic setting of the stochastic K-armed bandit problem is considered...

05/08/2018
Profitable Bandits
Originally motivated by default risk management applications, this paper...

06/21/2021
On Limited-Memory Subsampling Strategies for Bandits
There has been a recent surge of interest in nonparametric bandit algori...
