Explore no more: Improved high-probability regret bounds for non-stochastic bandits

06/10/2015
by Gergely Neu, et al.

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold in expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least Ω(√T) times over T rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.
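The IX estimator at the heart of this result is easy to state: instead of the standard unbiased importance-weighted estimate ℓ_{t,i}/p_{t,i}, the observed loss is divided by p_{t,i} + γ for a small implicit-exploration parameter γ > 0, which biases the estimate downward but keeps it bounded. The Python sketch below illustrates this estimator inside an Exp3-style exponential-weights update (Exp3-IX); it is a minimal illustration under assumed settings, not the paper's reference implementation, and the synthetic loss sequence and specific parameter values are for demonstration only.

import numpy as np

def exp3_ix(losses, eta, gamma, rng):
    """Exp3-IX sketch: exponential weights with Implicit eXploration (IX)
    loss estimates, i.e. the observed loss divided by p_t(arm) + gamma
    rather than the unbiased denominator p_t(arm)."""
    T, K = losses.shape
    cum_est = np.zeros(K)              # cumulative estimated losses
    total_loss = 0.0
    for t in range(T):
        # Exponential-weights distribution; no forced uniform exploration.
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = losses[t, arm]
        total_loss += loss
        # IX estimate: the extra gamma in the denominator keeps the
        # estimate bounded by 1/gamma, which is what enables the
        # high-probability analysis.
        cum_est[arm] += loss / (p[arm] + gamma)
    return total_loss

# Illustrative usage on a synthetic loss sequence (assumed values).
rng = np.random.default_rng(0)
T, K = 10000, 10
losses = rng.uniform(size=(T, K))
losses[:, 0] *= 0.8                    # arm 0 is slightly better on average
eta = np.sqrt(2.0 * np.log(K) / (K * T))
gamma = eta / 2.0                      # coupling gamma = eta/2, as in the paper's analysis
print("Exp3-IX cumulative loss:", exp3_ix(losses, eta, gamma, rng))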


