An Optimal Algorithm for Stochastic and Adversarial Bandits

07/19/2018
by Julian Zimmert, et al.

We provide an algorithm that achieves the optimal (up to constants) finite-time regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime or the time horizon. The result gives a negative answer to the open problem of whether an extra price has to be paid for the lack of information about the adversariality/stochasticity of the environment. We provide a complete characterization of online mirror descent algorithms based on Tsallis entropy and show that the power α = 1/2 achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting.
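The abstract does not spell out the update rule, so the following is only a minimal sketch of what online mirror descent with the Tsallis entropy regularizer at α = 1/2 can look like. It assumes the closed-form weights w_i = 4 / (η (L_i − x))², where x is a normalization constant found here by bisection, together with importance-weighted loss estimates and a learning rate η_t = 1/√t. These choices, and the names `tsallis_weights` and `run_bandit`, are illustrative assumptions rather than details confirmed by the text above.

```python
import math
import random


def tsallis_weights(cum_loss, eta, iters=60):
    """Arm probabilities w_i = 4 / (eta * (L_i - x))**2 with sum(w) == 1.

    The normalizer x is located by bisection (an assumed, simple solver).
    """
    k = len(cum_loss)
    min_loss = min(cum_loss)
    lo = min_loss - 2.0 * math.sqrt(k) / eta  # at this x every w_i <= 1/k
    hi = min_loss - 2.0 / eta                 # at this x the best arm has w = 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (l - mid)) ** 2 for l in cum_loss)
        if total > 1.0:
            hi = mid  # weights too large: move x away from the losses
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (l - x)) ** 2 for l in cum_loss]
    s = sum(w)
    return [wi / s for wi in w]  # remove residual numerical error


def run_bandit(loss_fn, k, horizon, seed=0):
    """Play `horizon` rounds; loss_fn(t, arm, rng) must return a loss in [0, 1]."""
    rng = random.Random(seed)
    cum_loss = [0.0] * k  # importance-weighted cumulative loss estimates
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed anytime learning-rate schedule
        w = tsallis_weights(cum_loss, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = loss_fn(t, arm, rng)
        cum_loss[arm] += loss / w[arm]  # unbiased estimate for the pulled arm
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    means = [0.2, 0.8]  # hypothetical Bernoulli loss means for a stochastic test

    def bernoulli(t, arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0

    print(run_bandit(bernoulli, k=2, horizon=2000))
```

Because the learning rate depends only on the round index t, the sketch needs neither the horizon nor any knowledge of whether the losses are stochastic or adversarial, which matches the setting the abstract describes.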


research
02/19/2021

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

We propose an algorithm for stochastic and adversarial multiarmed bandit...
research
02/20/2017

An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

We present a new strategy for gap estimation in randomized algorithms fo...
research
03/25/2018

Stochastic bandits robust to adversarial corruptions

We introduce a new model of stochastic bandits with adversarial corrupti...
research
02/11/2021

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

In this work, we develop linear bandit algorithms that automatically ada...
research
02/20/2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Best-of-both-worlds algorithms for online learning which achieve near-op...
research
11/18/2015

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

I analyse the frequentist regret of the famous Gittins index strategy fo...
research
04/16/2018

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

In this work, we address the open problem of finding low-complexity near...
