A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

02/20/2023
by   Christoph Dann, et al.
0

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialised potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can simultaneously obtain O(log(T)) regret in the stochastic regime and Õ(√(T)) regret in the adversarial regime. In this work, we resolve this question positively and present a general reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) and online-mirror-descent (OMD) algorithms. We showcase the capability of this reduction by transforming existing algorithms that are only known to achieve worst-case guarantees into new algorithms with best-of-both-worlds guarantees in contextual bandits, graph bandits and tabular Markov decision processes.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/15/2023

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

We derive a new analysis of Follow The Regularized Leader (FTRL) for onl...
research
02/20/2017

An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

We present a new strategy for gap estimation in randomized algorithms fo...
research
07/19/2018

An Optimal Algorithm for Stochastic and Adversarial Bandits

We provide an algorithm that achieves the optimal (up to constants) fini...
research
06/10/2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

This work studies the problem of learning episodic Markov Decision Proce...
research
07/29/2022

Best-of-Both-Worlds Algorithms for Partial Monitoring

This paper considers the partial monitoring problem with k-actions and d...
research
01/03/2013

Follow the Leader If You Can, Hedge If You Must

Follow-the-Leader (FTL) is an intuitive sequential prediction strategy t...
research
02/28/2023

Approximately Stationary Bandits with Knapsacks

Bandits with Knapsacks (BwK), the generalization of the Multi-Armed Band...

Please sign up or login with your details

Forgot password? Click here to reset