Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays

11/25/2019
by N. Mert Vural, et al.

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching m-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. Using this expert advice algorithm, we also improve the best known high-probability bound for the multiple-play setting by O(√m). Our results hold in an individual sequence manner, since we make no statistical assumptions on the bandit arm gains. Through an extensive set of experiments on synthetic and real data, we demonstrate significant performance gains of the proposed algorithm over state-of-the-art algorithms.
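The authors' algorithm and its analysis are in the full text. Purely as an illustration of the problem setting (adversarial bandits with multiple plays and semi-bandit feedback), below is a minimal sketch of the classical Exp3.M baseline of Uchiya et al. (2010), a standard point of reference in this literature; it is not the algorithm proposed in this paper. The function names (`depround`, `exp3m`), the exploration parameter `gamma`, and the synthetic gains in the usage example are illustrative assumptions, not details from the paper.

```python
import numpy as np

def depround(p, rng):
    """DepRound: draw a subset S with P(i in S) = p[i] and |S| = sum(p)
    (assumed integral). Each step shifts mass between two fractional
    coordinates while preserving every marginal in expectation."""
    p = np.asarray(p, dtype=float).copy()
    while True:
        frac = np.flatnonzero((p > 1e-9) & (p < 1 - 1e-9))
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        beta = min(1 - p[i], p[j])
        delta = min(p[i], 1 - p[j])
        if rng.random() < delta / (beta + delta):
            p[i] += beta
            p[j] -= beta
        else:
            p[i] -= delta
            p[j] += delta
    return np.flatnonzero(p > 0.5)

def exp3m(gains, m, gamma, rng):
    """Sketch of Exp3.M for the adversarial multiple-play bandit with
    semi-bandit feedback. `gains` is a (T, K) array in [0, 1]; m arms
    are played per round. Returns the total gain collected."""
    T, K = gains.shape
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        # Cap the largest weights (bisection on the cap alpha) so that
        # no arm's marginal probability exceeds 1 after mixing.
        capped = np.zeros(K, dtype=bool)
        wp = w
        if (1 - gamma) * w.max() / w.sum() + gamma / K > 1.0 / m:
            lo, hi = 0.0, w.max()
            for _ in range(60):
                alpha = 0.5 * (lo + hi)
                g = (1 - gamma) * alpha / np.minimum(w, alpha).sum() + gamma / K
                lo, hi = (alpha, hi) if g <= 1.0 / m else (lo, alpha)
            capped = w >= lo
            wp = np.minimum(w, lo)
        # Marginal probabilities: sum(p) = m and each p[i] <= 1.
        p = m * ((1 - gamma) * wp / wp.sum() + gamma / K)
        S = depround(np.clip(p, 0.0, 1.0), rng)
        # Semi-bandit feedback: only the played arms' gains are observed;
        # uncapped arms are updated with importance-weighted estimates.
        for i in S:
            total += gains[t, i]
            if not capped[i]:
                w[i] *= np.exp(m * gamma * (gains[t, i] / p[i]) / K)
        w /= w.max()  # rescale for stability; probabilities are unchanged
    return total

# Usage on synthetic gains (illustrative): higher-indexed arms are
# better on average, so they should be played more often over time.
rng = np.random.default_rng(0)
T, K, m = 2000, 8, 3
gains = rng.random((T, K)) * np.linspace(0.2, 1.0, K)
print(exp3m(gains, m=m, gamma=0.1, rng=rng))
```

The capping step is what distinguishes the multiple-play setting from the single-play Exp3: without it, one dominant weight could force a marginal probability above 1, and no valid m-subset distribution with those marginals would exist.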


Related research

11/16/2022 · Dueling Bandits: From Two-dueling to Multi-dueling
We study a general multi-dueling bandit problem, where an agent compares...

07/13/2020 · Relaxing the I.I.D. Assumption: Adaptive Minimax Optimal Sequential Prediction with Expert Advice
We consider sequential prediction with expert advice when the data are g...

07/27/2023 · Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application
This paper presents an efficient algorithm to solve the sleeping bandit ...

02/14/2012 · Towards minimax policies for online linear optimization with bandit feedback
We address the online linear optimization problem with bandit feedback. ...

12/07/2020 · Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
We study the stochastic shortest path problem with adversarial costs and...

04/18/2018 · Online Non-Additive Path Learning under Full and Partial Information
We consider the online path learning problem in a graph with non-additiv...

09/28/2020 · Position-Based Multiple-Play Bandits with Thompson Sampling
Multiple-play bandits aim at displaying relevant items at relevant posit...
