Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

06/14/2022
by Shinji Ito, et al.

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i:\Delta_i>0} \log T/\Delta_i)$, where $\Delta_i$ is the suboptimality gap of arm $i$ and $T$ is the time horizon. As Audibert et al. [2007] have shown, however, performance can be improved in stochastic environments with low-variance arms. In fact, they provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i:\Delta_i>0} (\sigma_i^2/\Delta_i + 1)\log T)$, where $\sigma_i^2$ is the loss variance of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that variance information can be exploited even in possibly adversarial environments. Further, the leading constant factor in our gap-variance-dependent bound is only (almost) twice that of the lower bound. Additionally, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. The proposed algorithm is based on the follow-the-regularized-leader (FTRL) method and employs adaptive learning rates that depend on the empirical prediction error of the loss, which leads to gap-variance-dependent regret bounds reflecting the variance of the arms.
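The abstract describes the method only at a high level: FTRL whose learning rates adapt to the empirical prediction error of the loss. The Python sketch below illustrates that general idea. It is not the paper's algorithm, and several of its ingredients are assumptions chosen for illustration. It uses a log-barrier regularizer. It builds optimistic importance-weighted loss estimates around predictions `m` that are running means of observed losses. It uses a made-up rule `beta = 1 + sqrt(sum of squared prediction errors)` for the per-arm regularization weights. The function names `log_barrier_ftrl` and `run_bandit` and the Bernoulli test environment are likewise hypothetical.

```python
import math
import random


def log_barrier_ftrl(cum_loss, beta):
    """Solve argmin_p <cum_loss, p> - sum_i beta_i * log(p_i) over the simplex.

    The KKT conditions give p_i = beta_i / (cum_loss_i + lam) for a scalar lam
    with cum_loss_i + lam > 0 for all i; since sum_i p_i is decreasing in lam,
    lam is located by bisection.
    """
    lo = -min(cum_loss)          # as lam -> lo from above, sum_i p_i -> infinity
    hi = lo + sum(beta)          # here every denominator is >= sum(beta), so sum_i p_i <= 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        total = sum(b / (c + mid) for b, c in zip(beta, cum_loss))
        if total > 1.0:
            lo = mid
        else:
            hi = mid
    p = [b / (c + hi) for b, c in zip(beta, cum_loss)]
    s = sum(p)
    return [x / s for x in p]    # remove residual bisection error


def run_bandit(loss_means, horizon, seed=0):
    """Optimistic log-barrier FTRL on Bernoulli losses; returns pull counts per arm."""
    rng = random.Random(seed)
    k = len(loss_means)
    cum_est = [0.0] * k   # cumulative loss estimates hat L_{t-1,i}
    pred = [0.5] * k      # hypothetical loss predictions m_{t,i}: running means
    n_obs = [0] * k
    sq_err = [0.0] * k    # accumulated squared prediction errors (ell - m)^2 per arm
    beta = [1.0] * k      # regularization weights (inverse learning rates)
    pulls = [0] * k
    for _ in range(horizon):
        # Optimistic FTRL: play against past estimates plus the current prediction.
        p = log_barrier_ftrl([c + m for c, m in zip(cum_est, pred)], beta)
        arm = rng.choices(range(k), weights=p)[0]
        loss = 1.0 if rng.random() < loss_means[arm] else 0.0
        pulls[arm] += 1
        # Unbiased estimate: hat ell_i = m_i + 1[i == arm] * (ell_i - m_i) / p_i.
        err = loss - pred[arm]
        for i in range(k):
            cum_est[i] += pred[i]
        cum_est[arm] += err / p[arm]
        # Hypothetical rule: beta grows with the accumulated squared prediction error.
        sq_err[arm] += err * err
        beta[arm] = 1.0 + math.sqrt(sq_err[arm])
        # Update the prediction only after it has been used in this round's estimate.
        n_obs[arm] += 1
        pred[arm] += err / n_obs[arm]
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.8], horizon=2000, seed=0))
```

In this sketch, the weight `beta` of an arm grows with how poorly that arm's losses have been predicted. The log-barrier was chosen mainly for convenience. Its KKT conditions reduce each FTRL step to a one-dimensional root search over the multiplier `lam`.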

research
09/22/2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

This paper considers two fundamental sequential decision-making problems...
research
02/24/2023

Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds

This paper proposes a linear bandit algorithm that is adaptive to enviro...
research
01/10/2018

More Adaptive Algorithms for Adversarial Bandits

We develop a novel and generic algorithm for the adversarial multi-armed...
research
02/27/2023

Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms

We study the problem of designing adaptive multi-armed bandit algorithms...
research
03/19/2019

Adaptivity, Variance and Separation for Adversarial Bandits

We make three contributions to the theory of k-armed adversarial bandits...
research
11/19/2020

Fully Gap-Dependent Bounds for Multinomial Logit Bandit

We study the multinomial logit (MNL) bandit problem, where at each time ...
research
03/13/2023

Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm

The linear bandit problem has been studied for many years in both stocha...