Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds

02/24/2023
by Shinji Ito et al.

This paper proposes a linear bandit algorithm that adapts to its environment at two levels of a hierarchy. At the higher level, the algorithm adapts to a variety of environment types: it achieves best-of-three-worlds regret bounds, i.e., O(√(T log T)) for adversarial environments and O(log T/Δ_min + √(C log T/Δ_min)) for stochastic environments with adversarial corruptions, where T, Δ_min, and C denote, respectively, the time horizon, the minimum sub-optimality gap, and the total amount of corruption. (Polynomial factors in the dimensionality are omitted here.) At the lower level, within each of the adversarial and stochastic regimes, the algorithm adapts to certain environmental characteristics and thereby performs better. It has data-dependent regret bounds that depend on the cumulative loss of the optimal action, the total quadratic variation, and the path-length of the loss vector sequence. In addition, for stochastic environments, it enjoys a variance-adaptive regret bound of O(σ^2 log T/Δ_min), where σ^2 denotes the maximum variance of the feedback loss. The algorithm builds on the SCRiBLe algorithm: a new technique we call scaled-up sampling provides the high-level adaptability, and the technique of optimistic online learning provides the low-level adaptability.
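To make the base algorithm concrete, here is a minimal sketch of one round of SCRiBLe-style exploration for a linear bandit over the unit ball, using the self-concordant barrier R(x) = -log(1 - ||x||^2). The function name, the choice of action set and barrier, and the simplified mirror-descent update are illustrative assumptions; the paper's scaled-up sampling and optimistic-learning components are not reproduced here.

```python
import numpy as np

def scrible_round(x, loss_vec, eta, rng):
    """One round of SCRiBLe-style sampling on the open unit ball (sketch).

    The played point lies on the Dikin ellipsoid of the barrier
    R(x) = -log(1 - ||x||^2) at x, which is contained in the ball.
    """
    d = len(x)
    s = 1.0 - x @ x                        # barrier "slack": 1 - ||x||^2
    # Hessian of R at x: (2/s) I + (4/s^2) x x^T
    H = (2.0 / s) * np.eye(d) + (4.0 / s**2) * np.outer(x, x)
    lam, V = np.linalg.eigh(H)             # H = V diag(lam) V^T
    i = rng.integers(d)                    # random eigendirection ...
    eps = rng.choice([-1.0, 1.0])          # ... and a random sign
    y = x + eps * V[:, i] / np.sqrt(lam[i])   # endpoint of the Dikin ellipsoid
    observed = y @ loss_vec                # bandit feedback: only this scalar
    # Unbiased loss-vector estimate: averaging over i and eps recovers loss_vec
    g_hat = d * observed * eps * np.sqrt(lam[i]) * V[:, i]
    # Simplified mirror-descent step with the barrier (one Newton-type step)
    x_new = x - eta * np.linalg.solve(H, g_hat)
    return y, observed, x_new
```

With a small step size the iterate stays strictly inside the ball, because the barrier Hessian blows up near the boundary and shrinks the update; the actual algorithm replaces this simplified update with an FTRL-style one and adds the scaled-up sampling and optimistic components described above.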


