Stability-penalty-adaptive Follow-the-regularized-leader: Sparsity, Game-dependency, and Best-of-both-worlds

05/26/2023
by Taira Tsuchiya, et al.

Adaptivity to the difficulty of a problem is a key property of sequential decision-making algorithms, as it broadens their applicability. Follow-the-Regularized-Leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems. Aiming to further generalize this adaptivity, we develop a generic adaptive learning rate for FTRL, called the Stability-Penalty-Adaptive (SPA) learning rate. This learning rate yields a regret bound that jointly depends on the stability and penalty of the algorithm, the two terms into which the regret of FTRL is typically decomposed. With this result, we establish several algorithms with three types of adaptivity: sparsity, game-dependency, and Best-of-Both-Worlds (BOBW). Sparsity frequently appears in real-world problems. However, existing sparse multi-armed bandit algorithms with k arms assume that the sparsity level s ≤ k is known in advance, which is often not the case in real-world scenarios. To address this problem, with the help of the new learning rate framework, we establish s-agnostic algorithms with regret bounds of Õ(√(sT)) in the adversarial regime for T rounds, which matches the existing lower bound up to a logarithmic factor. Meanwhile, BOBW algorithms aim to achieve a near-optimal regret in both the stochastic and adversarial regimes. Leveraging the new adaptive learning rate framework and a novel analysis that bounds the variation in FTRL output in response to changes in a regularizer, we establish the first BOBW algorithm with a sparsity-dependent bound. Additionally, we explore partial monitoring and demonstrate that the proposed learning rate framework allows us to achieve a game-dependent bound and BOBW guarantees simultaneously.
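To make the FTRL setup concrete, the following is a minimal sketch of FTRL for adversarial multi-armed bandits with a negative-entropy regularizer and a simple time-decaying learning rate. This is a standard non-adaptive baseline for illustration only; it is not the paper's SPA learning rate, whose update depends on observed stability and penalty terms rather than on t alone.

```python
import numpy as np

def ftrl_bandit(loss_matrix, seed=0):
    """FTRL for adversarial k-armed bandits (Exp3-style illustration).

    Uses the negative-entropy regularizer, so the FTRL output is a
    softmax over cumulative importance-weighted loss estimates. The
    learning rate eta_t = sqrt(log k / (k * t)) is a standard
    time-decaying choice, NOT the stability-penalty-adaptive rate
    developed in the paper.
    """
    rng = np.random.default_rng(seed)
    T, k = loss_matrix.shape
    cum_loss_est = np.zeros(k)  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(k) / (k * t))
        # FTRL with negative entropy: play the softmax of -eta * estimates.
        # Subtracting the minimum keeps the exponentials numerically stable.
        w = np.exp(-eta * (cum_loss_est - cum_loss_est.min()))
        p = w / w.sum()
        arm = rng.choice(k, p=p)
        loss = loss_matrix[t - 1, arm]
        total_loss += loss
        # Unbiased importance-weighted estimate of the full loss vector:
        # only the played arm's coordinate is updated, scaled by 1/p[arm].
        cum_loss_est[arm] += loss / p[arm]
    return total_loss
```

In the paper's framework, the fixed schedule above would be replaced by a learning rate updated round by round from the realized stability and penalty terms, which is what enables the sparsity-adaptive and BOBW guarantees without knowing s in advance.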


