Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

02/12/2022
by   Haipeng Luo, et al.

We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order O(√(MT)) where M is the number of base algorithms and T is the time horizon. The polynomial dependence on M, however, prevents one from applying these algorithms to many applications where M is poly(T) or even larger. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on M as long as some conditions are satisfied. As the main example, we apply our recipe to the problem of adversarial linear bandits over a d-dimensional ℓ_p unit-ball for p ∈ (1,2]. By corralling a large set of T base algorithms, each starting at a different time step, our final algorithm achieves the first optimal switching regret O(√(d S T)) when competing against a sequence of comparators with S switches (for some known S). We further extend our results to linear bandits over a smooth and strongly convex domain as well as unconstrained linear bandits.
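The abstract describes a master algorithm that combines M base bandit algorithms by sampling one per round and feeding back importance-weighted feedback. The actual CORRAL algorithm of Agarwal et al. (2017) uses log-barrier online mirror descent with adaptive learning rates, and the paper's recipe adds further structure to get log(M) overhead; the sketch below is only a simplified multiplicative-weights master over toy base learners, illustrating the sampling-and-importance-weighting structure. All names (`EpsGreedyBase`, `corral`, `reward_fn`, `lr`) are illustrative, not from the paper.

```python
import math
import random

class EpsGreedyBase:
    """Toy base bandit algorithm (epsilon-greedy over k arms),
    a stand-in for the adversarial base algorithms in the paper."""
    def __init__(self, k, eps=0.1):
        self.k = k
        self.eps = eps
        self.counts = [0] * k
        self.means = [0.0] * k

    def act(self):
        # Explore uniformly with prob eps (or before any feedback).
        if random.random() < self.eps or not any(self.counts):
            return random.randrange(self.k)
        return max(range(self.k), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def corral(bases, reward_fn, T, lr=0.1):
    """Simplified EXP3-style master over M base algorithms.

    Each round: sample base i with probability p_i, play its action,
    return the observed reward to that base only, and update the
    master with the importance-weighted reward r / p_i.
    Returns the master's final distribution over base algorithms."""
    M = len(bases)
    w = [0.0] * M                      # log-weights of the master
    p = [1.0 / M] * M
    for t in range(T):
        mx = max(w)                    # shift for numerical stability
        expw = [math.exp(wi - mx) for wi in w]
        Z = sum(expw)
        p = [e / Z for e in expw]
        i = random.choices(range(M), weights=p)[0]
        arm = bases[i].act()
        r = reward_fn(t, arm)          # reward assumed in [0, 1]
        bases[i].update(arm, r)
        w[i] += lr * r / p[i]          # importance-weighted master update
    return p
```

In the paper's main application, the M base algorithms are T copies of a linear-bandit algorithm, each started at a different time step, so that the corralled master can track a comparator sequence with S switches; the logarithmic dependence on M is what makes corralling T-many bases affordable.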

