Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

02/28/2022
by Heyang Zhao, et al.

We consider learning a stochastic bandit model in which the reward function belongs to a general class of uniformly bounded functions and the additive noise can be heteroscedastic. Our model captures contextual linear bandits and generalized linear bandits as special cases. Previous works (Kirschner and Krause, 2018; Zhou et al., 2021) based on weighted ridge regression can handle linear bandits with heteroscedastic noise, but they are not directly applicable to our general model due to the curse of nonlinearity. To tackle this problem, we propose a multi-level learning framework for the general bandit model. The core idea of our framework is to partition the observed data into different levels according to the variance of their respective rewards and to perform online learning at each level collaboratively. Under our framework, we first design an algorithm that constructs a variance-aware confidence set based on empirical risk minimization, and we prove a variance-dependent regret bound for it. For generalized linear bandits, we further propose an algorithm based on a follow-the-regularized-leader (FTRL) subroutine and an online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret bound under certain conditions.
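
To make the partitioning idea concrete, the following is a minimal sketch of grouping observations into levels by reward variance. It is an illustration only, not the paper's algorithm: the abstract does not specify how levels are defined, so the dyadic (powers-of-two) binning, the assumption that each observation comes with a known variance, and the names `variance_level`, `partition_by_variance`, `max_level`, and `sigma2_max` are all assumptions made here.

```python
import math
from collections import defaultdict


def variance_level(sigma2, max_level=10, sigma2_max=1.0):
    """Map a noise variance to a level index using dyadic bins.

    Assumed binning (not taken from the paper): level l holds variances in
    (sigma2_max * 2^-(l+1), sigma2_max * 2^-l], and anything smaller than
    the last bin (including zero) is clamped to max_level.
    """
    if sigma2 <= 0:
        return max_level
    ratio = sigma2_max / sigma2
    if ratio <= 1:
        return 0  # variance at or above the assumed upper bound
    return min(int(math.floor(math.log2(ratio))), max_level)


def partition_by_variance(observations, max_level=10):
    """Group (action, reward, variance) triples into per-level datasets.

    Each level's list could then be handed to its own learner, for example
    a per-level empirical risk minimizer.
    """
    levels = defaultdict(list)
    for action, reward, sigma2 in observations:
        levels[variance_level(sigma2, max_level)].append((action, reward))
    return dict(levels)


if __name__ == "__main__":
    # Hypothetical data: (action, observed reward, noise variance).
    obs = [("a", 0.9, 1.0), ("b", 0.4, 0.5), ("c", 0.1, 0.3), ("d", 0.2, 0.0)]
    for level, data in sorted(partition_by_variance(obs, max_level=5).items()):
        print(level, data)
```

Within each level the variances differ by at most a factor of two under this binning, so a per-level learner can treat its noise as roughly homoscedastic; how the levels then cooperate to choose an action is described in the full text, not here.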

research
10/25/2021

Linear Contextual Bandits with Adversarial Corruptions

We study the linear contextual bandit problem in the presence of adversa...
research
03/16/2023

On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

We study linear contextual bandits in the misspecified setting, where th...
research
05/26/2022

Variance-Aware Sparse Linear Bandits

It is well-known that the worst-case minimax regret for sparse linear ba...
research
02/14/2011

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

The analysis of online least squares estimation is at the heart of many ...
research
11/23/2020

Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits

We propose improved fixed-design confidence bounds for the linear logist...
research
03/12/2023

Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback

We study the adversarial online learning problem and create a completely...
research
08/04/2022

Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

Motivated by practical considerations in machine learning for financial ...