No-Regret Linear Bandits beyond Realizability

02/26/2023
by Chong Liu et al.

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter ϵ that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever ϵ > 0. We describe a more natural model of misspecification that only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, for optimization problems, near-optimal regions should matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm – designed for the realizable case – is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal √T regret for problems for which the best previously known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
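For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal, illustrative sketch of LinUCB over a fixed finite arm set: maintain a ridge-regression estimate of the reward parameter and pull the arm with the highest optimistic (upper-confidence) index. The function names, the noise level, and the constant exploration coefficient `beta` are illustrative choices, not details from the paper; in the theory, the confidence width grows logarithmically with t.

```python
import numpy as np

def linucb(arms, reward_fn, T, lam=1.0, beta=2.0, noise=0.1, seed=0):
    """Minimal LinUCB sketch.

    arms      : (K, d) array of feature vectors, one row per arm
    reward_fn : maps a feature vector x to its mean reward (need not be linear)
    T         : time horizon
    Returns the array of observed rewards.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)   # regularized Gram matrix  V_t = lam*I + sum x_s x_s^T
    b = np.zeros(d)       # running sum of r_s * x_s
    rewards = np.empty(T)
    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge-regression estimate of the parameter
        # Optimistic index: predicted reward plus an exploration bonus
        # beta * ||x||_{V^{-1}} for every arm at once.
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
        x = arms[np.argmax(arms @ theta_hat + beta * widths)]
        r = reward_fn(x) + noise * rng.standard_normal()  # noisy observation
        V += np.outer(x, x)
        b += r * x
        rewards[t] = r
    return rewards
```

On a well-specified (linear) instance this sketch concentrates its pulls on the best arm after a short exploration phase; the paper's point is that the same unmodified algorithm also enjoys √T regret under gap-adjusted misspecification.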


