Adapting to misspecification in contextual bandits with offline regression oracles

Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do no rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/12/2021

Adapting to Misspecification in Contextual Bandits

A major research direction in contextual bandits is to develop algorithm...
research
06/28/2018

Contextual bandits with surrogate losses: Margin bounds and efficient algorithms

We introduce a new family of margin-based regret guarantees for adversar...
research
10/24/2022

Conditionally Risk-Averse Contextual Bandits

We desire to apply contextual bandits to scenarios where average-case st...
research
10/25/2020

Tractable contextual bandits beyond realizability

Tractable contextual bandit algorithms often rely on the realizability a...
research
06/21/2021

On Limited-Memory Subsampling Strategies for Bandits

There has been a recent surge of interest in nonparametric bandit algori...
research
07/12/2022

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Designing efficient general-purpose contextual bandit algorithms that wo...
research
09/07/2022

Dual Instrumental Method for Confounded Kernelized Bandits

The contextual bandit problem is a theoretically justified framework wit...

Please sign up or login with your details

Forgot password? Click here to reset