The contextual bandit paradigm involves sequential decision-making settings in which we repeatedly pick one out of $K$ actions (or “arms”) in the presence of contextual side information. Algorithms for this problem usually involve policies that map the contextual information to a chosen action, and the reward feedback is typically limited in the sense that it is only obtained for the action that was chosen. The goal is to maximize the total reward over several ($n$) rounds of decision-making, and the performance of an online algorithm is typically measured in terms of regret with respect to the best policy within some policy class that is fixed a priori. Applications of this paradigm include advertisement placement and web article recommendation [li2010contextual, agarwal2016making], as well as clinical trials and mobile health-care [woodroofe1979one, tewari2017ads].
The contextual bandit problem can be thought of as an online supervised learning problem (over policies mapping contexts to actions) with limited information feedback, and so the optimal regret bounds scale like $O(\sqrt{Kn \log |\Pi|})$, where $\log |\Pi|$ is a natural measure of the sample complexity of the policy class $\Pi$ [auer2002nonstochastic, mcmahan2009tighter, beygelzimer2011contextual]. These bounds are typically achieved by algorithms that are computationally inefficient (with running time linear in the size of the policy class). Much of the research in contextual bandits has tackled computational efficiency [langford2008epoch, agarwal2014taming, rakhlin2016bistro, syrgkanis2016efficient, syrgkanis2016improved, foster2018contextual]: do there exist computationally efficient algorithms that achieve the optimal regret guarantee? A question that has received relatively less attention involves the choice of policy class itself. Even for a fixed regret-minimizing algorithm, the choice of policy class is critical to maximizing the overall reward of the algorithm. As can be seen in applications of contextual bandit models to article recommendation [li2010contextual], the choice is often made in hindsight, and more complex policy classes are used if the algorithm is run for more rounds. A quantitative understanding of how to do this is still lacking, and intuitively, we should expect the optimal choice of policy class not to be static. Ideally, we could design adaptive contextual bandit algorithms that would initially use simple policies, and switch over to more complex ones as more data is obtained.
Theoretically, what this means is that the regret bounds derived for a contextual bandit algorithm are only meaningful for rewards that are generated by a policy within the policy class to which the algorithm is tailored. If the rewards are derived from a “more complex” policy outside the policy class, even the optimal policy in the class may neglect obvious patterns and obtain a very low reward. If the rewards are derived from a policy that is expressible by a much smaller class, the regret that is accumulated is unnecessary. Let us view this through the lens of the simplest possible example: the standard linear contextual bandit [chu2011contextual] paradigm, where we can choose one out of $K$ arms and rewards are generated according to the process
$$g_{a_t, t} = \mu_{a_t} + \langle \theta^*, x_{a_t, t} \rangle + \eta_t,$$
where $\mu_i$ represents a “bias” of arm $i$, $\theta^* \in \mathbb{R}^d$ represents the linear parameter of the model (which is shared across all arms\footnote{This is the model that was described in [chu2011contextual]. It is worth noting that more complex variants of this model with a separate $\theta^*_i$ for every arm $i$ have also been empirically evaluated [li2010contextual].}), $x_{a_t, t}$ represents the contextual information and $\eta_t$ represents noise in the reward observations.
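As a concrete illustration, the reward process above can be simulated in a few lines. This is a sketch under assumed values: the variable names (`mu`, `theta`, `sigma`) and all chosen constants are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, sigma = 5, 10, 0.1            # arms, context dimension, noise scale (assumed)

mu = rng.uniform(-1, 1, size=K)     # per-arm biases
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)      # shared linear parameter, normalized to unit norm

def draw_rewards(x):
    """Reward of arm i: bias + <theta, context of arm i> + sub-Gaussian noise."""
    # x: (K, d) matrix whose i-th row is the context vector of arm i this round
    return mu + x @ theta + sigma * rng.normal(size=K)

x = rng.normal(size=(K, d)) / np.sqrt(d)   # one round of per-arm contexts
r = draw_rewards(x)                        # vector of K realized rewards
```

Setting `theta` to the zero vector recovers the context-free (simple) model discussed next.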
It is well-known that variants of linear upper confidence bound algorithms like $\mathrm{LinUCB}$ [chu2011contextual] and $\mathrm{OFUL}$ [abbasi2011improved]\footnote{Guarantees for $\mathrm{OFUL}$ were established under slightly different constraints on $\theta^*$ and the context vectors, which led to a regret bound of $\widetilde{O}(d\sqrt{n})$. We show in Lemma 6 that a slight variant of $\mathrm{OFUL}$ has its regret bounded by $\widetilde{O}((K+d)\sqrt{n})$ in our setting.} suffer at most $\widetilde{O}((K+d)\sqrt{n})$\footnote{The $\widetilde{O}(\cdot)$ notation hides poly-logarithmic factors.} regret with respect to the optimal linear policy. However, setting $\theta^* = 0$ yields the important case of the reward distribution being independent of the contextual information. Here, a simple upper confidence bound algorithm like $\mathrm{UCB}$ [auer2002finite] would yield the optimal $\widetilde{O}(\sqrt{Kn})$ regret bound, which does not depend on the dimension $d$ of the contexts. Thus, we pay substantial extra regret by using the algorithm meant for linear contextual bandits on such instances with much simpler structure. On the other hand, upper confidence bounds that ignore the contextual information will not guarantee any control on the policy regret: it can even be linear in $n$. It is natural to desire a single approach that adapts to the inherent complexity of the reward-generating model and obtains the optimal regret bound as if this complexity were known in hindsight. Specifically, this paper seeks an answer to the following question:
Does there exist a single algorithm that simultaneously achieves the $\widetilde{O}(\sqrt{Kn})$ regret rate on simple multi-armed bandit instances and the $\widetilde{O}((K+d)\sqrt{n})$ regret rate on linear contextual bandit instances?
1.1 Our contributions
We answer the question of simultaneously optimal regret rates in the multi-armed (“simple”) bandit regime and the linear contextual (“complex”) bandit regime affirmatively, under the condition that the contexts are generated from a stochastic process that yields covariates that are not ill-conditioned. Our algorithm, $\mathrm{OSOM}$ (for Optimistic Selection of Models), essentially exploits the best policy (simply the best arm) that is learned under the assumption of the simple reward model, while conducting a sequential statistical test for the presence of additional complexity in the model, and in particular for whether ignoring this additional complexity would lead to substantial regret. This is a simple statistical principle that could conceivably be generalized to arbitrary nested policy classes: we will see that the algorithm critically exploits the nesting of the simple bandit model within the linear contextual model.
1.2 Related work
The contextual bandit paradigm was first considered by [woodroofe1979one] to model clinical trials. Since then, it has been studied intensely, both theoretically and empirically, in many different application areas and under many different names. We point the reader to [tewari2017ads] for an extensive survey of the contextual bandit history and literature.
Treating policies as experts [auer2002nonstochastic] with careful control on the exploration distribution led to the optimal regret bounds of $O(\sqrt{Kn \log |\Pi|})$ in a number of settings. From an efficiency point of view (where efficiency is defined with respect to an arg-max-oracle that is able to compute the best greedy policy in hindsight), the first approach conceived was the epoch-greedy approach [langford2008epoch], which suffers a sub-optimal dependence of $n^{2/3}$ in the regret. More recently, “randomized-UCB”-style approaches [agarwal2014taming] have been conceived that retain the optimal regret guarantee with $\widetilde{O}(\sqrt{Kn / \log |\Pi|})$ calls to the arg-max-oracle. This question of computational efficiency has generated a lot of research interest [rakhlin2016bistro, syrgkanis2016efficient, syrgkanis2016improved, foster2018contextual]. The problem of policy class selection itself has received less attention in the research community, and how this is done in practice in a statistically sound manner remains unclear. An application of linear contextual bandits was to personalized article recommendation using hand-crafted features of users [li2010contextual]: two classes of linear contextual bandit models with varying levels of complexity were compared to simple (multi-armed) bandit algorithms in terms of overall reward (which in this application represented the click-through rate of ads). A striking observation was that the more complex models won out when the algorithm was run for a longer period of time (e.g., one day as opposed to half a day). Surveys on contextual bandits as applied to mobile health-care [tewari2017ads] have expressed a desire for algorithms that adapt their choice of policy class according to the amount of information they have received (e.g., the number of rounds). At a high level, we seek a theoretically principled way of doing this.
Perhaps the most relevant work to online policy class selection involves significant attempts to corral a band of base bandit algorithms into a meta-bandit framework [agarwal2017corralling]. The idea is to bound the regret of the meta-algorithm in terms of the regret of the best base algorithm in hindsight. (This is clearly useful for the policy class selection problem that we study here: one can corral together an algorithm designed for the linear model and one designed for the simple multi-armed bandit model.) The Corral framework is very general and can be applied to any set of base algorithms, whether efficient or not. This generality is attractive, but Corral is not the optimal choice of computationally efficient algorithm for the multi-armed-vs-linear-contextual bandit problem, for a couple of reasons.
It is not clear what (if any) choice of base algorithms would lead to a computationally efficient algorithm that is also statistically optimal in a minimax sense simultaneously for both problems.
The meta-algorithm framework uses an experts algorithm (in particular, mirror descent with the log-barrier regularizer and importance weighting on the base algorithms) to choose which base algorithm to play in each round. Thus, it is impossible to expect the instance-optimal regret rate of $O\left(\sum_{i : \Delta_i > 0} \frac{\log n}{\Delta_i}\right)$ on simple bandit instances. More generally, the Corral framework will not yield instance-optimal rates on any policy class\footnote{On our much simpler instance of bandit-vs-linear-bandit, we do obtain instance-optimal rates for at least the simple bandit model.}.
The Corral framework highlights the principal difficulty in contextual bandit model selection, which can be thought of as an even finer exploration-exploitation tradeoff: algorithms (designed for particular model classes) that fall out of favor in initial rounds could be picked very rarely, and the information required to truly perform model selection may be absent even after many rounds of play. Corral tackles this difficulty using the log-barrier regularizer for the meta-algorithm as a natural form of heightened exploration [foster2016learning], together with clever learning rate schedules\footnote{An undesirable side effect of using the log-barrier regularizer is a polynomial, rather than logarithmic, dependence on the number of policy classes in the regret bound.}. Related recent work [krishnamurthy2019contextual] adapts to the unknown Lipschitz constant of the optimal policy (the function from context to recommended action) in the stochastic contextual bandit problem with an abstract policy class and continuous action space.
Our stylistic approach to the model selection problem is a little different, as we focus on the much more specific case of two models: the simple multi-armed bandit model and the linear contextual bandit model. We encounter a similar difficulty, and the simplicity of the models lets us see its extent with striking clarity. On the other hand, we observe that commonly encountered sequences of contexts can help us carefully navigate the finer exploration-exploitation tradeoff when the model classes are nested.
Our algorithm ($\mathrm{OSOM}$) utilizes a simple “best-of-both-worlds” principle: exploit the possible simple reward structure in the model until (unless) there is significant statistical evidence for the presence of complex reward structure that would incur substantial complex policy regret if not exploited. This algorithmic framework is inspired by the initial “best-of-both-worlds” results for stochastic and adversarial multi-armed bandits; in particular, the “Stochastic and Adversarial Optimal” ($\mathrm{SAO}$) algorithm [bubeck2012best] (although the details of the phases of the algorithm and the statistical test are very different). In that framework, instances that are not stochastic (and could be thought of as “adversarial”) are not always detected as such by the test. The test is designed in an elegant manner such that the regret is optimally bounded on instances that are not detected as adversarial, even if an algorithm meant for stochastic rewards is used. Our test to distinguish between simple and complex instances shares this flavor: in fact, not all theoretically complex instances ($\theta^* \neq 0$) are detected as such.
Also related are results on contextual bandits with similarity information on the contexts, which automatically encodes a potentially easier learning problem [slivkins2014contextual]. The main novelty in these results involves adapting to such similarity online.
Technically, our proofs leverage the most recent set of theoretical results on regret bounds for linear bandits [abbasi2011improved], which can easily be applied to the linear contextual bandit model, and sophisticated self-normalized concentration bounds for our estimates of both the bias terms $\{\mu_i\}_{i=1}^{K}$ and the parameter vector $\theta^*$. For the latter, we find that the matrix Freedman inequality [oliveira2009concentration, tropp2011freedman] is particularly useful.
1.3 Problem Statement
At the beginning of each round $t$, the learner is required to choose one of $K$ arms and receives a reward associated with that arm. To help make this choice, the learner is handed a context vector at every round (essentially a concatenation of $K$ vectors $x_{1,t}, \ldots, x_{K,t}$, each of dimension $d$). Let $g_{i,t}$ denote the reward of arm $i$ and let $a_t$ denote the choice of the learner in round $t$. The rewards could be arriving from one of the two models described below:
Simple Model: Under the simple multi-armed bandit model, the mean rewards of arms are fixed and are not a function of the contexts. That is, at each round $t$,
$$g_{i,t} = \mu_i + \eta_{i,t},$$
where the $\eta_{i,t}$ are independent, identically distributed, zero-mean, $\sigma$-sub-Gaussian noise variables (defined below). Let the arm with the highest mean reward have mean $\mu^* = \max_i \mu_i$ and be indexed by $a^*$. The benchmark that the algorithm hopes to compete against is the pseudo-regret (henceforth regret for brevity),
$$R_n = \sum_{t=1}^{n} \left( \mu^* - \mu_{a_t} \right).$$
Define the gap $\Delta_i$ as the difference between the mean reward of the best arm and the mean reward of the $i^{th}$ arm, that is, $\Delta_i = \mu^* - \mu_i$. Previous literature on multi-armed bandits [Lai:1985:AEA:2609660.2609757] tells us that the best one can hope to do in this setting is regret of order $\sum_{i : \Delta_i > 0} \frac{\log n}{\Delta_i}$. Several algorithms, like upper confidence bounds ($\mathrm{UCB}$) [auer2002finite] and minimax-optimal strategies in the stochastic case ($\mathrm{MOSS}$) [audibert2010regret, degenne2016anytime], achieve this lower bound up to logarithmic (and constant) factors.
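The context-free index policy discussed here can be sketched as follows. This is a generic UCB1-style rule under assumed constants (noise scale, horizon, arm means), not the exact variant analyzed later; note that the context dimension $d$ appears nowhere in it.

```python
import numpy as np

def ucb_pulls(means, n=2000, sigma=0.1, seed=1):
    """UCB1-style index policy: context-free, so its regret never depends on d."""
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls, sums = np.zeros(K), np.zeros(K)
    for t in range(n):
        if t < K:
            a = t                                     # initialize: pull each arm once
        else:
            bonus = sigma * np.sqrt(2.0 * np.log(t + 1) / pulls)
            a = int(np.argmax(sums / pulls + bonus))  # optimism under uncertainty
        sums[a] += means[a] + sigma * rng.normal()    # observe a noisy reward
        pulls[a] += 1
    return pulls

pulls = ucb_pulls([0.9, 0.5, 0.4])
best_fraction = pulls[0] / pulls.sum()  # share of pulls allocated to the best arm
```

On this toy instance with gaps $\Delta_2 = 0.4$ and $\Delta_3 = 0.5$, almost all pulls concentrate on the best arm.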
Complex Model: In this model, the mean reward of each arm is a linear function of the contexts (linear contextual bandits). We work with the following stochastic assumptions on the context vectors. Each of the context vectors $x_{1,t}, \ldots, x_{K,t}$ is drawn, independent of the past, from a distribution such that $x_{i,t}$ is independent of $\{x_{j,t}\}_{j \neq i}$ and, for all $i$ and $t$,
$$\mathbb{E}\left[ x_{i,t} \mid \mathcal{F}_{t-1} \right] = 0.$$
That is, the conditional means of the context vectors are zero.
In this complex model, we assume there exist an underlying linear predictor $\theta^* \in \mathbb{R}^d$ and biases $\mu_1, \ldots, \mu_K$ of the arms, such that the mean rewards of the arms are affine functions of the contexts, i.e.,
$$g_{i,t} = \mu_i + \langle \theta^*, x_{i,t} \rangle + \eta_{i,t}.$$
We impose compactness constraints on the parameters: in particular, we have $\|\theta^*\|_2 \le 1$ and $|\mu_i| \le 1$ for every arm $i$. Further, the noise variables $\eta_{i,t}$ are independent, identically distributed, zero-mean, and $\sigma$-sub-Gaussian. Clearly, simple model instances (which are parameterized only by the biases $\mu_1, \ldots, \mu_K$) can be expressed as complex model instances by setting $\theta^* = 0$.
At each round $t$, define $a_t^* \in \arg\max_i \left( \mu_i + \langle \theta^*, x_{i,t} \rangle \right)$ to be the best arm at round $t$. Here, we define pseudo-regret with respect to the optimal policy under the generative linear model:
$$R_n = \sum_{t=1}^{n} \left( \mu_{a_t^*} + \langle \theta^*, x_{a_t^*, t} \rangle - \mu_{a_t} - \langle \theta^*, x_{a_t, t} \rangle \right).$$
As noted above, past literature on this problem yielded algorithms like $\mathrm{LinUCB}$ [chu2011contextual] and $\mathrm{OFUL}$ [abbasi2011improved] that suffer only the minimax regret of $\widetilde{O}((K+d)\sqrt{n})$. As we will see in the simulations, these algorithms incur the dependence on the dimension $d$ in the regret bound even for simple instances.
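A minimal sketch of one round of a LinUCB/OFUL-style rule (ridge estimate plus ellipsoidal exploration bonus) may help fix ideas. The function name, the toy contexts, and the choice `alpha = 1.0` are illustrative assumptions, not the tuned confidence radius from [abbasi2011improved].

```python
import numpy as np

def linucb_round(V, b, contexts, alpha=1.0):
    """One round: ridge estimate theta_hat = V^{-1} b, then optimistic scores."""
    theta_hat = np.linalg.solve(V, b)
    Vinv = np.linalg.inv(V)
    scores = contexts @ theta_hat                     # estimated mean rewards
    # per-arm bonus alpha * ||x_i||_{V^{-1}} via the quadratic form x_i' V^{-1} x_i
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", contexts, Vinv, contexts))
    return int(np.argmax(scores + bonus))             # arm with best optimistic score

d, K = 4, 3
V = np.eye(d)             # Gram matrix, initialized to lambda * I with lambda = 1
b = np.zeros(d)           # accumulated context-weighted rewards
contexts = np.eye(K, d)   # toy per-arm context vectors (one per row)
a = linucb_round(V, b, contexts)

# After observing reward r for arm a, the statistics update as:
x, r = contexts[a], 1.0
V += np.outer(x, x)
b += r * x
```

The $V^{-1}$-weighted bonus shrinks in directions already explored, which is exactly the source of the dimension dependence discussed above.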
Notation and Definitions
Given a vector $v$, let $v_i$ denote its $i^{th}$ component. For a vector $v$ we let $\|v\|_p$, for $p \ge 1$, denote its $\ell_p$-norm. Given a matrix $A$, we denote its operator norm by $\|A\|_{\mathrm{op}}$ and its Frobenius norm by $\|A\|_F$. Given a symmetric matrix $A$, let $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote its largest and smallest eigenvalues. Given a positive definite matrix $V$, we define the norm of a vector $x$ with respect to the matrix $V$ as $\|x\|_V := \sqrt{x^\top V x}$. Let $\{\mathcal{F}_t\}_{t \ge 0}$ be a filtration. A stochastic process $\{\eta_t\}_{t \ge 1}$, where $\eta_t$ is measurable with respect to $\mathcal{F}_t$, is defined to be conditionally $\sigma$-sub-Gaussian for some $\sigma > 0$ if, for all $\lambda \in \mathbb{R}$, we have
$$\mathbb{E}\left[ e^{\lambda \eta_t} \mid \mathcal{F}_{t-1} \right] \le \exp\left( \frac{\lambda^2 \sigma^2}{2} \right).$$
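The norms just defined are straightforward to compute numerically. As a small sanity check (the diagonal matrix below is an arbitrary example of ours):

```python
import numpy as np

A = np.diag([2.0, 0.5])          # an arbitrary symmetric positive definite matrix
x = np.array([1.0, 1.0])

op_norm = np.linalg.norm(A, 2)   # operator norm = largest singular value
fro_norm = np.linalg.norm(A, "fro")
eigs = np.linalg.eigvalsh(A)     # eigenvalues of a symmetric matrix, ascending
lam_min, lam_max = eigs[0], eigs[-1]
x_V = np.sqrt(x @ A @ x)         # matrix-weighted norm ||x||_A = sqrt(x' A x)
```

Here $\|x\|_A^2 = 2 + 0.5 = 2.5$, and for this positive definite $A$ the operator norm coincides with $\lambda_{\max}(A)$.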
2 Construction of Confidence Sets
In our algorithm, which is presented subsequently, at the end of round $t$ we build an upper confidence estimate for each arm. Let $T_i(t)$ be the number of times arm $i$ was pulled and $\bar{\mu}_{i,t}$ be the average reward of that arm at the end of round $t$. For each arm $i$ we define the upper confidence estimate as follows,
$$\mathrm{UCB}_{i,t} := \bar{\mu}_{i,t} + \sigma \sqrt{\frac{1 + T_i(t)}{T_i^2(t)} \left( 1 + 2 \log\left( \frac{K (1 + T_i(t))^{1/2}}{\delta} \right) \right)}.$$
Lemma 6 in [abbasi2011improved] (restated as Lemma 1 here) uses a refined self-normalized martingale concentration inequality to bound $|\bar{\mu}_{i,t} - \mu_i|$ across all arms and all rounds.
Under the simple model, with probability at least $1 - \delta$, we have, for all arms $i$ and all rounds $t$,
$$|\bar{\mu}_{i,t} - \mu_i| \le \sigma \sqrt{\frac{1 + T_i(t)}{T_i^2(t)} \left( 1 + 2 \log\left( \frac{K (1 + T_i(t))^{1/2}}{\delta} \right) \right)}.$$
For any round $t$, let $\hat{\theta}_t$ be the $\lambda$-regularized least-squares estimate of $\theta^*$, which we define explicitly below:
$$\hat{\theta}_t := \left( X_{1:t}^\top X_{1:t} + \lambda I \right)^{-1} X_{1:t}^\top Y_{1:t},$$
where $X_{1:t}$ is the matrix whose rows are the context vectors of the arms selected from round $1$ up until round $t$, $X_{1:t} := [x_{a_1, 1}, \ldots, x_{a_t, t}]^\top$, and $Y_{1:t}$ is the vector of the corresponding de-biased rewards. Here we are regressing on the rewards seen so far to estimate $\theta^*$, while using the bias estimates obtained from our upper confidence estimates defined in Eq. (2).
We present a proof of this lemma in Appendix A.
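The $\lambda$-regularized least-squares estimate described above can be sketched in a few lines. The synthetic data, the choice $\lambda = 1$, and all names below are assumptions for illustration; in the algorithm, the response vector would hold rewards with the running bias estimates subtracted.

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """(X'X + lam I)^{-1} X'y: the lambda-regularized least-squares estimate."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 500, 5
theta_star = np.ones(d) / np.sqrt(d)           # ground-truth parameter, unit norm
X = rng.normal(size=(n, d))                    # stand-in for the selected contexts
y = X @ theta_star + 0.1 * rng.normal(size=n)  # stand-in for de-biased rewards
theta_hat = ridge_estimate(X, y)
err = float(np.linalg.norm(theta_hat - theta_star))
```

With $n = 500$ well-conditioned covariates and noise scale $0.1$, the estimation error is small, consistent with the concentration results used later.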
3 Algorithm and Main Result
The intuition behind Algorithm 1 is straightforward. The algorithm starts off by using the simple model estimate of the recommended action, i.e., the arm maximizing the upper confidence estimate in Eq. (2), until it has reason to believe that there is a benefit from switching to the complex model estimates. If the rewards are truly coming from the simple model, or from a complex model that is well approximated by a simple multi-armed bandit model, then Condition 7 will not be violated and the regret will continue to be bounded under either model. However, if Condition 7 is violated, then the algorithm switches to the complex estimates for the remaining rounds. The condition is designed using a threshold function whose order corresponds to the additional regret incurred when we attempt to estimate the extra parameter $\theta^*$.
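The switching logic of this paragraph can be sketched at a high level. Everything below is a placeholder skeleton: `evidence` and `threshold` stand in for the paper's test statistic and Condition 7, and the toy functional forms are ours.

```python
import numpy as np

def osom_skeleton(n, simple_action, complex_action, evidence, threshold):
    """Play the simple-model arm until the evidence of unexploited complex
    structure crosses the threshold; then commit to the complex model forever."""
    use_complex = False
    actions = []
    for t in range(1, n + 1):
        if not use_complex and evidence(t) > threshold(t):
            use_complex = True          # the analogue of Condition 7 being violated
        actions.append(complex_action(t) if use_complex else simple_action(t))
    return actions

# Toy run: linearly growing evidence crosses a sqrt(t)-scale threshold exactly once.
acts = osom_skeleton(
    n=100,
    simple_action=lambda t: "simple",
    complex_action=lambda t: "complex",
    evidence=lambda t: 0.3 * t,
    threshold=lambda t: 2.0 * np.sqrt(t),
)
```

The one-way switch is the key design choice: once the test fires, the algorithm never returns to the simple model, so the regret analysis splits cleanly into a pre-switch and a post-switch phase.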
Our main theorem optimally bounds the regret of $\mathrm{OSOM}$ under either of the two reward-generating models.
Notice that Theorem 3 establishes regret bounds on the algorithm that are minimax optimal, up to logarithmic factors, under both the simple model and the complex model. In fact, under the simple model we are able to obtain problem-dependent regret rates. Note that the above regret bound holds with high probability and also implies a bound in expectation, by setting $\delta = 1/n$ and using Markov's inequality.
To prove Theorem 3, we need to show that the regret of $\mathrm{OSOM}$ is bounded under either underlying model. In Lemma 4 we demonstrate that, whenever the rewards are generated under the simple model, Condition 7 is not violated with high probability. This ensures that when the data is generated from the simple model, the Boolean variable that triggers the switch to the complex model remains unset throughout the run of the algorithm. Thus, the regret is automatically equal to the regret incurred by the $\mathrm{UCB}$ algorithm, which is meant for simple model instances.
On the other hand, when the data is generated according to the complex model, we demonstrate (in Lemma 5) that the regret remains appropriately bounded as long as Condition 7 is not violated. If the condition gets violated at a certain round, we switch to the estimates of the complex model. This corresponds to a variant of the $\mathrm{OFUL}$ algorithm [abbasi2011improved], which is meant for complex instances. Thus, the regret remains bounded in the subsequent rounds under this event as well (formally proved in Lemma 6).
We define below several functions which will be used throughout the proof. These arise naturally by applying the concentration inequalities on terms that appear while controlling the regret.
Given the definitions above, the relationships between these functions are straightforward to verify.
Additionally, we define several statistical events that will be useful in the proofs of the lemmas that follow.
Event $\mathcal{E}_1$ represents control on the fluctuations due to noise: applying Theorem 9 in the one-dimensional case, we get $\Pr[\mathcal{E}_1] \ge 1 - \delta$. Event $\mathcal{E}_2$ represents control on the fluctuations of the empirical estimates of the biases around their true values: by Lemma 1, we have $\Pr[\mathcal{E}_2] \ge 1 - \delta$. Finally, event $\mathcal{E}_3$ represents control on the fluctuations of the empirical estimate of the parameter vector around its true value: by Lemma 2, we have $\Pr[\mathcal{E}_3] \ge 1 - \delta$. We define the desired event as the intersection $\mathcal{E} := \mathcal{E}_1 \cap \mathcal{E}_2 \cap \mathcal{E}_3$ of these three events. The union bound gives us $\Pr[\mathcal{E}] \ge 1 - 3\delta$. For the rest of the proof, we condition on the event $\mathcal{E}$.
4.1 Regret under the Simple Model
The following lemma establishes that under the simple model, Condition 7 is not violated with high probability.
Assume that the rewards are generated under the simple model. Then, with probability at least $1 - 3\delta$, we have
Proof Under the simple model, the rewards are given by $g_{a_t, t} = \mu_{a_t} + \eta_{a_t, t}$. Therefore, we have
Notice that the difference neatly decomposes into four terms, $T_1$ through $T_4$, each of which we interpret below. The first term, $T_1$, is purely a sum of the noise in the problem, which concentrates under the event $\mathcal{E}_1$. The second term, $T_2$, corresponds to the difference between the true mean rewards and the simple estimates of the mean rewards, which is controlled under the event $\mathcal{E}_2$. The third term, $T_3$, is the difference between the mean rewards prescribed by the simple estimate and the complex estimate, respectively. Finally, the last term, $T_4$, is only a function of the estimated linear predictor $\hat{\theta}_t$ (and since the true predictor is $\theta^* = 0$, this term is controlled by event $\mathcal{E}_3$).
Step (i) (Bound on $T_1$): Under the event $\mathcal{E}_1$, we have
Step (ii) (Bound on $T_2$): By the definition of the empirical means $\bar{\mu}_{i,s}$, we have,
where $(a)$ follows under the event $\mathcal{E}_2$, $(b)$ follows as
and $(c)$ follows by Jensen's inequality and the fact that $\sum_{i=1}^{K} T_i(t) = t$.
Step (iii) (Bound on $T_3$): Eq. (5), which expresses the optimality of the chosen arm, tells us that $\mathrm{UCB}_{a_s, s} \ge \mathrm{UCB}_{i, s}$ for all arms $i$ and rounds $s \le t$. Therefore $T_3 \le 0$.
Step (iv) (Bound on $T_4$): By the Cauchy-Schwarz inequality, the compactness constraints on the parameters, and the triangle inequality, we get
where the last quantity is defined in Eq. (11).
Combining the bounds on $T_1$, $T_2$, $T_3$ and $T_4$ with the definitions above, we have
which completes the proof.
Proof [Proof of Part (a) of Theorem 3]
We have established, by the lemma above, that Condition 7 is not violated with probability at least $1 - 3\delta$ under the simple model. Conditioned on this event, $\mathrm{OSOM}$ plays according to the simple model estimate for all rounds. Invoking Theorem 7 in [abbasi2011improved] gives us that, with probability at least $1 - \delta$, the regret of this strategy is appropriately bounded. Applying the union bound over these two events gives the claimed regret bound with probability at least $1 - 4\delta$.
4.2 Regret under the Complex Model
The bound on the regret under the complex model follows by establishing two facts. First, when Condition 7 is not violated, we demonstrate in Lemma 5 that the regret is appropriately bounded. Second, if the condition does get violated, say at round $\tau$, our algorithm chooses arms according to the complex model estimates for all rounds $t > \tau$. In Lemma 6, we show that the regret remains bounded in this case as well.
We start with the first case by stating and proving Lemma 5.
Consider any round $t \in \{1, \ldots, n\}$. Let Condition 7 not be violated up until round $t$, i.e.
Then, we have
with probability at least $1 - 3\delta$.
Proof Since we have already conditioned on the event $\mathcal{E}$, we can assume that the events $\mathcal{E}_1$, $\mathcal{E}_2$ and $\mathcal{E}_3$ hold. Note that if Condition 7 is not violated up to round $t$, then the inequality above holds for all rounds up to $t$. Using the definition of the pseudo-regret, we get
where the first term is the maximum possible regret incurred in the initial rounds under the complex model, and it is bounded by definition. Next, let us control the remaining term. We have
where the non-positivity of the second term follows from the optimality of the chosen arm as expressed in Eq. (6). Hence, we have
where the last two inequalities follow from the Cauchy-Schwarz inequality and the compactness constraint on $\theta^*$, respectively. Under the event $\mathcal{E}_3$, the estimation error of $\hat{\theta}_t$ is controlled. Also, by the definition of the upper confidence estimates and under event $\mathcal{E}_2$, we have