Contextual Multi-armed Bandits under Feature Uncertainty

03/03/2017
by Se-Young Yun, et al.

We study contextual multi-armed bandit problems under linear realizability on rewards and uncertainty (or noise) on features. For the case of identical noise on features across actions, we propose an algorithm, coined NLinRel, having an O(T^(7/8)(log(dT) + K√d)) regret bound for T rounds, K actions, and d-dimensional feature vectors. Next, for the case of non-identical noise, we observe that popular linear hypotheses, including NLinRel, cannot achieve such a sub-linear regret. Instead, under the assumption of Gaussian feature vectors, we prove that a greedy algorithm has an O(T^(2/3)√(log d)) regret bound with respect to the optimal linear hypothesis. Utilizing our theoretical understanding of the Gaussian case, we also design a practical variant of NLinRel, coined Universal-NLinRel, for arbitrary feature distributions. It first runs NLinRel to find the `true' coefficient vector using the feature uncertainties, and then adjusts it to minimize its regret using the statistical feature information. We validate the performance of Universal-NLinRel on both synthetic and real-world datasets.
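
As a rough illustration of the setting described above, the following Python sketch simulates a greedy learner that only observes noisy features, fits a linear reward model by ridge regression, and plays the arm with the largest estimated reward. This is not the authors' NLinRel or Universal-NLinRel; the noise model, the ridge estimator, and all parameter values are illustrative assumptions.

import numpy as np

# Toy simulation: contextual linear bandit where the learner sees noisy
# features z_{t,a} = x_{t,a} + noise, while rewards depend on the true x_{t,a}.
rng = np.random.default_rng(0)
T, K, d = 5000, 10, 5                          # rounds, actions, feature dimension
theta_true = rng.normal(size=d) / np.sqrt(d)   # unknown reward coefficient vector
noise_std = 0.3                                # feature-noise level (identical across actions)

A = np.eye(d)                                  # ridge regression statistics (lambda = 1)
b = np.zeros(d)
regret = 0.0

for t in range(T):
    x = rng.normal(size=(K, d))                      # true (Gaussian) feature vectors
    z = x + noise_std * rng.normal(size=(K, d))      # noisy features shown to the learner

    theta_hat = np.linalg.solve(A, b)                # current regularized least-squares estimate
    a = int(np.argmax(z @ theta_hat))                # greedy choice based on noisy features

    reward = x[a] @ theta_true + 0.1 * rng.normal()  # reward generated from the true feature
    regret += np.max(x @ theta_true) - x[a] @ theta_true

    A += np.outer(z[a], z[a])                        # update regression statistics
    b += reward * z[a]

print(f"cumulative regret after {T} rounds: {regret:.1f}")

In this toy setup the greedy rule plays the role of the linear-hypothesis baseline analyzed under Gaussian features; changing the noise level per action or the estimator changes the regret behaviour that the abstract contrasts.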


