Stochastic Bandits with Context Distributions

06/06/2019
by Johannes Kirschner, et al.

We introduce a novel stochastic contextual bandit model in which, at each step, the adversary chooses a distribution over a context set. The learner observes only the context distribution, while the exact context realization remains hidden. This allows for a broader range of applications, for instance when the context itself is based on predictions. By extending the UCB algorithm to this setting, we obtain an algorithm that achieves an Õ(d√T) high-probability regret bound for linearly parametrized reward functions. Our results strictly generalize previous work in the sense that both the model and the algorithm reduce to the standard setting when the environment chooses only Dirac delta distributions, thereby providing the exact context to the learner. We further obtain similar results for a variant where the learner observes the realized context after choosing the action, and we extend the results to the kernelized setting. Finally, we demonstrate the proposed method on synthetic and real-world datasets.
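To illustrate the idea, here is a minimal, hedged sketch of a LinUCB-style learner that only observes a context distribution and acts on the expected feature vector of each action. The toy environment, the exploration parameter `beta`, and all variable names below are illustrative assumptions for exposition, not the paper's actual algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, T = 3, 5, 200
beta = 2.0                      # exploration parameter (assumed value)
theta_true = rng.normal(size=d) # unknown linear reward parameter

A = np.eye(d)      # regularized Gram matrix for ridge regression
b = np.zeros(d)
total_reward = 0.0

for t in range(T):
    # The environment draws per-action context distributions; the learner
    # sees only their means (the realized context stays hidden).
    means = rng.normal(size=(n_actions, d))

    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    # UCB score on expected features: estimate plus confidence width.
    widths = np.sqrt(np.einsum("ad,dc,ac->a", means, A_inv, means))
    a = int(np.argmax(means @ theta_hat + beta * widths))

    # Hidden realized context; only the reward is observed.
    x_real = means[a] + 0.1 * rng.normal(size=d)
    reward = x_real @ theta_true + 0.01 * rng.normal()

    # Update least-squares statistics with the expected feature vector,
    # since the realization is never revealed to the learner.
    A += np.outer(means[a], means[a])
    b += means[a] * reward
    total_reward += reward
```

Using the expected feature vector as a surrogate for the hidden realization is the natural reduction here: the extra randomness in the context acts as additional (mean-zero) observation noise, which is why a UCB analysis can still go through.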


Related Research

07/28/2022  Distributed Stochastic Bandit Learning with Context Distributions
            We study the problem of distributed stochastic multi-arm contextual band...

03/29/2022  Stochastic Conservative Contextual Linear Bandits
            Many physical systems have underlying safety considerations that require...

07/24/2023  Contextual Bandits and Imitation Learning via Preference-Based Active Queries
            We consider the problem of contextual bandits and imitation learning, wh...

10/29/2018  Heteroscedastic Bandits with Reneging
            Although shown to be useful in many areas as models for solving sequenti...

09/07/2021  Learning to Bid in Contextual First Price Auctions
            In this paper, we investigate the problem of how to bid in repeated c...

06/13/2011  Efficient Optimal Learning for Contextual Bandits
            We address the problem of learning in an online setting where the learne...

09/21/2020  Contextual Bandits for Adapting to Changing User Preferences over Time
            Contextual bandits provide an effective way to model the dynamic data pr...
