Contextual Bandits with Cross-learning

09/25/2018
by Santiago Balseiro, et al.

In the classical contextual bandits problem, in each round t, a learner observes some context c, chooses some action a to perform, and receives some reward r_{a,t}(c). We consider the variant of this problem where in addition to receiving the reward r_{a,t}(c), the learner also learns the values of r_{a,t}(c') for all other contexts c'; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve Õ(√(CKT)) regret against all stationary policies, where C is the number of contexts, K the number of actions, and T the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on C and achieve regret O(√(KT)) (when contexts are stochastic with known distribution), Õ(K^{1/3}T^{2/3}) (when contexts are stochastic with unknown distribution), and Õ(√(KT)) (when contexts are adversarial but rewards are stochastic).
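The cross-learning feedback model described above can be illustrated with a short simulation. The sketch below is not the authors' algorithm; it is a generic exponential-weights learner, written under the assumption that contexts are drawn from a known distribution and rewards lie in [0, 1]. After playing action a, it reads the reward of a under every context and updates all per-context estimates at once, dividing by the probability of playing a averaged over the context distribution. The names `exp_weights`, `run_cross_learning`, `reward_fn`, `context_probs`, and `lr` are hypothetical and chosen only for this example.

```python
import math
import random


def exp_weights(cum_losses, lr):
    """Return probabilities proportional to exp(-lr * cumulative loss)."""
    low = min(cum_losses)  # shift so the largest weight is exactly 1
    weights = [math.exp(-lr * (v - low)) for v in cum_losses]
    norm = sum(weights)
    return [w / norm for w in weights]


def run_cross_learning(reward_fn, context_probs, num_actions, num_rounds,
                       lr, seed=0):
    """Illustrative learner with cross-learning feedback (assumed setup).

    reward_fn(t, action, context) -> reward in [0, 1]   (hypothetical)
    context_probs: known distribution over contexts     (assumption)
    """
    rng = random.Random(seed)
    num_contexts = len(context_probs)
    # Estimated cumulative loss for every (context, action) pair.
    est_loss = [[0.0] * num_actions for _ in range(num_contexts)]
    total_reward = 0.0
    for t in range(num_rounds):
        # One policy per context, all derived from the shared estimates.
        policies = [exp_weights(est_loss[c], lr) for c in range(num_contexts)]
        c = rng.choices(range(num_contexts), weights=context_probs)[0]
        a = rng.choices(range(num_actions), weights=policies[c])[0]
        total_reward += reward_fn(t, a, c)
        # Probability of playing a, averaged over the context distribution.
        marginal = sum(p * policies[k][a] for k, p in enumerate(context_probs))
        # Cross-learning: the reward of action a is revealed for ALL contexts,
        # so every context's estimate for a is updated in the same round.
        for k in range(num_contexts):
            est_loss[k][a] += (1.0 - reward_fn(t, a, k)) / marginal
    return total_reward, est_loss
```

Because each round updates the estimate of the played action in every context, the amount of information gathered per round does not shrink as the number of contexts grows, which is the intuition behind removing the dependence on C. A call such as `run_cross_learning(lambda t, a, c: float(a == c), [0.5, 0.5], 2, 2000, 0.05)` runs the sketch on a toy problem where the best action equals the context index.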


