Remote Contextual Bandits

02/10/2022
by Francesco Pase et al.

We consider a remote contextual multi-armed bandit (CMAB) problem, in which the decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel. This can model, for example, a personalized ad placement application, where the content owner observes the individual visitors to its website, and hence has the context information, but must convey the ads to be shown to each visitor to a separate entity that manages the marketing content. In this remote CMAB (R-CMAB) problem, the constraint on the communication rate between the decision-maker and the agents imposes a trade-off between the number of bits sent per agent and the average reward acquired. We are particularly interested in characterizing the rate required to achieve sub-linear regret. Consequently, the problem can be cast as a policy compression problem, where the distortion metric is induced by the learning objectives. We first study the fundamental information-theoretic limits of this problem by letting the number of agents go to infinity, and analyze the regret achieved when the Thompson sampling strategy is adopted. In particular, we identify two distinct rate regions resulting in linear and sub-linear regret behavior, respectively. Finally, we provide upper bounds on the achievable regret when the decision-maker can reliably transmit the policy without distortion.
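To make the setup concrete, below is a minimal simulation sketch, not the paper's scheme, of Thompson sampling in a contextual Bernoulli bandit where the decision-maker can only describe actions with a fixed number of bits per agent. Everything in it is an illustrative assumption: the number of contexts, arms, and rounds, the Beta(1, 1) priors, and especially the fixed codebook used to model the rate constraint, which is a crude stand-in for the policy compression analyzed in the paper.

```python
# Illustrative sketch only: Thompson sampling for a contextual Bernoulli bandit
# with a toy rate constraint (fixed codebook of addressable arms). Not the
# paper's coding scheme; all parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_arms, horizon = 4, 5, 20_000
bits_per_agent = 2
codebook = np.arange(2 ** bits_per_agent)   # 2-bit messages can only address arms 0..3

# Unknown Bernoulli means, one row per context (ground truth for the simulation)
true_means = rng.uniform(0.1, 0.9, size=(n_contexts, n_arms))

# Beta(1, 1) posteriors maintained by the decision-maker for each (context, arm)
alpha = np.ones((n_contexts, n_arms))
beta = np.ones((n_contexts, n_arms))

regret = 0.0
for t in range(horizon):
    ctx = rng.integers(n_contexts)                   # decision-maker observes the context
    theta = rng.beta(alpha[ctx], beta[ctx])          # Thompson sample of the arm means
    # Rate constraint: only arms in the codebook can be described with the
    # available bits, so the sampled policy is projected onto the codebook.
    action = codebook[np.argmax(theta[codebook])]

    reward = rng.random() < true_means[ctx, action]  # agent plays the action; reward is observed
    alpha[ctx, action] += reward                     # posterior update at the decision-maker
    beta[ctx, action] += 1 - reward
    regret += true_means[ctx].max() - true_means[ctx, action]

print(f"cumulative regret over {horizon} rounds: {regret:.1f}")
```

When bits_per_agent is large enough to index every arm, the sketch reduces to standard Thompson sampling and regret grows sub-linearly; with fewer bits, any context whose best arm falls outside the codebook accumulates regret linearly, loosely mirroring the two rate regions described in the abstract.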
