Stochastic Multi-Armed Bandits with Control Variates

05/09/2021
by Arun Verma, et al.

This paper studies a new variant of the stochastic multi-armed bandit problem in which the learner has access to auxiliary information about the arms. This auxiliary information is correlated with the arm rewards, and we treat it as a control variate. In many applications, an arm's reward is a function of exogenous values whose mean is known a priori from historical data, so those values can serve as control variates. We use the control variates to obtain mean estimates with smaller variance and hence tighter confidence bounds, and we develop an algorithm, named UCB-CV, that exploits these improved estimates. We characterize its regret bounds in terms of the correlation between the rewards and the control variates. Experiments on synthetic data validate the performance guarantees of the proposed algorithm.
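The abstract's two ingredients lend themselves to a short illustration: a control-variate estimate of an arm's mean reward, and a UCB-style rule that plugs that estimate into its index. Below is a minimal sketch in Python, assuming each arm returns a (reward, control) pair and the control variate's mean is known from historical data; the function names, the arm interface, and the confidence-width constant are assumptions made for this example, and the paper's exact UCB-CV index and regret analysis are not reproduced here. With the optimal coefficient beta = Cov(X, W) / Var(W), the variance of the adjusted mean shrinks by a factor of (1 - rho^2), where rho is the reward/control correlation, which is what makes tighter confidence bounds and correlation-dependent regret bounds possible.

```python
import numpy as np


def cv_mean(rewards, controls, control_mean):
    """Control-variate estimate of an arm's mean reward.

    rewards      : observed rewards X_1..X_n for one arm
    controls     : correlated auxiliary observations W_1..W_n
    control_mean : known mean E[W] of the control variate
    """
    x = np.asarray(rewards, dtype=float)
    w = np.asarray(controls, dtype=float)
    if x.size < 2 or np.var(w) == 0.0:
        return float(x.mean())  # not enough data to fit the coefficient
    beta = np.cov(x, w, ddof=1)[0, 1] / np.var(w, ddof=1)  # Cov(X, W) / Var(W)
    return float(x.mean() - beta * (w.mean() - control_mean))


def ucb_cv_sketch(arms, control_means, horizon):
    """UCB-style loop that ranks arms by their control-variate mean estimates.

    arms          : list of callables, each returning a (reward, control) pair
    control_means : known E[W] for each arm's control variate
    """
    k = len(arms)
    rewards = [[] for _ in range(k)]
    controls = [[] for _ in range(k)]
    pulls = []
    for t in range(horizon):
        if t < k:
            a = t  # pull each arm once to initialise
        else:
            scores = []
            for i in range(k):
                n = len(rewards[i])
                est = cv_mean(rewards[i], controls[i], control_means[i])
                # Illustrative confidence width; the paper derives a tighter
                # bound that shrinks with the reward/control correlation.
                scores.append(est + np.sqrt(2.0 * np.log(t + 1) / n))
            a = int(np.argmax(scores))
        r, w = arms[a]()
        rewards[a].append(r)
        controls[a].append(w)
        pulls.append(a)
    return pulls
```

For instance, mirroring the abstract's setting, each arm could be a pricing decision whose reward depends on an exogenous demand covariate whose long-run average is known from historical logs; that covariate would play the role of the control variate above.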

Related research

05/05/2014 - Generalized Risk-Aversion in Stochastic Multi-Armed Bandits
We consider the problem of minimizing the regret in stochastic multi-arm...

12/13/2021 - Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits
We consider the upper confidence bound strategy for Gaussian multi-armed...

02/19/2020 - Warm Starting Bandits with Side Information from Confounded Data
We study a variant of the multi-armed bandit problem where side informat...

03/05/2020 - Robustness Guarantees for Mode Estimation with an Application to Bandits
Mode estimation is a classical problem in statistics with a wide range o...

02/19/2020 - On conditional versus marginal bias in multi-armed bandits
The bias of the sample means of the arms in multi-armed bandits is an im...

10/21/2019 - Multi-User MABs with User Dependent Rewards for Uncoordinated Spectrum Access
Multi-user multi-armed bandits have emerged as a good model for uncoordi...

10/16/2012 - Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model ...
