Contextual Bandits with Side-Observations

06/06/2020
by Rahul Singh, et al.

We investigate contextual bandits in the presence of side-observations across arms in order to design recommendation algorithms for users connected via social networks. Users in social networks respond to their friends' activity and hence provide information about each other's preferences. In our model, when a learning algorithm recommends an article to a user, it observes not only his/her response (e.g., an ad click) but also side-observations, i.e., the responses his/her neighbors would have given had they been presented with the same article. We model these observation dependencies by a graph G in which nodes correspond to users and edges correspond to social links. We derive a problem/instance-dependent lower bound on the regret of any consistent algorithm. We propose an optimization-based (linear programming) data-driven learning algorithm that utilizes the structure of G to make recommendations to users, and we show that it is asymptotically optimal in the sense that its regret matches the lower bound as the number of rounds T→∞. We show that this asymptotically optimal regret is upper-bounded as O(|χ(G)| log T), where |χ(G)| is the domination number of G. In contrast, a naive application of existing learning algorithms results in O(N log T) regret, where N is the number of users.
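The paper's algorithm itself is an LP-based, data-driven procedure; the sketch below is not that algorithm. It is a minimal, hypothetical Python simulation of the side-observation feedback model together with a greedy dominating-set heuristic, meant only to illustrate why probing a dominating set of G (rather than all N users) already yields observations for every user. All names (greedy_dominating_set, recommend_and_observe), the toy graph, and the click probabilities are illustrative assumptions, not taken from the paper.

    import random

    # Toy model (illustrative, not from the paper): users are nodes of an undirected
    # graph; recommending an article to user u reveals u's click (Bernoulli) AND the
    # clicks his/her neighbors would have given for the same article (side-observations).

    def greedy_dominating_set(adj):
        """Greedy approximation of a dominating set: every user is either
        in the returned set or adjacent to some user in it."""
        uncovered = set(adj)
        dom = []
        while uncovered:
            # pick the node that covers the most still-uncovered users (itself + neighbors)
            best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
            dom.append(best)
            uncovered -= {best} | adj[best]
        return dom

    def recommend_and_observe(user, article, click_prob, adj, rng):
        """Simulate one round: observe the chosen user's response plus
        side-observations from all of his/her neighbors for the same article."""
        obs = {user: rng.random() < click_prob[(user, article)]}
        for v in adj[user]:
            obs[v] = rng.random() < click_prob[(v, article)]
        return obs

    if __name__ == "__main__":
        rng = random.Random(0)
        # small social graph: edges 0-1, 1-2, 2-3, 3-0, 1-3
        adj = {0: {1, 3}, 1: {0, 2, 3}, 2: {1, 3}, 3: {0, 1, 2}}
        articles = ["a", "b"]
        click_prob = {(u, a): rng.uniform(0.1, 0.9) for u in adj for a in articles}

        dom = greedy_dominating_set(adj)
        print("dominating set:", dom)  # probing only these users already covers everyone

        # crude exploration: probing each dominating-set user with each article
        # yields click-probability estimates for every user via side-observations
        counts = {(u, a): [0, 0] for u in adj for a in articles}  # [clicks, trials]
        for _ in range(200):
            for u in dom:
                for a in articles:
                    for v, clicked in recommend_and_observe(u, a, click_prob, adj, rng).items():
                        counts[(v, a)][0] += clicked
                        counts[(v, a)][1] += 1
        est = {k: c / n for k, (c, n) in counts.items() if n}
        print("estimated click-probabilities cover all", len(est) // len(articles), "users")

Since every user is either probed directly or is a neighbor of a probed user, the exploration cost in this toy setting scales with the size of the dominating set rather than with N, which is the intuition behind the O(|χ(G)| log T) upper bound above.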

research
05/20/2014

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandits where the expected reward is ...
research
10/16/2012

Leveraging Side Observations in Stochastic Bandits

This paper considers stochastic bandits with side observations, a model ...
research
03/15/2023

Borda Regret Minimization for Generalized Linear Dueling Bandits

Dueling bandits are widely used to model preferential feedback that is p...
research
03/22/2020

Optimal No-regret Learning in Repeated First-price Auctions

We study online learning in repeated first-price auctions with censored ...
research
01/30/2022

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

Motivated by online recommendation systems, we propose the problem of fi...
research
04/26/2017

Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks

We study the stochastic multi-armed bandit (MAB) problem in the presence...
research
05/17/2022

Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization

Non-stationarity is ubiquitous in human behavior and addressing it in th...
