Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks

04/26/2017
by   Swapna Buccapatnam, et al.
0

We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends' activity, thus providing side information about each other's preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) as a function of the bi-partite network structure on the regret of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies - a randomized policy; and a policy based on the well-known upper confidence bound (UCB) policies - both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor, independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/16/2011

Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

This paper is devoted to regret lower bounds in the classical model of s...
research
02/29/2016

Collaborative Learning of Stochastic Bandits over a Social Network

We consider a collaborative online learning paradigm, wherein a group of...
research
10/13/2018

On Greedy and Strategic Evaders in Sequential Interdiction Settings with Incomplete Information

We consider a class of sequential interdiction settings where the interd...
research
11/30/2018

Asymptotically Optimal Multi-Armed Bandit Activation Policies under Side Constraints

This paper introduces the first asymptotically optimal strategy for the ...
research
11/08/2017

Information Directed Sampling for Stochastic Bandits with Graph Feedback

We consider stochastic multi-armed bandit problems with graph feedback, ...
research
05/12/2015

Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret

The purpose of this paper is to provide further understanding into the s...
research
06/06/2020

Contextual Bandits with Side-Observations

We investigate contextual bandits in the presence of side-observations a...

Please sign up or login with your details

Forgot password? Click here to reset