Quantifying the Burden of Exploration and the Unfairness of Free Riding

10/20/2018
by Christopher Jung, et al.

We consider the multi-armed bandit setting with a twist. Rather than having a single decision maker choosing which arm to pull in each round, we have n different decision makers (agents). In the simple stochastic setting, we show that one of the agents (the free rider), who observes the history of other agents playing some zero-regret algorithm, can achieve just O(1) regret, as opposed to the Ω(log T) regret lower bound that holds when a decision maker plays in isolation. In the linear contextual setting, we show that if the other agents play a particular, popular zero-regret algorithm (UCB), then the free rider can again achieve O(1) regret. To prove this result, we give a deterministic lower bound on the number of times UCB must pull each suboptimal arm. In contrast, we show that the free rider cannot beat the standard single-player regret bounds in certain partial-information settings.
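
To make the free-riding idea concrete, here is a minimal simulation sketch, not the paper's construction: one self-reliant agent runs standard UCB1 on Bernoulli arms, and a free rider, who only sees that agent's pull/reward history, simply commits to the arm with the highest empirical mean in the observed history. The arm means, horizon, and the free rider's commit rule are illustrative assumptions chosen to show why observing an explorer's history lets the free rider avoid paying for exploration.

import numpy as np

rng = np.random.default_rng(0)

# Toy instance: K Bernoulli arms (means are made up for illustration).
K, T = 5, 5000
means = rng.uniform(0.2, 0.8, size=K)
best = means.max()

# Self-reliant agent: standard UCB1.
counts = np.zeros(K)
sums = np.zeros(K)
ucb_regret = 0.0
history = []  # (arm, reward) pairs visible to the free rider

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1  # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward
    history.append((arm, reward))
    ucb_regret += best - means[arm]

# Free rider: never explores. Before round t it has seen the explorer's
# history through round t-1 and plays the empirically best arm so far.
obs_counts = np.zeros(K)
obs_sums = np.zeros(K)
fr_regret = 0.0

for t, (arm, reward) in enumerate(history, start=1):
    if t <= K:
        choice = arm  # warm-up: mirror the explorer until every arm is seen
    else:
        emp = obs_sums / np.maximum(obs_counts, 1.0)
        choice = int(np.argmax(emp))
    fr_regret += best - means[choice]
    obs_counts[arm] += 1   # then observe the explorer's round-t pull
    obs_sums[arm] += reward

print(f"UCB1 agent regret: {ucb_regret:.1f}")
print(f"free rider regret: {fr_regret:.1f}")

In this toy run the UCB1 agent's regret keeps growing (on the order of log T), while the free rider's regret typically levels off once the explorer's history has identified the best arm, which mirrors the O(1) versus Ω(log T) separation stated in the abstract.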


Related research

Observe Before Play: Multi-armed Bandit with Pre-observations (11/21/2019)
Coordination without communication: optimal regret in two players multi-armed bandits (02/14/2020)
Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms (02/24/2020)
(Almost) Free Incentivized Exploration from Decentralized Learning Agents (10/27/2021)
The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits (01/15/2020)
Collaborative Learning of Stochastic Bandits over a Social Network (02/29/2016)
Bandits with Side Observations: Bounded vs. Logarithmic Regret (07/10/2018)
