Bandit Social Learning: Exploration under Myopic Behavior

02/15/2023
by Kiarash Banihashem, et al.

We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms, and receive the associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While the agents collectively face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on the failure of the greedy algorithm in multi-armed bandits; to the best of our knowledge, this is the first such result in the literature.

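The protocol described in the abstract is straightforward to simulate. The following Python sketch is illustrative only, not the authors' code: each arriving agent observes the full public history and myopically picks the arm maximizing an optimistically shifted empirical mean. The `optimism` parameter is a hypothetical knob interpolating between the greedy agent (optimism = 0) and UCB-style fully optimistic agents.

```python
import numpy as np

def social_learning(mu, n_agents, optimism, seed=0):
    """Simulate agents arriving sequentially: each sees the full public
    history (arms and rewards, no private signals) and myopically picks
    the arm with the highest optimistically shifted empirical mean.

    optimism = 0 gives the greedy agent; large optimism approaches a
    UCB-style fully optimistic agent."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    counts = np.ones(len(mu))                 # warm start: one pull per arm
    sums = rng.binomial(1, mu).astype(float)  # Bernoulli rewards
    regret = 0.0
    for t in range(n_agents):
        width = optimism * np.sqrt(np.log(t + 2) / counts)
        arm = int(np.argmax(sums / counts + width))  # myopic choice
        reward = rng.binomial(1, mu[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += mu.max() - mu[arm]
    return regret

# Example: greedy vs. optimistic agents on a two-armed instance.
for opt in (0.0, 1.0):
    print(f"optimism={opt}: regret over 10000 agents "
          f"~ {social_learning([0.5, 0.6], 10_000, opt):.1f}")
```

In runs of this sketch, greedy agents can lock onto the inferior arm after an unlucky start and never recover, which is the kind of exploration failure (and linear regret) the paper formalizes.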

research · 09/20/2020
Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms
EXP-based algorithms are often used for exploration in multi-armed bandi...

research · 10/11/2019
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
We propose RandUCB, a bandit strategy that uses theoretically derived co...

research · 02/26/2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

research · 01/01/2022
Modelling Cournot Games as Multi-agent Multi-armed Bandits
We investigate the use of a multi-agent multi-armed bandit (MA-MAB) sett...

research · 10/27/2021
(Almost) Free Incentivized Exploration from Decentralized Learning Agents
Incentivized exploration in multi-armed bandits (MAB) has witnessed incr...

research · 03/19/2017
Bernoulli Rank-1 Bandits for Click Feedback
The probability that a user will click a search result depends both on i...

research · 07/22/2012
Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
The exploration/exploitation (E/E) dilemma arises naturally in many subf...
