The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation

06/04/2019
by Zhe Feng et al.

We study the behavior of stochastic bandit algorithms under strategic behavior conducted by rational actors, i.e., the arms. Each arm is a strategic player who can modify its own reward whenever pulled, subject to a cross-period budget constraint. Each arm is self-interested and seeks to maximize the expected number of times it is pulled over the decision horizon. Strategic manipulations naturally arise in various economic applications, e.g., recommendation systems such as Yelp and Amazon. We analyze the robustness of three popular bandit algorithms: UCB, ε-Greedy, and Thompson Sampling. We prove that all three algorithms achieve a regret upper bound of O(max{B, log T}) under any (possibly adaptive) strategy of the strategic arms, where B is the total budget across arms and T is the time horizon. Moreover, we prove that this regret upper bound is tight. Our results illustrate the intrinsic robustness of bandit algorithms against strategic manipulation so long as B = o(T). This stands in sharp contrast to the more pessimistic model of adversarial attacks, where an attack budget of O(log T) suffices to trick UCB and ε-Greedy into pulling the optimal arm only o(T) times. Our results hold for both bounded and unbounded rewards.
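To make the model concrete, below is a minimal simulation sketch in Python: UCB1 plays against one strategic suboptimal arm that inflates its observed reward by +1 per pull until a budget B is exhausted. The "+1 until the budget runs out" strategy, the arm means, and all parameter values are illustrative assumptions, not the paper's construction or its worst-case manipulation strategy.

```python
import numpy as np

def ucb_with_strategic_arm(T=10000, means=(0.7, 0.5, 0.3),
                           budget=200.0, seed=0):
    """Illustrative sketch: UCB1 vs. one budget-constrained strategic arm.

    Arm 1 (suboptimal) adds +1 to its observed reward on each pull until
    its cumulative manipulation spend exceeds `budget`. This is a simple
    hypothetical strategy used only for demonstration.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)   # number of pulls per arm
    sums = np.zeros(K)     # sum of observed (possibly manipulated) rewards
    spent = 0.0            # budget spent by the strategic arm so far
    regret = 0.0
    best = max(means)

    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1    # pull each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.binomial(1, means[arm]))  # true Bernoulli reward
        if arm == 1 and spent < budget:              # strategic manipulation
            reward += 1.0
            spent += 1.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret w.r.t. true means
    return regret

print(ucb_with_strategic_arm())
```

Under this toy strategy, the strategic arm can gain at most on the order of B extra pulls before UCB's estimates recover, which is the qualitative behavior the O(max{B, log T}) regret bound predicts.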


