Fairness and Welfare Quantification for Regret in Multi-Armed Bandits

05/27/2022
by Siddharth Barman et al.

We extend the notion of regret with a welfarist perspective. Focusing on the classic multi-armed bandit (MAB) framework, the current work quantifies the performance of bandit algorithms via a fundamental welfare function, namely the Nash social welfare (NSW) function. This corresponds to equating an algorithm's performance with the geometric mean of its expected rewards, and it leads us to the study of Nash regret, defined as the difference between the (a priori unknown) optimal mean among the arms and the algorithm's performance. Since NSW is known to satisfy fairness axioms, our approach complements the utilitarian considerations of average (cumulative) regret, wherein the algorithm is evaluated via the arithmetic mean of its expected rewards. This work develops an algorithm that, given the horizon of play T, achieves a Nash regret of O(√(k log T / T)), where k denotes the number of arms in the MAB instance. Since, by the AM-GM inequality, the Nash regret of any algorithm is at least its average regret, the known lower bound on average regret holds for Nash regret as well; hence, our Nash regret guarantee is essentially tight. In addition, we develop an anytime algorithm with a Nash regret guarantee of O(√(k log T / T) · log T).
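In our own notation (the abstract does not display these formulas), write μ* = max_i μ_i for the optimal mean and I_t for the arm an algorithm pulls at round t. Average regret is μ* − (1/T) ∑_{t=1}^T E[μ_{I_t}], and Nash regret replaces the arithmetic mean with the geometric mean: μ* − (∏_{t=1}^T E[μ_{I_t}])^{1/T}. By AM-GM the geometric mean never exceeds the arithmetic mean, so Nash regret dominates average regret. The sketch below illustrates the gap on a hypothetical Bernoulli instance; it runs the standard UCB1 rule (for illustration only, not the paper's algorithm) and compares single-run proxies for the two regret notions.

```python
import numpy as np

def ucb1_pulled_means(means, T, rng):
    """Run UCB1 on a Bernoulli bandit and return mu_{I_t}, the mean
    reward of the arm pulled at each round t (illustrative sketch)."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    pulled = np.empty(T)
    for t in range(T):
        if t < k:  # initialization: pull each arm once
            arm = t
        else:      # UCB1 index: empirical mean + confidence width
            arm = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t + 1) / counts)))
        sums[arm] += rng.binomial(1, means[arm])
        counts[arm] += 1
        pulled[t] = means[arm]
    return pulled

rng = np.random.default_rng(0)
means = np.array([0.9, 0.6, 0.5, 0.4])  # hypothetical arm means
T = 10_000
mu = ucb1_pulled_means(means, T, rng)

# Single-run proxies: the paper's definitions take expectations over the
# algorithm's randomness; here we use one trajectory's mu_{I_t} values.
avg_regret = means.max() - mu.mean()                    # arithmetic mean
nash_regret = means.max() - np.exp(np.log(mu).mean())   # geometric mean
print(f"average regret: {avg_regret:.4f}")
print(f"Nash regret:    {nash_regret:.4f}")  # >= average regret (AM-GM)
```

Because the geometric mean is pulled down sharply by rounds spent on low-mean arms, Nash regret penalizes an algorithm more heavily for exploring bad arms than average regret does; this is the fairness intuition behind evaluating with NSW.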


Related research

07/13/2020 · Fair Algorithms for Multi-Agent Multi-Armed Bandits
We propose a multi-agent variant of the classical multi-armed bandit pro...

09/23/2022 · An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret
Recently a multi-agent variant of the classical multi-armed bandit was p...

02/24/2020 · Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms
We characterize Bayesian regret in a stochastic multi-armed bandit probl...

05/27/2019 · Stochastic Multi-armed Bandits with Arm-specific Fairness Guarantees
We study an interesting variant of the stochastic multi-armed bandit pro...

06/04/2021 · Fair Exploration via Axiomatic Bargaining
Motivated by the consideration of fairly sharing the cost of exploration...

01/12/2021 · Dynamic Spectrum Access using Stochastic Multi-User Bandits
A stochastic multi-user multi-armed bandit framework is used to develop ...

12/13/2021 · Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits
We consider the upper confidence bound strategy for Gaussian multi-armed...
