Fairness and Welfare Quantification for Regret in Multi-Armed Bandits

by Siddharth Barman, et al.

We extend the notion of regret with a welfarist perspective. Focusing on the classic multi-armed bandit (MAB) framework, the current work quantifies the performance of bandit algorithms by applying a fundamental welfare function, namely the Nash social welfare (NSW) function. This corresponds to equating the algorithm's performance to the geometric mean of its expected rewards, and leads us to the study of Nash regret, defined as the difference between the (a priori unknown) optimal mean among the arms and the algorithm's performance. Since NSW is known to satisfy fairness axioms, our approach complements the utilitarian considerations of average (cumulative) regret, wherein the algorithm is evaluated via the arithmetic mean of its expected rewards. This work develops an algorithm that, given the horizon of play T, achieves a Nash regret of O(√(k log T / T)), where k denotes the number of arms in the MAB instance. Since, for any algorithm, the Nash regret is at least as much as its average regret (by the AM-GM inequality), the known lower bound on average regret holds for Nash regret as well. Therefore, our Nash regret guarantee is essentially tight. In addition, we develop an anytime algorithm with a Nash regret guarantee of O(√(k log T / T) log T).
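To make the two notions concrete, the following is a minimal Python sketch that computes average regret and Nash regret from a sequence of per-round expected rewards. It only illustrates the definitions given in the abstract and is not the algorithm developed in the paper. The function names, the optimal mean `mu_star`, and the reward sequence are hypothetical values chosen for illustration.

```python
import math


def average_regret(mu_star, expected_rewards):
    """Optimal mean minus the arithmetic mean of the expected rewards."""
    return mu_star - sum(expected_rewards) / len(expected_rewards)


def nash_regret(mu_star, expected_rewards):
    """Optimal mean minus the geometric mean of the expected rewards."""
    if any(r <= 0 for r in expected_rewards):
        # A single round with zero expected reward drives the geometric mean to zero.
        geo_mean = 0.0
    else:
        # Compute the geometric mean in log space for numerical stability.
        log_sum = sum(math.log(r) for r in expected_rewards)
        geo_mean = math.exp(log_sum / len(expected_rewards))
    return mu_star - geo_mean


# Hypothetical instance: the best arm has mean 0.9. The learner pulls an
# arm with mean 0.1 for 10 rounds, then the best arm for 90 rounds.
mu_star = 0.9
expected_rewards = [0.1] * 10 + [0.9] * 90

avg = average_regret(mu_star, expected_rewards)
nash = nash_regret(mu_star, expected_rewards)
print(f"average regret: {avg:.4f}")
print(f"Nash regret:    {nash:.4f}")
```

In this example the Nash regret exceeds the average regret, as the AM-GM inequality requires: the geometric mean penalizes the low-reward rounds more heavily than the arithmetic mean does.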




Fair Algorithms for Multi-Agent Multi-Armed Bandits

We propose a multi-agent variant of the classical multi-armed bandit pro...

An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret

Recently a multi-agent variant of the classical multi-armed bandit was p...

Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms

We characterize Bayesian regret in a stochastic multi-armed bandit probl...

Stochastic Multi-armed Bandits with Arm-specific Fairness Guarantees

We study an interesting variant of the stochastic multi-armed bandit pro...

Fair Exploration via Axiomatic Bargaining

Motivated by the consideration of fairly sharing the cost of exploration...

Dynamic Spectrum Access using Stochastic Multi-User Bandits

A stochastic multi-user multi-armed bandit framework is used to develop ...

Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits

We consider the upper confidence bound strategy for Gaussian multi-armed...
