Stochastic Multi-armed Bandits with Arm-specific Fairness Guarantees

by Vishakha Patil, et al.

We study a variant of the stochastic multi-armed bandit problem in which each arm must be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector of guaranteed pull fractions. We define a fairness-aware regret that takes these fairness constraints into account and extends the conventional notion of regret in a natural way. We show that logarithmic regret can be achieved while (almost) satisfying the fairness requirements. In contrast to the current literature, where the fairness notion is instance dependent, we treat the fairness criterion as exogenously specified, given as an input to the algorithm. Our regret guarantee is universal, i.e., it holds for any given fairness vector.
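To make the constraint concrete, here is a minimal sketch of one way a learner could honor a pre-specified fairness vector. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `fair_ucb`, the tolerance parameter `alpha`, the Bernoulli reward model, and the choice of UCB1 as the underlying learner are all hypothetical choices made for this example. In each round the sketch first checks whether any arm has fallen more than `alpha` pulls behind its quota; if so, it pulls the arm that is furthest behind, and otherwise it pulls the arm that UCB1 would choose.

```python
import math
import random


def fair_ucb(means, fractions, horizon, alpha=1.0, seed=0):
    """Illustrative fairness-constrained UCB1 (a sketch, not the paper's method).

    means     -- true Bernoulli means, used only to simulate rewards
    fractions -- fairness vector; fractions[i] is the guaranteed share of arm i
    horizon   -- total number of rounds
    alpha     -- assumed tolerance on how far an arm may lag behind its quota
    """
    k = len(means)
    if len(fractions) != k or sum(fractions) > 1.0:
        raise ValueError("fractions must match the arms and sum to at most 1")

    rng = random.Random(seed)
    pulls = [0] * k
    reward_sums = [0.0] * k

    for t in range(1, horizon + 1):
        # How far each arm is behind its guaranteed share after t - 1 rounds.
        deficits = [fractions[i] * (t - 1) - pulls[i] for i in range(k)]
        most_behind = max(range(k), key=lambda i: deficits[i])

        if deficits[most_behind] > alpha:
            # Fairness step: serve the arm that lags its quota the most.
            arm = most_behind
        elif 0 in pulls:
            # UCB1 initialization: try every arm once.
            arm = pulls.index(0)
        else:
            # Learning step: standard UCB1 index.
            arm = max(
                range(k),
                key=lambda i: reward_sums[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )

        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward

    return pulls


if __name__ == "__main__":
    counts = fair_ucb([0.9, 0.5, 0.2], [0.2, 0.2, 0.2], horizon=2000)
    print(counts)
```

With three arms and a fairness vector of (0.2, 0.2, 0.2), each arm should end up close to or above 400 of the 2000 pulls, with the remaining rounds left to the learner. The tolerance reflects the "(almost)" in the abstract: a larger `alpha` gives the learner more freedom early on at the cost of letting arms lag further behind their quotas.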




Achieving Fairness in the Stochastic Multi-armed Bandit Problem

We study an interesting variant of the stochastic multi-armed bandit pro...

A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints

The multi-armed bandits' framework is the most common platform to study ...

Towards Soft Fairness in Restless Multi-Armed Bandits

Restless multi-armed bandits (RMAB) is a framework for allocating limite...

Fairness of Exposure in Stochastic Bandits

Contextual bandit algorithms have become widely used for recommendation ...

Combinatorial Sleeping Bandits with Fairness Constraints

The multi-armed bandit (MAB) model has been widely adopted for studying ...

Fairness and Welfare Quantification for Regret in Multi-Armed Bandits

We extend the notion of regret with a welfarist perspective. Focussing o...

Reinforcement Learning with Fairness Constraints for Resource Distribution in Human-Robot Teams

Much work in robotics and operations research has focused on optimal res...