A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints

12/24/2020
by Shaarad A. R., et al.

The multi-armed bandit framework is the most common platform for studying strategies for sequential decision-making problems. Recently, the notion of fairness has attracted considerable attention in the machine learning community. One can impose the fairness condition that, at any point in time, even during the learning phase, a poorly performing candidate should not be preferred over a better candidate. This fairness constraint is known to be one of the most stringent, and it has been studied in the stochastic multi-armed bandit framework in a stationary setting, for which regret bounds have been established. The main aim of this paper is to study the problem in a non-stationary setting. We present a new algorithm, Fair Upper Confidence Bound with Exploration (Fair-UCBe), for solving a slowly varying stochastic k-armed bandit problem. We establish two results: (i) Fair-UCBe indeed satisfies the above fairness condition, and (ii) it achieves a regret bound of O(k^{3/2} T^{1 - α/2} √(log T)) for some suitable α ∈ (0, 1), where T is the time horizon. To the best of our knowledge, this is the first fair algorithm with a sublinear regret bound applicable to non-stationary bandits. We show that the performance of our algorithm in the non-stationary case approaches that of its stationary counterpart as the variation in the environment tends to zero.
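The fairness condition described above is commonly enforced in UCB-style algorithms by allowing one arm to be preferred over another only when their confidence intervals separate; among statistically indistinguishable arms, selection is uniform. The sketch below illustrates this generic idea for a single decision step. It is not the paper's Fair-UCBe algorithm (which additionally handles non-stationarity via its exploration schedule); the function name, the standard UCB radius √(2 log t / n), and the chaining rule for overlapping intervals are illustrative assumptions.

```python
import math
import random

def fair_ucb_step(counts, sums, t):
    """Pick an arm at round t, given per-arm pull counts and reward sums.

    Illustrative fairness rule (not the paper's Fair-UCBe): an arm may be
    preferred over another only if their confidence intervals do not overlap;
    otherwise selection is uniform among arms chained to the leader.
    """
    ucb, lcb = [], []
    for n, s in zip(counts, sums):
        if n == 0:
            # An unpulled arm cannot be distinguished from any other:
            # explore uniformly at random.
            return random.randrange(len(counts))
        mean = s / n
        radius = math.sqrt(2 * math.log(t) / n)  # standard UCB1-style radius
        ucb.append(mean + radius)
        lcb.append(mean - radius)

    # Start from the arm with the highest upper confidence bound and chain in
    # every arm whose interval overlaps one already in the linked set.
    best = max(range(len(counts)), key=lambda i: ucb[i])
    linked = {best}
    changed = True
    while changed:
        changed = False
        for i in range(len(counts)):
            if i not in linked and any(
                ucb[i] >= lcb[j] and lcb[i] <= ucb[j] for j in linked
            ):
                linked.add(i)
                changed = True

    # All linked arms are statistically indistinguishable from the leader,
    # so fairness demands a uniform choice among them.
    return random.choice(sorted(linked))
```

With tight intervals (many pulls), the clearly better arm is always chosen; with wide, overlapping intervals, both arms remain eligible, which is exactly the fairness guarantee: a worse arm is never *deterministically* favored, and a better arm is never excluded while still indistinguishable.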


