Fair Bandit Learning with Delayed Impact of Actions

02/24/2020
by Wei Tang, et al.

Algorithmic fairness has been studied mostly in a static setting, where the implicit assumption is that the frequencies of historically made decisions do not affect the problem structure in the future. However, the capability of people in a certain group to pay back a loan, for example, might depend on how frequently that group has historically had loan applications approved. If banks keep rejecting loan applications from people in a disadvantaged group, this could create a feedback loop that further damages the chances of people in that group getting loans. This challenge has been noted in several recent works but is under-explored in a more generic sequential learning setting. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits (MAB). We generalize the classical bandit setting to encode the dependence of this action "bias" on the history of the learning. Our goal is to learn to maximize the collected utilities over time while satisfying fairness constraints imposed over arms' utilities, which again depend on the decisions the arms have received. We propose an algorithm that achieves a regret of Õ(KT^(2/3)) and show a matching regret lower bound of Ω(KT^(2/3)), where K is the number of arms and T denotes the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions that have long-term impacts, and they have implications for designing fair algorithms.
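The following minimal Python sketch illustrates the kind of history dependence described above, where an arm's expected utility depends on how often it has been selected in the past. It is not the authors' model or algorithm: the linear link in `mean_reward`, the `sensitivity` parameter, and the policies `round_robin` and `always_first` are hypothetical choices made for illustration, and the fairness constraints are omitted.

```python
from typing import Callable, List, Tuple

# Hypothetical policy signature: (round index, per-arm selection counts) -> arm index.
Policy = Callable[[int, List[int]], int]


def mean_reward(base: float, freq: float, sensitivity: float) -> float:
    """Hypothetical delayed-impact model: an arm's mean reward shifts linearly
    with the fraction of past rounds in which it was selected, clipped to [0, 1]."""
    return min(1.0, max(0.0, base + sensitivity * (freq - 0.5)))


def run(policy: Policy, bases: List[float], sensitivity: float,
        horizon: int) -> Tuple[float, List[int]]:
    """Return the expected utility collected by `policy` and the per-arm pull counts."""
    k = len(bases)
    counts = [0] * k
    utility = 0.0
    for t in range(horizon):
        arm = policy(t, counts)
        # Historical selection frequency of the chosen arm (1/k before any pulls).
        freq = counts[arm] / t if t > 0 else 1.0 / k
        utility += mean_reward(bases[arm], freq, sensitivity)
        counts[arm] += 1
    return utility, counts


def round_robin(t: int, counts: List[int]) -> int:
    """Hypothetical policy: cycle through the arms in order."""
    return t % len(counts)


def always_first(t: int, counts: List[int]) -> int:
    """Hypothetical policy: always select arm 0 and never arm 1."""
    return 0


if __name__ == "__main__":
    for name, policy in [("round_robin", round_robin), ("always_first", always_first)]:
        utility, counts = run(policy, bases=[0.5, 0.5], sensitivity=0.4, horizon=100)
        print(name, round(utility, 2), counts)
```

Under `always_first`, arm 1 is never selected, so its historical frequency stays at 0 and its mean utility would be 0.3 if it were chosen now, down from a base of 0.5. This mirrors the feedback loop in the loan example: repeatedly rejecting a group lowers the utility that approving it would later yield.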
