The price of unfairness in linear bandits with biased feedback

03/18/2022
by   Solenne Gaucher, et al.
0

Artificial intelligence is increasingly used in a wide range of decision making scenarios with higher and higher stakes. At the same time, recent work has highlighted that these algorithms can be dangerously biased, and that their results often need to be corrected to avoid leading to unfair decisions. In this paper, we study the problem of sequential decision making with biased linear bandit feedback. At each round, a player selects an action described by a covariate and by a sensitive attribute. She receives a reward corresponding to the covariates of the action that she has chosen, but only observe a biased evaluation of this reward, where the bias depends on the sensitive attribute. To tackle this problem, we design a Fair Phased Elimination algorithm. We establish an upper bound on its worst-case regret, showing that it is smaller than Cκ 1/3 * log(T) 1/3 T 2/3 , where C is a numerical constant and κ * an explicit geometrical constant characterizing the difficulty of bias estimation. The worst case regret is higher than the dT 1/2 log(T) regret rate obtained under unbiased feedback. We show that this rate cannot be improved for all instances : we obtain lower bounds on the worst-case regret for some sets of actions showing that this rate is tight up to a sub-logarithmic factor. We also obtain gap-dependent upper bounds on the regret, and establish matching lower bounds for some problem instance. Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/11/2020

On Worst-case Regret of Linear Thompson Sampling

In this paper, we consider the worst-case regret of Linear Thompson Samp...
research
06/06/2022

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Past research on interactive decision making problems (bandits, reinforc...
research
05/24/2016

Refined Lower Bounds for Adversarial Bandits

We provide new lower bounds on the regret that must be suffered by adver...
research
10/25/2022

PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits

In sparse linear bandits, a learning agent sequentially selects an actio...
research
06/14/2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

We develop a new approach to obtaining high probability regret bounds fo...
research
03/12/2022

Instance-Dependent Regret Analysis of Kernelized Bandits

We study the kernelized bandit problem, that involves designing an adapt...
research
06/01/2021

Optimal Stopping with Behaviorally Biased Agents: The Role of Loss Aversion and Changing Reference Points

People are often reluctant to sell a house, or shares of stock, below th...

Please sign up or login with your details

Forgot password? Click here to reset