Fair Exploration via Axiomatic Bargaining

06/04/2021
by Jackie Baek, et al.

Motivated by the problem of fairly sharing the cost of exploration between multiple groups in learning problems, we develop the Nash bargaining solution in the context of multi-armed bandits. Specifically, the 'grouped' bandit associated with any multi-armed bandit problem associates a single group, from some finite set of groups, with each time step. The utility gained by a given group under a learning policy is naturally viewed as the reduction in that group's regret relative to the regret it would have incurred 'on its own'. We derive policies that yield the Nash bargaining solution relative to the set of incremental utilities achievable under any policy. We show that, on the one hand, the 'price of fairness' under such policies is limited, while on the other hand, regret-optimal policies are arbitrarily unfair under generic conditions. Our theoretical development is complemented by a case study on contextual bandits for warfarin dosing, where we are concerned with the cost of exploration across multiple races and age groups.
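The Nash bargaining solution selects, from the feasible set of incremental utilities, the point that maximizes the product of the groups' utilities (equivalently, the sum of their logarithms). The sketch below is a toy illustration of that objective only, not the paper's algorithm: it assumes two groups splitting a shared exploration budget, with a hypothetical utility model u_g(b) = c_g * sqrt(b) standing in for each group's regret reduction.

```python
import math

def nash_allocation(c, budget, steps=10_000):
    """Grid-search the split of a shared exploration budget between two
    groups that maximizes the Nash product of their utilities.

    The utility model u_g(b) = c_g * sqrt(b) is an illustrative
    assumption, not the utility set studied in the paper.
    """
    best_split, best_val = None, -math.inf
    for i in range(1, steps):
        b1 = budget * i / steps
        b2 = budget - b1
        # Maximize log u_1 + log u_2, i.e., the log of the Nash product.
        val = math.log(c[0] * math.sqrt(b1)) + math.log(c[1] * math.sqrt(b2))
        if val > best_val:
            best_split, best_val = (b1, b2), val
    return best_split

alloc = nash_allocation(c=(1.0, 3.0), budget=100.0)
```

One property this model makes visible: because each log-utility separates into log c_g plus half the log of the group's share, the constants c_g drop out of the maximization and the Nash solution splits the budget evenly between the two groups.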
