Addressing the Long-term Impact of ML Decisions via Policy Regret

06/02/2021
by David Lindner, et al.

Machine Learning (ML) increasingly informs the allocation of opportunities to individuals and communities in areas such as lending, education, employment, and beyond. Such decisions often impact their subjects' future characteristics and capabilities in an a priori unknown fashion. The decision-maker, therefore, faces exploration-exploitation dilemmas akin to those in multi-armed bandits. Following prior work, we model communities as arms. To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm. We focus on reward functions that are initially increasing in the number of pulls but may become (and remain) decreasing after a certain point. We argue that an acceptable sequential allocation of opportunities must take an arm's potential for growth into account. We capture these considerations through the notion of policy regret, a much stronger notion than the often-studied external regret, and present an algorithm with provably sub-linear policy regret for sufficiently long time horizons. We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons.
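The setting described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm from the paper. It assumes each arm's reward depends only on how many times that arm has already been pulled, so a policy's total reward is determined by its per-arm pull counts. The reward functions `steady_arm` and `growing_arm`, the horizon of 10, and the helper names are all hypothetical, chosen only for illustration. The myopic baseline is given oracle access to the reward functions, which a real learner would not have.

```python
from itertools import product


def steady_arm(n):
    """Hypothetical arm: the same reward no matter how often it was pulled."""
    return 5


def growing_arm(n):
    """Hypothetical arm: reward rises for the first pulls, then declines."""
    return 2 * n if n <= 5 else 15 - n


def total_reward(reward_fns, pulls_per_arm):
    """Cumulative reward when arm i is pulled pulls_per_arm[i] times."""
    return sum(
        f(n)
        for f, count in zip(reward_fns, pulls_per_arm)
        for n in range(count)
    )


def best_allocation(reward_fns, horizon):
    """Brute-force the best split of `horizon` pulls (small cases only)."""
    best_value, best_alloc = None, None
    for alloc in product(range(horizon + 1), repeat=len(reward_fns)):
        if sum(alloc) != horizon:
            continue
        value = total_reward(reward_fns, alloc)
        if best_value is None or value > best_value:
            best_value, best_alloc = value, alloc
    return best_value, best_alloc


def myopic_policy(reward_fns, horizon):
    """Baseline: always pull the arm whose immediate reward is highest."""
    counts = [0] * len(reward_fns)
    total = 0
    for _ in range(horizon):
        arm = max(range(len(reward_fns)), key=lambda i: reward_fns[i](counts[i]))
        total += reward_fns[arm](counts[arm])
        counts[arm] += 1
    return total, tuple(counts)


ARMS = [steady_arm, growing_arm]
HORIZON = 10

optimal_value, optimal_alloc = best_allocation(ARMS, HORIZON)
myopic_value, myopic_alloc = myopic_policy(ARMS, HORIZON)

# Policy regret: the gap to the best policy's own trajectory of rewards,
# including the growth that the comparator would have unlocked.
policy_regret = optimal_value - myopic_value
```

In this toy instance the myopic baseline never pulls `growing_arm`, because its first reward is lower than that of `steady_arm`, so it never observes the arm's potential for growth. Policy regret registers that missed growth because the comparator is evaluated on the rewards it would have generated by following its own sequence of pulls.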


research
08/19/2022

Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits

We study the Improving Multi-Armed Bandit (IMAB) problem, where the rewa...
research
01/24/2020

Ballooning Multi-Armed Bandits

In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a n...
research
06/24/2022

Achievement and Fragility of Long-term Equitability

Equipping current decision-making tools with notions of fairness, equita...
research
11/26/2019

Contextual Combinatorial Conservative Bandits

The problem of multi-armed bandits (MAB) asks to make sequential decisio...
research
02/24/2020

Fair Bandit Learning with Delayed Impact of Actions

Algorithmic fairness has been studied mostly in a static setting where t...
research
09/29/2021

Batched Bandits with Crowd Externalities

In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be u...
