Inverse Contextual Bandits: Learning How Behavior Evolves over Time

07/13/2021
by Alihan Hüyük, et al.

Understanding an agent's priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. While conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: medical practice is constantly evolving, and clinical professionals are constantly fine-tuning their priorities. We desire an approach to policy learning that (1) provides interpretable representations of decision-making, (2) accounts for non-stationarity in behavior, and (3) operates in an offline manner. First, we model the behavior of learning agents in terms of contextual bandits, and formalize the problem of inverse contextual bandits (ICB). Second, we propose two algorithms to tackle ICB, each making varying degrees of assumptions regarding the agent's learning strategy. Finally, through both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method and validate its accuracy.


research
04/13/2023

Learning Personalized Decision Support Policies

Individual human decision-makers may benefit from different forms of sup...
research
04/14/2020

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

We study the sequential batch learning problem in linear contextual band...
research
11/23/2018

Explicability? Legibility? Predictability? Transparency? Privacy? Security? The Emerging Landscape of Interpretable Agent Behavior

There has been significant interest of late in generating behavior of ag...
research
11/14/2019

Contextual Bandits Evolving Over Finite Time

Contextual bandits have the same exploration-exploitation trade-off as s...
research
10/19/2021

Stateful Offline Contextual Policy Evaluation and Learning

We study off-policy evaluation and learning from sequential data in a st...
research
06/09/2022

Conformal Off-Policy Prediction in Contextual Bandits

Most off-policy evaluation methods for contextual bandits have focused o...
research
03/14/2022

Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies

Human decision making is well known to be imperfect and the ability to a...
