Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies

03/14/2022
by   Alex J. Chan, et al.

Human decision making is well known to be imperfect and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker's ability to perform a task, e.g. to alert them to potential biases or oversights on their part. To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time as the agent learns online in reaction to the accrued experience. To then understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem. By interpreting actions within a potential outcomes framework, we introduce a meaningful mapping based on agents choosing an action they believe to have the greatest treatment effect. We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them, using a novel architecture built upon an expressive family of deep state-space models. Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.

research
10/12/2020

Inverse Multiobjective Optimization Through Online Learning

We study the problem of learning the objective functions or constraints ...
research
02/25/2022

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Decision-making under uncertainty (DMU) is present in many important pro...
research
06/05/2017

A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

Non-stationary domains, that change in unpredicted ways, are a challenge...
research
09/29/2021

A Spatial Agent-Based Model for Preemptive Evacuation Decisions During Typhoon

Natural disasters continue to cause tremendous damage to human lives and...
research
01/16/2014

Non-Deterministic Policies in Markovian Decision Processes

Markovian processes have long been used to model stochastic environments...
research
07/13/2021

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Understanding an agent's priorities by observing their behavior is criti...
research
03/15/2017

Humans of Simulated New York (HOSNY): an exploratory comprehensive model of city life

The model presented in this paper experiments with a comprehensive simul...
