Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies

07/24/2020
by   Shengpu Tang, et al.
0

Standard reinforcement learning (RL) aims to find an optimal policy that identifies the best action for each state. However, in healthcare settings, many actions may be near-equivalent with respect to the reward (e.g., survival). We consider an alternative objective – learning set-valued policies to capture near-equivalent actions that lead to similar cumulative rewards. We propose a model-free algorithm based on temporal difference learning and a near-greedy heuristic for action selection. We analyze the theoretical properties of the proposed algorithm, providing optimality guarantees and demonstrate our approach on simulated environments and a real clinical task. Empirically, the proposed algorithm exhibits good convergence properties and discovers meaningful near-equivalent actions. Our work provides theoretical, as well as practical, foundations for clinician/human-in-the-loop decision making, in which humans (e.g., clinicians, patients) can incorporate additional knowledge (e.g., side effects, patient preference) when selecting among near-equivalent actions.

READ FULL TEXT

page 7

page 8

research
12/10/2002

Searching for Plannable Domains can Speed up Reinforcement Learning

Reinforcement learning (RL) involves sequential decision making in uncer...
research
01/21/2023

Quasi-optimal Learning with Continuous Treatments

Many real-world applications of reinforcement learning (RL) require maki...
research
12/24/2020

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...
research
08/17/2021

Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Several real-world scenarios, such as remote control and sensing, are co...
research
09/27/2022

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over tim...
research
08/09/2023

Bayesian Inverse Transition Learning for Offline Settings

Offline Reinforcement learning is commonly used for sequential decision-...
research
11/16/2020

Blind Decision Making: Reinforcement Learning with Delayed Observations

Reinforcement learning typically assumes that the state update from the ...

Please sign up or login with your details

Forgot password? Click here to reset