Defining Admissible Rewards for High Confidence Policy Evaluation

05/30/2019
by Niranjani Prasad, et al.

A key impediment to reinforcement learning (RL) in real applications with limited, batch data is defining a reward function that reflects what we implicitly know about reasonable behaviour for a task and allows for robust off-policy evaluation. In this work, we develop a method to identify an admissible set of reward functions for policies that (a) do not diverge too far from past behaviour, and (b) can be evaluated with high confidence, given only a collection of past trajectories. Together, these ensure that we propose policies that we trust to be implemented in high-risk settings. We demonstrate our approach to reward design on synthetic domains as well as in a critical care context, for a reward that consolidates clinical objectives to learn a policy for weaning patients from mechanical ventilation.

