Safely Bridging Offline and Online Reinforcement Learning

10/25/2021
by Wanqiao Xu et al.

A key challenge in deploying reinforcement learning in practice is exploring safely. We propose a natural safety property: uniformly outperforming a conservative policy (adaptively estimated from all data observed thus far), up to a per-episode exploration budget. We then design an algorithm that uses a UCB reinforcement learning policy for exploration but overrides it as needed to ensure safety with high probability. We experimentally validate our results on a sepsis treatment task, demonstrating that our algorithm can learn while ensuring good performance compared to the baseline policy for every patient.
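To make the override idea concrete, here is a minimal, hypothetical sketch in Python of one safe episode. It assumes a tabular, finite-horizon setting with placeholder optimistic and pessimistic value estimates (Q_ucb, Q_lcb), a fixed conservative baseline policy, and a per-episode budget; these names and the random stand-in environment are illustrative assumptions, not the paper's actual construction, which maintains such quantities from observed data with formal confidence bounds.

```python
import numpy as np

# Hypothetical tabular setting: S states, A actions, horizon H.
S, A, H = 5, 3, 10
rng = np.random.default_rng(0)

# Placeholder estimates; in the algorithm these would be confidence
# bounds computed from all data observed so far.
Q_ucb = rng.uniform(0, H, size=(H, S, A))           # optimistic (UCB) action values
Q_lcb = Q_ucb - rng.uniform(0, 1, size=(H, S, A))   # pessimistic (LCB) action values
V_base = rng.uniform(0, H, size=(H, S))             # estimated value of the conservative policy
pi_base = rng.integers(0, A, size=(H, S))           # the conservative (baseline) policy

def safe_episode(s0, budget):
    """Follow the UCB policy, but fall back to the baseline whenever the
    worst-case shortfall relative to the baseline would exceed the budget."""
    s, spent = s0, 0.0
    for h in range(H):
        a_ucb = int(np.argmax(Q_ucb[h, s]))
        # Pessimistic estimate of how much exploring now could cost
        # relative to simply following the conservative policy.
        loss = max(V_base[h, s] - Q_lcb[h, s, a_ucb], 0.0)
        if spent + loss <= budget:
            a, spent = a_ucb, spent + loss   # safe to explore
        else:
            a = int(pi_base[h, s])           # override: act conservatively
        s = int(rng.integers(0, S))          # stand-in for the unknown transition
    return spent

print(safe_episode(s0=0, budget=2.0))
```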


