Continuous Doubly Constrained Batch Reinforcement Learning

02/18/2021
by   Rasool Fakoor, et al.
0

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/16/2020

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to...
research
02/19/2020

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Off-policy reinforcement learning algorithms promise to be applicable in...
research
05/02/2023

Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare

Many reinforcement learning (RL) applications have combinatorial action ...
research
11/02/2022

Knowing the Past to Predict the Future: Reinforcement Virtual Learning

Reinforcement Learning (RL)-based control system has received considerab...
research
05/31/2022

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

The successes of deep Reinforcement Learning (RL) are limited to setting...
research
06/30/2019

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

Most deep reinforcement learning (RL) systems are not able to learn effe...
research
10/13/2021

Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement

Reinforcement learning (RL) is a powerful data-driven control method tha...

Please sign up or login with your details

Forgot password? Click here to reset