Continuous Doubly Constrained Batch Reinforcement Learning

02/18/2021
by   Rasool Fakoor, et al.
0

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.

READ FULL TEXT

page 1

page 2

page 3

page 4

07/16/2020

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to...
02/19/2020

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Off-policy reinforcement learning algorithms promise to be applicable in...
10/13/2021

Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement

Reinforcement learning (RL) is a powerful data-driven control method tha...
05/31/2022

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

The successes of deep Reinforcement Learning (RL) are limited to setting...
06/30/2019

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

Most deep reinforcement learning (RL) systems are not able to learn effe...
12/16/2020

Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation

Most of the existing deep reinforcement learning (RL) approaches for ses...
04/04/2022

Capturing positive utilities during the estimation of recursive logit models: A prism-based approach

Although the recursive logit (RL) model has been recently popular and ha...