Off-Policy Deep Reinforcement Learning without Exploration

12/07/2018
by Scott Fujimoto, et al.

Reinforcement learning traditionally considers the task of balancing exploration and exploitation. This work examines batch reinforcement learning, the task of maximally exploiting a given batch of off-policy data without further data collection. We demonstrate that, due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms such as DQN and DDPG are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning and, to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm that can learn effectively from uncorrelated off-policy data.
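To make the idea of restricting the action space concrete, here is a minimal tabular sketch of a batch-constrained Q-learning update. It is an illustration under stated assumptions, not the deep, continuous-control algorithm the abstract refers to: the function names, the tuple layout of the batch, the hyperparameters, and the choice to bootstrap with zero from next states that never appear in the batch are all hypothetical. The only point it shows is that the maximization in the backup ranges over actions actually observed in the data, so the value estimate is never extrapolated to unseen state-action pairs.

```python
from collections import defaultdict


def batch_constrained_q_learning(batch, gamma=0.99, lr=0.1, sweeps=200):
    """Tabular Q-learning on a fixed batch, with a constrained backup.

    batch: list of (state, action, reward, next_state, done) tuples
           (hypothetical layout chosen for this sketch).
    Returns the Q-table and the per-state sets of observed actions.
    """
    # Record which actions were actually taken in each state.
    # These are the only actions the backup and the policy may use.
    seen = defaultdict(set)
    for s, a, _, _, _ in batch:
        seen[s].add(a)

    q = defaultdict(float)
    for _ in range(sweeps):  # repeated passes over the same data, no new samples
        for s, a, r, s2, done in batch:
            candidates = seen.get(s2)
            if done or not candidates:
                # Assumption: no bootstrap when the next state is terminal
                # or has no observed action in the batch.
                target = r
            else:
                # Constrained max: only over actions seen in next_state.
                target = r + gamma * max(q[(s2, b)] for b in candidates)
            q[(s, a)] += lr * (target - q[(s, a)])
    return q, seen


def greedy_action(q, seen, state):
    """Pick the best action among those observed in `state`, or None."""
    candidates = seen.get(state)
    if not candidates:
        return None
    return max(candidates, key=lambda a: q[(state, a)])
```

A call such as `q, seen = batch_constrained_q_learning(batch)` followed by `greedy_action(q, seen, state)` yields a policy that only ever selects actions present in the data for that state. An unconstrained Q-learning backup would instead maximize over every action, including ones whose values were never updated from data.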

research
10/03/2019

Benchmarking Batch Deep Reinforcement Learning Algorithms

Widely-used deep reinforcement learning algorithms have been shown to fa...
research
12/02/2022

CT-DQN: Control-Tutored Deep Reinforcement Learning

One of the major challenges in Deep Reinforcement Learning for control i...
research
02/10/2021

Policy Augmentation: An Exploration Strategy for Faster Convergence of Deep Reinforcement Learning Algorithms

Despite advancements in deep reinforcement learning algorithms, developi...
research
05/26/2022

TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning

Efficient exploration is a crucial challenge in deep reinforcement learn...
research
02/14/2018

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

In continuous action domains, standard deep reinforcement learning algor...
research
10/05/2020

Learning the aerodynamic design of supercritical airfoils through deep reinforcement learning

The aerodynamic design of modern civil aircraft requires a true sense of...
research
12/21/2018

NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

Reinforcement learning agents need exploratory behaviors to escape from ...