Benchmarking Batch Deep Reinforcement Learning Algorithms

10/03/2019
by Scott Fujimoto, et al.

Widely used deep reinforcement learning algorithms have been shown to fail in the batch setting: learning from a fixed data set without interaction with the environment. Following this result, several papers have reported reasonable performance across a variety of environments and batch settings. In this paper, we benchmark recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially trained behavioral policy. We find that under these conditions, many of these algorithms underperform both DQN trained online with the same amount of data and the partially trained behavioral policy itself. To provide a strong baseline, we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting and show that it outperforms all existing algorithms on this task.
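The abstract does not say how Batch-Constrained Q-learning (BCQ) is adapted to discrete actions. One formulation, which we believe matches the idea in the full text but which is not confirmed by this abstract, is the following. An estimate of the behavioral policy is used to discard actions that are rare in the data, relative to the most likely action. Greedy selection, including the bootstrap action in the Q-learning target, is then restricted to the remaining actions. The tabular sketch below illustrates this under our own assumptions. The names `estimate_behavior_probs`, `allowed_actions`, `bcq_select_action`, `batch_q_update` and the threshold `tau` are hypothetical. The behavioral policy is estimated from action counts rather than a trained network. Learning happens only through sweeps over a fixed list of transitions, with no environment interaction.

```python
from collections import defaultdict

# Hypothetical tabular sketch of batch-constrained action selection for
# discrete actions (not the paper's deep implementation).
# A transition is (state, action, reward, next_state, done).


def estimate_behavior_probs(dataset, n_actions):
    """Estimate the behavioral policy from action counts in the fixed dataset
    (an assumed tabular stand-in for a learned behavior-cloning model)."""
    counts = defaultdict(lambda: [0] * n_actions)
    for s, a, _r, _s_next, _done in dataset:
        counts[s][a] += 1
    probs = {}
    for s, row in counts.items():
        total = sum(row)
        probs[s] = [c / total for c in row]
    return probs


def allowed_actions(behavior_probs, tau):
    """Actions whose probability, relative to the most likely action, exceeds
    tau. Assumes 0 <= tau < 1, so the most likely action is always kept."""
    max_p = max(behavior_probs)
    return [a for a, p in enumerate(behavior_probs) if p / max_p > tau]


def bcq_select_action(q_values, behavior_probs, tau):
    """Greedy action restricted to actions the behavioral policy plausibly took."""
    return max(allowed_actions(behavior_probs, tau), key=lambda a: q_values[a])


def batch_q_update(Q, G, dataset, tau, gamma=0.99, lr=0.1):
    """One sweep of Q-learning over a fixed dataset, with no environment
    interaction. The bootstrap action in the next state is chosen only among
    allowed actions; next states missing from G allow every action."""
    for s, a, r, s_next, done in dataset:
        if done:
            target = r
        else:
            n_actions = len(Q[s_next])
            probs = G.get(s_next, [1.0] * n_actions)
            a_next = bcq_select_action(Q[s_next], probs, tau)
            target = r + gamma * Q[s_next][a_next]
        Q[s][a] += lr * (target - Q[s][a])
    return Q


if __name__ == "__main__":
    q = [1.0, 5.0, 3.0]
    probs = [0.6, 0.05, 0.35]
    print(bcq_select_action(q, probs, tau=0.3))  # 2: action 1 is filtered out as rarely taken
    print(bcq_select_action(q, probs, tau=0.0))  # 1: every action with nonzero probability is allowed
```

With `tau = 0` the rule behaves like ordinary greedy Q-learning, except that actions never observed in a state are still excluded. Raising `tau` keeps the bootstrap target closer to actions that actually appear in the data. This limits how far value estimates can be driven by actions the fixed dataset never tried.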

Related research
12/07/2018

Off-Policy Deep Reinforcement Learning without Exploration

Reinforcement learning traditionally considers the task of balancing exp...
06/03/2020

Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains

Reinforcement learning algorithms have had tremendous successes in onlin...
11/15/2021

Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning

Offline reinforcement learning (learning a policy from a batch of data) is...
10/17/2017

Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning

In order for robots to perform mission-critical tasks, it is essential t...
03/24/2019

Truly Batch Apprenticeship Learning with Deep Successor Features

We introduce a novel apprenticeship learning algorithm to learn an exper...
02/19/2020

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Off-policy reinforcement learning algorithms promise to be applicable in...
06/02/2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

Off-policy prediction: learning the value function for one policy from ...