Deep Reinforcement Learning that Matters

09/19/2017
by Peter Henderson, et al.

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods are vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results difficult to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
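
As a rough illustration of what reporting a significance metric across random seeds could look like, the sketch below computes Welch's t statistic for two sets of final returns and a percentile bootstrap confidence interval for each mean. It is only a minimal example under stated assumptions, not the procedure used in the paper: the helper names `welch_t` and `bootstrap_ci` are hypothetical, and the return values are made-up placeholders rather than results from any experiment.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def bootstrap_ci(samples, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed so the interval itself is repeatable
    n = len(samples)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder final returns, one value per random seed (illustrative only).
baseline = [3120.0, 2875.0, 3410.0, 1980.0, 3055.0]
proposed = [3390.0, 2410.0, 3720.0, 3105.0, 2950.0]

t_stat, dof = welch_t(proposed, baseline)
print(f"Welch t = {t_stat:.3f}, approx. degrees of freedom = {dof:.1f}")
for name, runs in (("baseline", baseline), ("proposed", proposed)):
    low, high = bootstrap_ci(runs)
    print(f"{name}: mean = {statistics.fmean(runs):.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

With only a handful of seeds the intervals tend to be wide and can overlap heavily, which is the situation in which a difference in mean return alone says little about whether one method actually improves on another.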

research
09/09/2019

A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots

As reinforcement learning (RL) achieves more success in solving complex ...
research
04/12/2019

Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

Reproducibility in reinforcement learning is challenging: uncontrolled s...
research
10/25/2020

How to Make Deep RL Work in Practice

In recent years, challenging control problems became solvable with deep ...
research
11/18/2021

A Survey of Generalisation in Deep Reinforcement Learning

The study of generalisation in deep Reinforcement Learning (RL) aims to ...
research
08/30/2021

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Deep reinforcement learning (RL) algorithms are predominantly evaluated ...
research
05/07/2019

Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning

Evaluation of deep reinforcement learning (RL) is inherently challenging...
