How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments

06/21/2018
by Cédric Colas, et al.

Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps for addressing the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. We also discuss the influence of deviations from the assumptions usually made by statistical tests, provide guidelines to counter their negative effects, and supply code to perform the tests.
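The core calculation the paper walks through can be sketched as follows: given a minimal performance gap one wants to detect and standard deviations estimated from a few pilot runs of each algorithm, a normal approximation to the power analysis of a one-sided two-sample (Welch) test yields the required number of seeds per algorithm. This is a minimal illustrative sketch, not the paper's released code; the function name and defaults (α = 0.05, power = 0.8) are chosen here for illustration.

```python
from math import ceil
from statistics import NormalDist


def seeds_needed(effect_size, std1, std2, alpha=0.05, power=0.8):
    """Seeds per algorithm for a one-sided Welch test to detect a
    performance gap of `effect_size`, given per-algorithm standard
    deviations from pilot runs (normal approximation to the t-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # quantile for type-I error
    z_beta = NormalDist().inv_cdf(power)       # quantile for type-II error
    n = (z_alpha + z_beta) ** 2 * (std1 ** 2 + std2 ** 2) / effect_size ** 2
    return ceil(n)


# Example: to detect a gap of 0.5 (in units of final return) when both
# algorithms have std ≈ 1, roughly 50 seeds per algorithm are needed.
print(seeds_needed(0.5, 1.0, 1.0))  # → 50
```

Note that the required seed count grows quadratically as the detectable gap shrinks, which is why small reported improvements with a handful of seeds are rarely conclusive.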

Related research

04/15/2019 · A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms
Consistently checking the statistical significance of experimental resul...

06/19/2023 · AdaStop: sequential testing for efficient and reliable comparisons of Deep RL Agents
The reproducibility of many experimental results in Deep Reinforcement L...

06/25/2020 · Noise, overestimation and exploration in Deep Reinforcement Learning
We will discuss some statistical noise related phenomena, that were inve...

01/22/2019 · Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...

06/20/2020 · Improving the replicability of results from a single psychological experiment
We identify two aspects of selective inference as major obstacles for re...

09/03/2022 · Model-Free Deep Reinforcement Learning in Software-Defined Networks
This paper compares two deep reinforcement learning approaches for cyber...
