Consistent Dropout for Policy Gradient Reinforcement Learning

02/23/2022
by Matthew Hausknecht, et al.

Dropout has long been a staple of supervised learning, but it is rarely used in reinforcement learning. We analyze why naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique to address this instability. We demonstrate that consistent dropout enables stable training with A2C and PPO in both continuous and discrete action environments across a wide range of dropout probabilities. Finally, we show that consistent dropout enables the online training of complex architectures such as GPT without needing to disable the model's native dropout.
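The abstract does not spell out the mechanism, but a natural reading of "consistent" dropout is that the dropout mask sampled when an action is chosen is stored and reapplied when that action's log-probability is recomputed during the A2C/PPO update, so both forward passes go through the same sub-network. The PyTorch sketch below illustrates that idea; the module name ConsistentDropout and the mask-passing interface are assumptions made for illustration, not the paper's verified implementation.

```python
import torch
import torch.nn as nn

class ConsistentDropout(nn.Module):
    """Illustrative sketch (not the paper's code): dropout whose mask can be
    saved at action-sampling time and replayed during the policy update."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None):
        # In eval mode (or with p = 0) behave like standard dropout: identity.
        if not self.training or self.p == 0.0:
            return x, None
        if mask is None:
            # Rollout: sample a fresh Bernoulli keep-mask for this forward pass.
            mask = torch.bernoulli(torch.full_like(x, 1.0 - self.p))
        # Update: reuse the mask stored with the transition, so the recomputed
        # log-probability is taken under the same sub-network that acted.
        # Inverted-dropout scaling keeps the layer's expected output unchanged.
        return x * mask / (1.0 - self.p), mask
```

Under this reading, the masks returned during rollouts would be stored alongside each transition and passed back in during the update, so the behavior policy and the policy being optimized agree on which units were dropped.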

Related research:

- NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning (12/21/2018). Reinforcement learning agents need exploratory behaviors to escape from ...
- Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning (02/18/2019). In this paper, we propose a new learning technique named message-dropout...
- Episodic Policy Gradient Training (12/03/2021). We introduce a novel training procedure for policy gradient methods wher...
- Concrete Dropout (05/22/2017). Dropout is used as a practical tool to obtain uncertainty estimates in l...
- Triangular Dropout: Variable Network Width without Retraining (05/02/2022). One of the most fundamental design choices in neural networks is layer w...
- Dropout's Dream Land: Generalization from Learned Simulators to Reality (09/17/2021). A World Model is a generative model used to simulate an environment. Wor...
- Dropout Q-Functions for Doubly Efficient Reinforcement Learning (10/05/2021). Randomized ensemble double Q-learning (REDQ) has recently achieved state...
