Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

06/08/2022 · by Jakob Hollenstein, et al.
Many deep reinforcement learning algorithms rely on simple forms of exploration, such as the additive action noise often used in continuous control domains. Typically, the scaling factor of this action noise is chosen as a hyper-parameter and kept constant during training. In this paper, we analyze how the learned policy is affected by the noise type, the noise scale, and by reducing the scaling factor over time. We consider the two most prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and conduct an extensive experimental study in which we systematically vary the noise type and scale parameter and measure variables of interest such as the expected return of the policy and the state-space coverage during exploration. For the latter, we propose a novel state-space coverage measure X_𝒰rel that is more robust to boundary artifacts than previously proposed measures. Larger noise scales generally increase state-space coverage; however, we find that this increased coverage is often not beneficial. Instead, reducing the noise scale over the course of training lowers variance and generally improves learning performance. We conclude that the best noise type and scale are environment-dependent and, based on our observations, derive heuristic rules to guide the choice of action noise as a starting point for further optimization.
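To make the setup concrete, the following is a minimal Python sketch of the two noise processes the abstract names, Gaussian and Ornstein-Uhlenbeck action noise, together with a linearly decaying scale factor. It is an illustrative assumption of the general scheme, not the authors' implementation; the class names, the helper noisy_action, and parameters such as theta and dt are hypothetical choices.

```python
import numpy as np


class GaussianNoise:
    """i.i.d. Gaussian action noise with standard deviation sigma."""

    def __init__(self, action_dim, sigma=0.1):
        self.action_dim = action_dim
        self.sigma = sigma

    def reset(self):
        pass  # memoryless, nothing to reset between episodes

    def sample(self):
        return self.sigma * np.random.randn(self.action_dim)


class OrnsteinUhlenbeckNoise:
    """Temporally correlated OU noise:
    dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, action_dim, sigma=0.1, theta=0.15, dt=1e-2, mu=0.0):
        self.action_dim = action_dim
        self.sigma, self.theta, self.dt, self.mu = sigma, theta, dt, mu
        self.reset()

    def reset(self):
        # restart the process at its mean at the beginning of each episode
        self.x = np.full(self.action_dim, self.mu)

    def sample(self):
        self.x = self.x + self.theta * (self.mu - self.x) * self.dt \
            + self.sigma * np.sqrt(self.dt) * np.random.randn(self.action_dim)
        return self.x


def noisy_action(policy_action, noise, step, total_steps, decay=True):
    """Add exploration noise to a deterministic action; optionally
    scale the noise down linearly over the course of training."""
    scale = max(0.0, 1.0 - step / total_steps) if decay else 1.0
    return np.clip(policy_action + scale * noise.sample(), -1.0, 1.0)
```

With decay=False this corresponds to the common practice of a constant noise scale; with decay=True the scale shrinks linearly to zero, one simple instance of the scale reduction the paper finds to generally improve learning performance.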


