Parameter Space Noise for Exploration

by Matthias Plappert, et al.

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods makes it possible to get the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than either traditional RL with action space noise or evolutionary strategies alone.
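To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-layer tanh policy and assumes the parameter perturbation is drawn once and then held fixed (for example, for an episode). The names `policy`, `act_with_action_noise`, `perturb_parameters`, and the value `sigma=0.1` are illustrative choices, not taken from the paper.

```python
import math
import random


def policy(weights, bias, obs):
    """Hypothetical one-layer tanh policy: observation -> action vector."""
    return [
        math.tanh(sum(w * o for w, o in zip(row, obs)) + b)
        for row, b in zip(weights, bias)
    ]


def act_with_action_noise(weights, bias, obs, rng, sigma=0.1):
    """Action-space noise: fresh Gaussian noise is added to every action."""
    return [a + rng.gauss(0.0, sigma) for a in policy(weights, bias, obs)]


def perturb_parameters(weights, bias, rng, sigma=0.1):
    """Parameter-space noise: return a perturbed copy of the parameters."""
    noisy_w = [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
    noisy_b = [b + rng.gauss(0.0, sigma) for b in bias]
    return noisy_w, noisy_b


rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]  # 2 actions, 3 observation dims
bias = [0.0, 0.0]
obs = [1.0, 1.0, 1.0]

# Parameter noise: perturb once, then act deterministically with the copy.
noisy_w, noisy_b = perturb_parameters(weights, bias, rng)
a1 = policy(noisy_w, noisy_b, obs)
a2 = policy(noisy_w, noisy_b, obs)

# Action noise: the same observation gives a different action on each call.
b1 = act_with_action_noise(weights, bias, obs, rng)
b2 = act_with_action_noise(weights, bias, obs, rng)
```

Under these assumptions, the perturbed policy returns the same action whenever it sees the same observation (`a1` equals `a2`), whereas action-space noise does not (`b1` differs from `b2`). This is one way to read the "more consistent exploration" the abstract refers to.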


Combine PPO with NES to Improve Exploration

We introduce two approaches for combining neural evolution strategy (NES...

A Comparative Study of Deep Reinforcement Learning-based Transferable Energy Management Strategies for Hybrid Electric Vehicles

The deep reinforcement learning-based energy management strategies (EMS)...

Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference

Although it is well known that exploration plays a key role in Reinforce...

Shaped Policy Search for Evolutionary Strategies using Waypoints

In this paper, we try to improve exploration in Blackbox methods, partic...

Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

Many deep reinforcement learning algorithms rely on simple forms of expl...

Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning

This paper proposes an exploration method for deep reinforcement learnin...

Deep Intrinsically Motivated Exploration in Continuous Control

In continuous control, exploration is often performed through undirected...