Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning

09/18/2018
by Izumi Karino, et al.

This paper proposes an exploration method for deep reinforcement learning based on parameter space noise. Recent studies have shown experimentally that parameter space noise yields better exploration than the commonly used action space noise. Previous methods update only the diagonal covariance matrix of the noise distribution and do not consider the direction of the noise vector or its correlations. In addition, the noise distribution must be updated quickly to facilitate policy learning. We propose a method that deforms the noise distribution according to the accumulated returns and the noise vectors that led to those returns. Moreover, the method switches between isotropic exploration and directional exploration in parameter space depending on the rewards obtained. We validate our exploration strategy in the OpenAI Gym continuous environments and in modified environments with sparse rewards. The proposed method achieves results that are competitive with a previous method on baseline tasks. Moreover, our approach performs better in sparse reward environments because of exploration with the switching strategy.
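
The abstract gives no update equations, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a CEM-style return-weighted mean and covariance as a stand-in for "deforming" the noise distribution, and a hypothetical switching rule that falls back to isotropic noise when no return exceeds a threshold. The names `sample_noise`, `update_noise_distribution`, `sigma`, and `reward_threshold` are illustrative assumptions.

```python
import numpy as np


def sample_noise(rng, mean, cov, n):
    """Draw n parameter-space noise vectors from N(mean, cov)."""
    return rng.multivariate_normal(mean, cov, size=n)


def update_noise_distribution(noises, returns, sigma=0.1,
                              reward_threshold=0.0, eps=1e-8):
    """Hypothetical switching rule (an assumption, not the paper's update).

    noises:  array of shape (n, dim), the noise vectors that were tried
    returns: array of shape (n,), the accumulated return of each noise vector
    """
    noises = np.asarray(noises, dtype=float)
    returns = np.asarray(returns, dtype=float)
    dim = noises.shape[1]

    # Isotropic mode: no return exceeded the threshold, so there is no
    # signal about which direction in parameter space is promising.
    if np.all(returns <= reward_threshold):
        return np.zeros(dim), sigma ** 2 * np.eye(dim), "isotropic"

    # Directional mode: weight each noise vector by its shifted return
    # (CEM-style weighting, chosen here purely for illustration).
    weights = returns - returns.min()
    if weights.sum() == 0.0:
        weights = np.ones_like(returns)  # all returns equal: uniform weights
    weights = weights / weights.sum()

    mean = weights @ noises
    centered = noises - mean
    # Full (non-diagonal) covariance, so correlations between parameters
    # are kept; eps * I keeps the matrix positive definite.
    cov = (centered * weights[:, None]).T @ centered + eps * np.eye(dim)
    return mean, cov, "directional"


rng = np.random.default_rng(0)
noises = sample_noise(rng, np.zeros(3), 0.01 * np.eye(3), 8)

# Sparse-reward case: every rollout returned zero.
_, iso_cov, iso_mode = update_noise_distribution(noises, np.zeros(8))

# Toy case where the return grows along the first parameter axis.
toy_returns = noises[:, 0] - noises[:, 0].min() + 1.0
dir_mean, dir_cov, dir_mode = update_noise_distribution(noises, toy_returns)
```

In the toy case the updated mean shifts toward noise vectors with a larger first component, while the all-zero case keeps the distribution spherical. A practical method would also need a step size or smoothing between successive updates, which this sketch omits.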

research
05/23/2019

Combine PPO with NES to Improve Exploration

We introduce two approaches for combining neural evolution strategy (NES...
research
02/22/2022

A Comparative Study of Deep Reinforcement Learning-based Transferable Energy Management Strategies for Hybrid Electric Vehicles

The deep reinforcement learning-based energy management strategies (EMS)...
research
06/06/2017

Parameter Space Noise for Exploration

Deep reinforcement learning (RL) methods generally engage in exploratory...
research
06/25/2020

Noise, overestimation and exploration in Deep Reinforcement Learning

We will discuss some statistical noise related phenomena, that were inve...
research
06/04/2018

Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise

Recent developments have established the vulnerability of deep reinforce...
research
02/18/2019

Optimized data exploration applied to the simulation of a chemical process

In complex simulation environments, certain parameter space regions may ...
research
03/07/2022

Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a ...
