Perturbation-based exploration methods in deep reinforcement learning

11/10/2020
by Sneha Aenugu, et al.

Recent research on structured exploration has emphasized identifying novel states in the state space and incentivizing the agent to revisit them through intrinsic reward bonuses. In this study, we question whether the performance boost demonstrated by these methods is indeed due to the discovery of structure in the agent's exploratory schedule, or whether the benefit is largely attributable to the perturbations in policy and reward space that arise in the pursuit of structured exploration. We investigate the effect of perturbations in the policy and reward spaces on the exploratory behavior of the agent. We show that simply perturbing the policy just before the softmax layer and introducing sporadic reward bonuses into the domain can greatly enhance exploration in several domains of the Arcade Learning Environment. In light of these findings, we recommend benchmarking any enhancements to structured exploration research against the backdrop of noisy exploration.
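The two perturbations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the noise distribution, its scale, or how bonuses are scheduled, so the Gaussian noise, the fixed bonus, and the names `perturbed_policy`, `perturbed_reward`, `noise_std`, `bonus_prob`, and `bonus_size` are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perturbed_policy(logits, noise_std, rng):
    """Add zero-mean Gaussian noise to the logits, then apply softmax.

    Assumption: Gaussian noise with a fixed standard deviation; the
    abstract only says the policy is perturbed just before the softmax.
    """
    noisy = [x + rng.gauss(0.0, noise_std) for x in logits]
    return softmax(noisy)


def perturbed_reward(reward, bonus_prob, bonus_size, rng):
    """With probability bonus_prob, add a fixed bonus to the reward.

    Assumption: a constant bonus drawn independently at each step,
    standing in for the "sporadic reward bonuses" of the abstract.
    """
    if rng.random() < bonus_prob:
        return reward + bonus_size
    return reward


rng = random.Random(0)          # seeded for reproducibility
logits = [2.0, 0.5, -1.0]       # hypothetical pre-softmax outputs
probs = perturbed_policy(logits, noise_std=1.0, rng=rng)
action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
shaped = perturbed_reward(0.0, bonus_prob=0.05, bonus_size=1.0, rng=rng)
```

Setting `noise_std` to zero and `bonus_prob` to zero recovers the unperturbed policy and reward, which gives a natural baseline when comparing against noisy exploration.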


research
05/02/2022

Exploration in Deep Reinforcement Learning: A Survey

This paper reviews exploration techniques in deep reinforcement learning...
research
01/26/2023

Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning

We present AIRS: Automatic Intrinsic Reward Shaping that intelligently a...
research
12/21/2018

NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

Reinforcement learning agents need exploratory behaviors to escape from ...
research
02/13/2018

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning

Efficient exploration remains a challenging research problem in reinforc...
research
06/30/2017

Noisy Networks for Exploration

We introduce NoisyNet, a deep reinforcement learning agent with parametr...
research
06/29/2020

End-Effect Exploration Drive for Effective Motor Learning

End-effect drives are proposed here as an effective way to implement goa...
research
07/29/2018

Sidekick Policy Learning for Active Visual Exploration

We consider an active visual exploration scenario, where an agent must i...
