Exploring More When It Needs in Deep Reinforcement Learning

09/28/2021
by Youtian Guo, et al.

We propose an exploration mechanism for the policy in Deep Reinforcement Learning that explores more when the agent needs it, called Add Noise to Noise (AN2N). The core idea is that when a Deep Reinforcement Learning agent is in a state where it has historically performed poorly, it needs to explore more. We therefore use cumulative rewards to identify past states in which the agent did not perform well, and use cosine distance to measure whether the current state needs more exploration. Experiments show that this exploration mechanism makes the agent's policy explore more efficiently. We combine AN2N with the Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC) algorithms and apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvements in performance and convergence speed.
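The abstract does not give AN2N's exact formulas, so the following is only a minimal sketch of the idea as described, not the authors' implementation. It assumes: states from episodes whose cumulative reward falls below a chosen threshold are stored as "poorly performing" states; the current state "needs more exploration" when its cosine distance (1 minus cosine similarity) to any stored state is at most a threshold; and "adding noise to noise" means drawing an extra Gaussian perturbation on top of the usual Gaussian action noise used with DDPG-style agents. All names (`PoorStateMemory`, `explore_action`, `return_threshold`, `distance_threshold`, `base_sigma`, `extra_sigma`) and default values are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (treated as 1.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


class PoorStateMemory:
    """Hypothetical buffer of states visited during low-return episodes."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.states = []

    def add_episode(self, states, episode_return, return_threshold):
        # Assumption: an episode counts as "poor" if its cumulative reward
        # is below return_threshold; only its states are remembered.
        if episode_return < return_threshold:
            self.states.extend(list(s) for s in states)
            self.states = self.states[-self.capacity:]  # keep the most recent ones

    def needs_more_exploration(self, state, distance_threshold=0.05):
        # The current state needs extra exploration if it is close (in cosine
        # distance) to any remembered poorly performing state.
        return any(cosine_distance(state, s) <= distance_threshold for s in self.states)


def explore_action(action, state, memory, base_sigma=0.1, extra_sigma=0.2,
                   low=-1.0, high=1.0, rng=None):
    """Add base Gaussian noise, plus extra noise on top when the state needs it."""
    rng = rng or random.Random()
    noisy = [a + rng.gauss(0.0, base_sigma) for a in action]
    if memory.needs_more_exploration(state):
        # "Add noise to noise": a second perturbation on the already-noisy action.
        noisy = [a + rng.gauss(0.0, extra_sigma) for a in noisy]
    # Clip to the (assumed) action bounds of the environment.
    return [min(max(a, low), high) for a in noisy]
```

In use, `add_episode` would be called once per finished episode with that episode's visited states and return, and `explore_action` would wrap the deterministic actor output during data collection. Checking against every stored state is linear in the buffer size; a real implementation would likely need a cheaper lookup.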
