Entropy Augmented Reinforcement Learning

08/19/2022
by Jianfei Ma, et al.

Deep reinforcement learning has achieved considerable success with trust region policy optimization (TRPO) and proximal policy optimization (PPO), owing to their scalability and efficiency. However, the pessimism of both algorithms, which either constrain the policy to a trust region or strictly exclude all suspicious gradients, has been proven to suppress exploration and harm the agent's performance. To address these issues, we propose a shifted Markov decision process (MDP), that is, an MDP with entropy augmentation, to encourage exploration and strengthen the agent's ability to escape from suboptima. Our method is extensible and can be applied through either reward shaping or bootstrapping. Our convergence analysis shows that controlling the temperature coefficient is crucial. When the coefficient is tuned appropriately, however, the method achieves remarkable performance, even when applied to other algorithms, because it is simple yet effective. Our experiments with augmented TRPO and PPO on MuJoCo benchmark tasks indicate that the agent is encouraged toward higher-reward regions and strikes a balance between exploration and exploitation. We also verify the exploration bonus of our method on two grid world environments.
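The abstract describes augmenting the MDP with policy entropy, used either as reward shaping or inside the bootstrapped value target, with a temperature coefficient controlling its strength. The sketch below illustrates both options under an assumption: it uses the common maximum-entropy form r + α·H(π(·|s)) for a discrete action distribution. The paper's exact "shifted MDP" formulation may differ, and the function names (`entropy`, `shaped_reward`, `augmented_td_target`) and the parameter `alpha` (standing in for the temperature coefficient) are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution pi(.|s)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shaped_reward(reward: float, probs: Sequence[float], alpha: float) -> float:
    """Reward-shaping variant (assumed form): r + alpha * H(pi(.|s)).

    alpha is a hypothetical name for the temperature coefficient;
    alpha = 0 recovers the original, un-augmented reward.
    """
    return reward + alpha * entropy(probs)


def augmented_td_target(
    reward: float,
    next_value: float,
    next_probs: Sequence[float],
    alpha: float,
    gamma: float,
    done: bool,
) -> float:
    """Bootstrapping variant (assumed form): r + gamma * (V(s') + alpha * H(pi(.|s'))).

    The entropy bonus enters through the bootstrapped next-state term,
    so no bootstrap (and no bonus) is added at terminal states.
    """
    if done:
        return reward
    return reward + gamma * (next_value + alpha * entropy(next_probs))


if __name__ == "__main__":
    uniform = [0.25] * 4       # maximally exploratory policy over 4 actions
    greedy = [1.0, 0.0, 0.0, 0.0]  # deterministic policy, zero entropy
    print(shaped_reward(1.0, uniform, alpha=0.1))  # bonus of 0.1 * ln(4)
    print(shaped_reward(1.0, greedy, alpha=0.1))   # no bonus
    print(augmented_td_target(1.0, 5.0, uniform, alpha=0.1, gamma=0.99, done=False))
```

In this sketch a more stochastic policy earns a larger bonus, which pushes the agent away from prematurely deterministic behavior. It also shows why the temperature matters: a large `alpha` lets the entropy term dominate the task reward, while `alpha = 0` removes the augmentation entirely.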

research
11/11/2020

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement lea...
research
02/08/2020

Conservative Exploration in Reinforcement Learning

While learning in an unknown Markov Decision Process (MDP), an agent sho...
research
09/12/2022

Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning

We propose Deterministic Sequencing of Exploration and Exploitation (DSE...
research
12/14/2022

Robust Policy Optimization in Deep Reinforcement Learning

The policy gradient method enjoys the simplicity of the objective where ...
research
09/15/2020

Soft policy optimization using dual-track advantage estimator

In reinforcement learning (RL), we always expect the agent to explore as...
research
11/03/2022

Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration

Given an environment (e.g., a simulator) for evaluating samples in a spe...
research
06/30/2017

Noisy Networks for Exploration

We introduce NoisyNet, a deep reinforcement learning agent with parametr...
