An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

06/01/2021
by Changnan Xiao, et al.

Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with the ϵ-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help them avoid the policy collapse problem. However, no parallel mechanism exists for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy-regularization-free mechanism designed for policy-based methods that achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that the mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
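For readers unfamiliar with the ϵ-greedy mechanism mentioned above, the following is a minimal sketch of the standard textbook rule, not the mechanism proposed in the paper. The function names `epsilon_greedy_probs` and `epsilon_greedy_action` are hypothetical and chosen for illustration. The sketch shows that the action probabilities of ϵ-greedy can be written down directly from the value estimates; how this relates to the three characteristics named in the abstract is left to the full text.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the action probabilities of a standard epsilon-greedy policy.

    Every action receives epsilon / n probability mass, and the greedy
    action (first argmax on ties) receives the remaining 1 - epsilon.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Sample one action: explore uniformly with probability epsilon,
    otherwise act greedily with respect to q_values."""
    n = len(q_values)
    if rng.random() < epsilon:
        return rng.randrange(n)
    return max(range(n), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]  # illustrative value estimates, not from the paper
    print(epsilon_greedy_probs(q, 0.3))
    print(epsilon_greedy_action(q, 0.3))
```

With ϵ = 0 the rule is purely greedy, and with ϵ = 1 it is uniformly random; intermediate values interpolate between the two.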

05/09/2021

CASA-B: A Unified Framework of Model-Free Reinforcement Learning

Building on the breakthrough of reinforcement learning, this paper intro...
11/18/2018

Policy Optimization with Model-based Explorations

Model-free reinforcement learning methods such as the Proximal Policy Op...
12/13/2018

Revisiting Exploration-Conscious Reinforcement Learning

The objective of Reinforcement Learning is to learn an optimal policy by...
05/16/2022

Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

Maximum Tsallis entropy (MTE) framework in reinforcement learning has ga...
10/12/2019

Efficient Inference and Exploration for Reinforcement Learning

Despite an ever growing literature on reinforcement learning algorithms ...
11/27/2018

Understanding the impact of entropy on policy optimization

Entropy regularization is commonly used to improve policy optimization i...
07/05/2018

Goal-oriented Trajectories for Efficient Exploration

Exploration is a difficult challenge in reinforcement learning and even ...