An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

06/01/2021
by   Changnan Xiao, et al.
0

Policy-based reinforcement learning methods suffer from the policy collapse problem. We find valued-based reinforcement learning methods with ϵ-greedy mechanism are capable of enjoying three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there does not exist a parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism that is designed for policy-based methods, which achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is super sample-efficient for policy-based methods and boosts a policy-based baseline to a new State-Of-The-Art on Arcade Learning Environment.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/09/2021

CASA-B: A Unified Framework of Model-Free Reinforcement Learning

Building on the breakthrough of reinforcement learning, this paper intro...
research
12/13/2018

Revisiting Exploration-Conscious Reinforcement Learning

The objective of Reinforcement Learning is to learn an optimal policy by...
research
07/02/2020

ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Resolving the exploration-exploitation trade-off remains a fundamental p...
research
02/13/2018

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning

Efficient exploration remains a challenging research problem in reinforc...
research
05/16/2022

Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

Maximum Tsallis entropy (MTE) framework in reinforcement learning has ga...
research
10/12/2019

Efficient Inference and Exploration for Reinforcement Learning

Despite an ever growing literature on reinforcement learning algorithms ...
research
11/27/2018

Understanding the impact of entropy on policy optimization

Entropy regularization is commonly used to improve policy optimization i...

Please sign up or login with your details

Forgot password? Click here to reset