EnTRPO: Trust Region Policy Optimization Method with Entropy Regularization

10/26/2021
by Sahar Roostaie, et al.

Trust Region Policy Optimization (TRPO) is a popular and empirically successful policy search algorithm in reinforcement learning (RL). It iteratively solves a surrogate problem that restricts consecutive policies to be close to each other. TRPO is an on-policy algorithm. On-policy methods bring many benefits, such as the ability to gauge each resulting policy. However, they typically discard all knowledge about the policies that existed before. In this work, we borrow a replay buffer from the off-policy learning setting and use it in TRPO. Entropy regularization is commonly used to improve policy optimization in reinforcement learning; it is thought to aid exploration and generalization by encouraging more random policy choices. In TRPO, we add an entropy regularization term to the advantage over π, accumulated over time steps, and call this update EnTRPO. Our experiments demonstrate that EnTRPO achieves better performance than the original TRPO in controlling a Cart-Pole system.
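
The abstract does not spell out the exact update, so the following is only a minimal sketch under stated assumptions. It assumes a discrete action space (as in Cart-Pole), reads "an entropy regularization term added to the advantage, accumulated over time steps" as adding alpha * H(π(·|s_t)) to each reward before computing a discounted, baseline-subtracted advantage, and uses a plain FIFO buffer of past episodes for the replay component. All names (`categorical_entropy`, `entropy_regularized_advantages`, `surrogate_objective`, `ReplayBuffer`, `alpha`) are hypothetical, and TRPO's KL trust-region constraint and conjugate-gradient step are omitted.

```python
import math
import random
from collections import deque


def categorical_entropy(probs):
    """Shannon entropy H(pi(.|s)) of a discrete action distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_advantages(rewards, values, action_probs, gamma=0.99, alpha=0.01):
    """Discounted advantages where each reward is augmented with alpha * H(pi(.|s_t)).

    Assumption: rewards[t], values[t] and action_probs[t] describe step t of a
    single episode that terminates after its last step (bootstrap value 0).
    """
    advantages = [0.0] * len(rewards)
    ret = 0.0
    for t in reversed(range(len(rewards))):
        # Entropy bonus folded into the per-step reward, then accumulated over time.
        soft_reward = rewards[t] + alpha * categorical_entropy(action_probs[t])
        ret = soft_reward + gamma * ret      # entropy-augmented return G_t
        advantages[t] = ret - values[t]      # subtract a state-value baseline
    return advantages


def surrogate_objective(new_probs_taken, old_probs_taken, advantages):
    """TRPO-style surrogate: mean over t of pi_new(a_t|s_t) / pi_old(a_t|s_t) * A_t.

    For samples drawn from the replay buffer, old_probs_taken would be the
    probabilities under the (older) policy that generated them.
    """
    ratios = [n / o for n, o in zip(new_probs_taken, old_probs_taken)]
    return sum(r * a for r, a in zip(ratios, advantages)) / len(advantages)


class ReplayBuffer:
    """Fixed-capacity FIFO store of past episodes, so earlier policies' data is not discarded."""

    def __init__(self, capacity):
        self.episodes = deque(maxlen=capacity)  # oldest episode is dropped when full

    def add(self, episode):
        self.episodes.append(episode)

    def sample(self, k):
        return random.sample(list(self.episodes), min(k, len(self.episodes)))
```

With `alpha=0.0` and `gamma=1.0`, rewards `[1.0, 1.0, 1.0]` and zero values give advantages `[3.0, 2.0, 1.0]`, i.e. the plain TRPO advantage; raising `alpha` adds a bonus that is largest when the policy is close to uniform, which is what pushes the policy toward more random action choices.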

research
01/18/2019

On-Policy Trust Region Policy Optimisation with Replay Buffers

Building upon the recent success of deep reinforcement learning methods,...
research
09/06/2019

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

Trust region policy optimization (TRPO) is a popular and empirically suc...
research
11/27/2018

Understanding the impact of entropy on policy optimization

Entropy regularization is commonly used to improve policy optimization i...
research
11/27/2018

Understanding the impact of entropy in policy learning

Entropy regularization is commonly used to improve policy optimization i...
research
10/21/2019

Regularization Matters in Policy Optimization

Deep Reinforcement Learning (Deep RL) has been receiving increasingly mo...
research
12/20/2019

Soft Q-network

When DQN is announced by deepmind in 2013, the whole world is surprised ...
research
05/20/2020

Mirror Descent Policy Optimization

We propose deep Reinforcement Learning (RL) algorithms inspired by mirro...