Policy Search by Target Distribution Learning for Continuous Control

05/27/2019
by Chuheng Zhang, et al.

We observe that several existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic (even in some very simple environments), leading to an unstable training process. To address this issue, we propose a new method, called target distribution learning (TDL), for policy improvement in reinforcement learning. TDL alternates between proposing a target distribution and training the policy network to approach the target distribution. TDL is more effective in constraining the KL divergence between updated policies, and hence leads to more stable policy improvements over iterations. Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training.
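
The observation about overly large gradients can be illustrated with a small numerical sketch. It is not the authors' code or experiment. It assumes a one-dimensional Gaussian policy with fixed mean and standard deviation `std`, and uses the standard identity that the gradient of log N(a | mean, std^2) with respect to the mean is (a - mean) / std^2. The helper names `score_wrt_mean` and `mean_abs_score` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def score_wrt_mean(action, mean, std):
    """Gradient of log N(action | mean, std^2) with respect to the mean."""
    return (action - mean) / std**2


def mean_abs_score(std, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E|d log pi(a) / d mean| for a ~ N(0, std^2).

    Assumption for illustration: the policy mean is fixed at 0 and actions
    are sampled from the policy itself (on-policy sampling).
    """
    rng = np.random.default_rng(seed)
    mean = 0.0
    actions = rng.normal(mean, std, size=n_samples)
    return float(np.mean(np.abs(score_wrt_mean(actions, mean, std))))


if __name__ == "__main__":
    # As std shrinks (the policy approaches determinism), the typical
    # magnitude of the score term grows roughly like 1 / std.
    for std in (1.0, 0.1, 0.01):
        print(f"std={std:<5} mean |score| = {mean_abs_score(std):.2f}")
```

Because the score term multiplies the advantage in a vanilla policy gradient estimate, its growth at a rate of roughly 1 / std is one simple way to see why updates can become unstable as the policy becomes nearly deterministic. How TDL avoids this is described in the full paper, not in this sketch.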


research
02/21/2018

Variational Inference for Policy Gradient

Inspired by the seminal work on Stein Variational Inference and Stein Va...
research
05/06/2020

Robotic Arm Control and Task Training through Deep Reinforcement Learning

This paper proposes a detailed and extensive comparison of the Trust Reg...
research
01/27/2021

OffCon^3: What is state of the art anyway?

Two popular approaches to model-free continuous control tasks are SAC an...
research
12/03/2022

Policy Learning for Active Target Tracking over Continuous SE(3) Trajectories

This paper proposes a novel model-based policy gradient algorithm for tr...
research
05/27/2018

Contextual Policy Optimisation

Policy gradient methods have been successfully applied to a variety of r...
research
04/25/2018

Multiagent Soft Q-Learning

Policy gradient methods are often applied to reinforcement learning in c...
research
08/05/2020

ClipUp: A Simple and Powerful Optimizer for Distribution-based Policy Evolution

Distribution-based search algorithms are an effective approach for evolu...
