Boosting Trust Region Policy Optimization by Normalizing Flows Policy

09/27/2018
by   Yunhao Tang, et al.

We propose to improve trust region policy search with a normalizing flows policy. We illustrate that when the trust region is constructed with a KL divergence constraint, a normalizing flows policy can generate samples far from the 'center' of the previous policy iterate, which potentially enables better exploration and helps avoid bad local optima. We show that the normalizing flows policy significantly improves upon the factorized Gaussian policy baseline, with both TRPO and ACKTR, especially on tasks with complex dynamics such as Humanoid.
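To make the idea concrete, below is a minimal PyTorch sketch of a state-conditioned normalizing-flows policy: Gaussian noise is pushed through invertible affine coupling layers conditioned on the state, so the resulting action distribution need not be a single symmetric Gaussian "center". The class names (CouplingLayer, FlowPolicy), layer sizes, and architecture details are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a normalizing-flows policy: z ~ N(0, I) is transformed into an
# action by invertible, state-conditioned coupling layers. Illustrative only.
import torch
import torch.nn as nn

class CouplingLayer(nn.Module):
    """Affine coupling: updates one half of z using the other half and the state."""
    def __init__(self, dim, state_dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z, s):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        scale, shift = self.net(torch.cat([z1, s], dim=-1)).chunk(2, dim=-1)
        z2 = z2 * torch.exp(torch.tanh(scale)) + shift   # invertible affine update
        return torch.cat([z1, z2], dim=-1)

class FlowPolicy(nn.Module):
    """pi(a|s): sample Gaussian noise and push it through stacked coupling layers."""
    def __init__(self, state_dim, action_dim, n_layers=4):
        super().__init__()
        self.action_dim = action_dim
        self.layers = nn.ModuleList(
            [CouplingLayer(action_dim, state_dim) for _ in range(n_layers)]
        )

    def forward(self, s):
        z = torch.randn(s.shape[0], self.action_dim)     # base sample z ~ N(0, I)
        for i, layer in enumerate(self.layers):
            if i % 2 == 1:                               # alternate which half is updated
                z = z.flip(dims=[-1])
            z = layer(z, s)
        return torch.tanh(z)                             # squash to bounded actions

# Usage: sample actions for a batch of states (dimensions are illustrative).
policy = FlowPolicy(state_dim=8, action_dim=4)
actions = policy(torch.randn(16, 8))
print(actions.shape)  # torch.Size([16, 4])
```

Because the mapping from noise to action is a learned nonlinear transformation, samples can land far from any single mode, which is the exploration property the abstract highlights relative to a factorized Gaussian policy.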

Related research

07/06/2017
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Trust region methods, such as TRPO, are often used to stabilize policy o...

01/29/2019
Trust Region-Guided Proximal Policy Optimization
Model-free reinforcement learning relies heavily on a safe yet explorato...

02/07/2019
Compatible Natural Gradient Policy Search
Trust-region methods have yielded state-of-the-art results in policy sea...

03/19/2019
Truly Proximal Policy Optimization
Proximal policy optimization (PPO) is one of the most successful deep re...

11/29/2022
Relative Sparsity for Medical Decision Problems
Existing statistical methods can be used to estimate a policy, or a mapp...

02/24/2022
Entropic trust region for densest crystallographic symmetry group packings
Molecular crystal structure prediction (CSP) seeks the most stable perio...

12/29/2017
f-Divergence constrained policy improvement
To ensure stability of learning, state-of-the-art generalized policy ite...
