Compatible Natural Gradient Policy Search

02/07/2019
by Joni Pajarinen, et al.

Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL divergence to bound the trust region, resulting in a natural gradient policy update. We show that natural gradient and trust-region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule, leading to premature convergence. To control entropy reduction, we introduce a new policy search method called compatible policy search (COPOS), which bounds entropy loss. Experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.
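The entropy-controlled natural gradient update described in the abstract can be sketched for a toy discrete softmax policy. This is a hypothetical illustration, not the paper's COPOS implementation: the damping constant, the backtracking step-size rule, and the entropy bound `max_entropy_drop` are all assumptions made for the sketch.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

def natural_gradient_step(theta, rewards, lr=0.5, max_entropy_drop=0.01):
    """One natural gradient step on E_pi[r] for a softmax (natural-
    parameterization) policy, shrinking the step if it would reduce
    policy entropy faster than the given bound."""
    pi = softmax(theta)
    # Vanilla policy gradient of the expected reward E_pi[r]
    grad = pi * (rewards - pi @ rewards)
    # Fisher information matrix of the softmax distribution
    F = np.diag(pi) - np.outer(pi, pi)
    # Damped solve: F is singular (rows sum to zero), so regularize
    nat_grad = np.linalg.solve(F + 1e-5 * np.eye(len(pi)), grad)
    new_theta = theta + lr * nat_grad
    # Entropy control: backtrack until the entropy loss is within bound
    while entropy(softmax(new_theta)) < entropy(pi) - max_entropy_drop:
        lr *= 0.5
        new_theta = theta + lr * nat_grad
    return new_theta
```

Starting from a uniform policy over three actions with rewards `[1, 0, 0]`, one step raises the probability of the first action while the backtracking loop keeps the per-step entropy loss below `max_entropy_drop`, which is the kind of controlled entropy schedule the abstract argues for.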


Related research

07/06/2017 · Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Trust region methods, such as TRPO, are often used to stabilize policy o...

09/27/2018 · Boosting Trust Region Policy Optimization by Normalizing Flows Policy
We propose to improve trust region policy search with normalizing flows ...

12/29/2017 · f-Divergence constrained policy improvement
To ensure stability of learning, state-of-the-art generalized policy ite...

05/24/2022 · Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control
Most successful stochastic black-box optimizers, such as CMA-ES, use ran...

07/29/2019 · Hindsight Trust Region Policy Optimization
As reinforcement learning continues to drive machine intelligence beyond...

12/03/2021 · An Analytical Update Rule for General Policy Optimization
We present an analytical policy update rule that is independent of param...

01/27/2021 · OffCon^3: What is state of the art anyway?
Two popular approaches to model-free continuous control tasks are SAC an...
