Quasi-Newton policy gradient algorithms

10/05/2021
by Haoya Li, et al.

Policy gradient algorithms have been widely applied to reinforcement learning (RL) problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. In this paper, we propose a quasi-Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting algorithm reproduces the natural policy gradient (NPG) algorithm. For other entropy functions, the method yields new policy gradient algorithms. We provide a simple proof that all these algorithms enjoy Newton-type quadratic convergence near the optimal policy. Using synthetic and industrial-scale examples, we demonstrate that the proposed quasi-Newton method typically converges in single-digit iterations, often orders of magnitude faster than other state-of-the-art algorithms.
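As a rough illustration of the Shannon-entropy case, the sketch below runs a soft policy iteration on a small random tabular MDP. It assumes the commonly cited form of the entropy-regularized NPG update at one particular step size, where the new policy is proportional to exp(Q/tau) and Q is the regularized action value of the current policy. This is not the paper's code: the function names, the random MDP, and the values of `gamma` and `tau` are all hypothetical choices made for the example.

```python
import numpy as np

def log_softmax(x):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def soft_q(P, r, log_pi, gamma, tau):
    # Exact entropy-regularized evaluation of the policy exp(log_pi).
    # P has shape (S, A, S), r has shape (S, A).
    pi = np.exp(log_pi)
    # Per-state expected reward including the entropy bonus -tau * log pi.
    r_pi = (pi * (r - tau * log_pi)).sum(axis=1)
    # State-to-state transition matrix under the policy.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve (I - gamma * P_pi) V = r_pi for the regularized value function.
    V = np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)
    # Regularized action values: Q(s, a) = r(s, a) + gamma * E[V(s')].
    return r + gamma * (P @ V)

def soft_policy_iteration(P, r, gamma=0.9, tau=0.5, iters=30):
    # Assumed update: pi_new(a|s) proportional to exp(Q(s, a) / tau).
    S, A = r.shape
    log_pi = np.full((S, A), -np.log(A))  # start from the uniform policy
    history = []  # largest change in any policy entry at each iteration
    for _ in range(iters):
        Q = soft_q(P, r, log_pi, gamma, tau)
        new_log_pi = log_softmax(Q / tau)
        change = np.abs(np.exp(new_log_pi) - np.exp(log_pi)).max()
        history.append(float(change))
        log_pi = new_log_pi
    return np.exp(log_pi), history

def random_mdp(S=5, A=3, seed=0):
    # Hypothetical test problem: random transitions and rewards in [0, 1).
    rng = np.random.default_rng(seed)
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    return P, r

if __name__ == "__main__":
    P, r = random_mdp()
    pi, history = soft_policy_iteration(P, r)
    for t, change in enumerate(history[:8]):
        print(f"iteration {t}: max policy change = {change:.3e}")
```

Printing `history` is one way to see how quickly the policy stops changing on a toy problem. The sketch uses exact policy evaluation through a linear solve, which is only practical for small tabular problems.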

research
03/25/2022

Quasi-Newton Iteration in Deterministic Policy Gradient

This paper presents a model-free approximation for the Hessian of the pe...
research
07/13/2020

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Natural policy gradient (NPG) methods are among the most widely used pol...
research
02/06/2021

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Using the policy gradient algorithm, we train a single-hidden-layer neur...
research
03/05/2018

Learning Sample-Efficient Target Reaching for Mobile Robots

In this paper, we propose a novel architecture and a self-supervised pol...
research
05/24/2022

An interpretation of the final fully connected layer

In recent years neural networks have achieved state-of-the-art accuracy ...
research
12/11/2019

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

The policy gradient theorem is defined based on an objective with respec...
research
04/12/2019

Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)

Reinforcement learning (RL) is about sequential decision making and is t...
