Quasi-Newton Iteration in Deterministic Policy Gradient

03/25/2022
by Arash Bahari Kordabad, et al.

This paper presents a model-free approximation of the Hessian of the performance of deterministic policies, for use in Reinforcement Learning via Quasi-Newton steps in the policy parameters. We show that the approximate Hessian converges to the exact Hessian at the optimal policy and, provided the policy parametrization is rich enough, enables superlinear convergence of the learning. The natural policy gradient method can be interpreted as a particular case of the proposed method. We verify the formulation analytically in a simple linear case and compare the convergence of the proposed method with that of the natural policy gradient on a nonlinear example.
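
To make the shape of such a quasi-Newton policy update concrete, here is a minimal, self-contained sketch on a scalar linear-quadratic problem. It is illustrative only and not the paper's model-free estimator: the system parameters a, b, the cost weight r, the discount gamma, and the helpers performance and grad_and_hess are all hypothetical stand-ins, with finite differences playing the role of the approximate gradient and Hessian.

```python
# Minimal sketch of a quasi-Newton step on a deterministic policy.
# Illustrative only; NOT the paper's model-free Hessian approximation.
# System: x_{k+1} = a*x_k + b*u_k with linear policy u = theta * x.

a, b, r, gamma = 1.1, 0.5, 0.1, 0.95   # assumed system/cost parameters
x0, horizon = 1.0, 200                 # fixed initial state, truncated horizon

def performance(theta):
    """Discounted cost J(theta) of the deterministic policy u = theta * x."""
    x, J = x0, 0.0
    for k in range(horizon):
        u = theta * x
        J += gamma**k * (x**2 + r * u**2)
        x = a * x + b * u
    return J

def grad_and_hess(theta, eps=1e-3):
    """Central finite differences as a generic stand-in for the
    model-free gradient and Hessian estimates."""
    jp, j0, jm = performance(theta + eps), performance(theta), performance(theta - eps)
    g = (jp - jm) / (2 * eps)
    h = (jp - 2 * j0 + jm) / eps**2
    return g, h

theta = -0.5                            # initial policy parameter
for it in range(20):
    g, h = grad_and_hess(theta)
    # Quasi-Newton step; fall back to a plain gradient step whenever
    # the Hessian estimate is not positive (J is being minimized here).
    step = g / h if h > 1e-8 else 0.1 * g
    theta -= step
    print(f"iter {it:2d}  theta={theta:+.4f}  J={performance(theta):.4f}  |g|={abs(g):.2e}")
```

Swapping the Hessian estimate for the Fisher information matrix would turn the step into a natural-gradient-style update, which the abstract identifies as a particular case of the proposed method.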

Related research

10/05/2021 · Quasi-Newton policy gradient algorithms
Policy gradient algorithms have been widely applied to reinforcement lea...

12/28/2022 · On the Convergence of Discounted Policy Gradient Methods
Many popular policy gradient methods for reinforcement learning follow a...

12/26/2019 · Quasi-Newton Trust Region Policy Optimization
We propose a trust region method for policy optimization that employs Qu...

02/03/2023 · Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
Recently, the impressive empirical success of policy gradient (PG) metho...

04/06/2021 · MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage
In this paper, we are interested in optimal control problems with purely...

09/30/2021 · Combining Sobolev Smoothing with Parameterized Shape Optimization
On the one hand Sobolev gradient smoothing can considerably improve the ...

11/19/2021 · Policy Gradient Approach to Compilation of Variational Quantum Circuits
We propose a method for finding approximate compilations of quantum circ...
