A Gauss-Newton Method for Markov Decision Processes

07/29/2015
by   Thomas Furmston, et al.
0

Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton Methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space, and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.

READ FULL TEXT
research
01/31/2023

Policy Gradient for s-Rectangular Robust Markov Decision Processes

We present a novel robust policy gradient method (RPG) for s-rectangular...
research
11/03/2022

Geometry and convergence of natural policy gradient methods

We study the convergence of several natural policy gradient (NPG) method...
research
11/21/2019

Scalable methods for computing state similarity in deterministic Markov Decision Processes

We present new algorithms for computing and approximating bisimulation m...
research
02/10/2020

SPAN: A Stochastic Projected Approximate Newton Method

Second-order optimization methods have desirable convergence properties....
research
05/28/2022

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

Robust Markov decision processes (MDPs) provide a general framework to m...
research
09/30/2018

Newton-MR: Newton's Method Without Smoothness or Convexity

Establishing global convergence of the classical Newton's method has lon...
research
11/17/2022

Learning Mixtures of Markov Chains and MDPs

We present an algorithm for use in learning mixtures of both Markov chai...

Please sign up or login with your details

Forgot password? Click here to reset