A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

04/21/2023
by   Mizhaan Prajit Maniyar, et al.
0

We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an ϵ-SOSP is O(ϵ^-3.5), which is an improvement over the state-of-the-art sample complexity of O(ϵ^-4.5).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/25/2022

Stochastic Second-Order Methods Provably Beat SGD For Gradient-Dominated Functions

We study the performance of Stochastic Cubic Regularized Newton (SCRN) o...
research
11/28/2020

Approximate Midpoint Policy Iteration for Linear Quadratic Control

We present a midpoint policy iteration algorithm to solve linear quadrat...
research
10/15/2017

Manifold Regularization for Kernelized LSTD

Policy evaluation or value function or Q-function approximation is a key...
research
05/10/2019

Second Order Value Iteration in Reinforcement Learning

Value iteration is a fixed point iteration technique utilized to obtain ...
research
10/05/2016

ℓ_1 Regularized Gradient Temporal-Difference Learning

In this paper, we study the Temporal Difference (TD) learning with linea...
research
12/02/2020

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

The goal of policy-based reinforcement learning (RL) is to search the ma...
research
01/08/2020

A Nonparametric Offpolicy Policy Gradient

Reinforcement learning (RL) algorithms still suffer from high sample com...

Please sign up or login with your details

Forgot password? Click here to reset