Stochastic Dimension-reduced Second-order Methods for Policy Optimization

01/28/2023
by   Jinsong Liu, et al.

In this paper, we propose several new stochastic second-order algorithms for policy optimization that only require gradient evaluations and Hessian-vector products in each iteration, making their per-iteration cost comparable to that of policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust-region subproblem. We show that DR-SOPO obtains an 𝒪(ϵ^-3.5) complexity for reaching an approximate first-order stationary point and a certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity to 𝒪(ϵ^-3) based on the variance reduction technique. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient methods.
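To make the dimension-reduction idea concrete, the following is a minimal, illustrative sketch (not the paper's actual algorithm) of a single step that projects a problem onto the two-dimensional subspace spanned by the current gradient and the previous direction, then solves the resulting 2-D trust-region subproblem. The helper names (`hvp`, `drsom_step`) and the finite-difference Hessian-vector product are assumptions for the sake of a self-contained example; note that only gradients and Hessian-vector products are used, as in the abstract.

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-5):
    # Hessian-vector product via finite differences of the gradient,
    # so no full Hessian is ever formed (illustrative approximation).
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def drsom_step(grad_fn, theta, prev_dir, radius=1.0):
    """One dimension-reduced trust-region step (hypothetical sketch).

    Projects onto span{gradient, previous direction} and solves
        min_a  c^T a + 0.5 a^T Q a   s.t.  ||a|| <= radius,
    a two-dimensional trust-region subproblem.
    """
    g = grad_fn(theta)
    V = np.stack([g, prev_dir], axis=1)            # n x 2 subspace basis
    c = V.T @ g                                    # projected gradient
    # Projected Hessian: costs only two Hessian-vector products.
    Q = V.T @ np.stack([hvp(grad_fn, theta, V[:, 0]),
                        hvp(grad_fn, theta, V[:, 1])], axis=1)
    Q = 0.5 * (Q + Q.T)                            # symmetrize
    # Try the interior stationary point, then search the boundary circle.
    best_a, best_m = np.zeros(2), 0.0
    try:
        a_int = np.linalg.solve(Q, -c)
        if np.linalg.norm(a_int) <= radius:
            m = c @ a_int + 0.5 * a_int @ Q @ a_int
            if m < best_m:
                best_a, best_m = a_int, m
    except np.linalg.LinAlgError:
        pass
    for t in np.linspace(0.0, 2 * np.pi, 721):
        a = radius * np.array([np.cos(t), np.sin(t)])
        m = c @ a + 0.5 * a @ Q @ a
        if m < best_m:
            best_a, best_m = a, m
    return theta + V @ best_a
```

Because the subproblem lives in two dimensions, even a brute-force boundary search is cheap; the overall per-iteration cost is dominated by the gradient and Hessian-vector product evaluations, which is what makes such methods competitive with first-order policy gradient schemes.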


Related research

05/17/2022 · Adaptive Momentum-Based Policy Gradient with Second-Order Information
  The variance reduced gradient estimators for policy gradient methods has...

07/30/2022 · DRSOM: A Dimension Reduced Second-Order Method and Preliminary Analyses
  We introduce a Dimension-Reduced Second-Order Method (DRSOM) for convex ...

02/03/2023 · Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
  Recently, the impressive empirical success of policy gradient (PG) metho...

06/23/2021 · Bregman Gradient Policy Optimization
  In this paper, we design a novel Bregman gradient policy optimization fr...

12/02/2020 · Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
  The goal of policy-based reinforcement learning (RL) is to search the ma...

05/09/2018 · Policy Optimization with Second-Order Advantage Information
  Policy optimization on high-dimensional continuous control tasks exhibit...

09/19/2023 · Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach
  We investigate the problem of learning an ϵ-approximate solution for the...
