Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach

09/19/2023
by   Leonardo F. Toso, et al.

We investigate the problem of learning an ϵ-approximate solution to the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Although policy gradient methods are known to converge linearly to the optimal solution of the model-free LQR problem, the large number of two-point cost queries required for gradient estimation may be intractable, particularly in applications where evaluating the cost function at two distinct control input configurations is exceptionally costly. To this end, we propose an oracle-efficient approach. Our method combines one-point and two-point estimations in a dual-loop variance-reduced algorithm. It achieves an approximate optimal solution with only O(log(1/ϵ)^β) two-point cost queries for β ∈ (0,1).
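To make the dual-loop idea concrete, the following is a minimal sketch, not the authors' algorithm as published. It assumes a noiseless finite-horizon truncation of the LQR cost, a linear policy u = -Kx, standard zeroth-order estimators with perturbations drawn uniformly from the unit sphere, and an SVRG-style control variate in the inner loop. All names (`lqr_cost`, `unit_sphere`, `one_point_grad`, `two_point_grad`, `svrpg_sketch`) and all default parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def lqr_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon quadratic cost of the linear policy u_t = -K x_t.

    Assumption: noiseless dynamics and a fixed initial state x0.
    """
    x, total = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        total += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return total


def unit_sphere(shape, rng):
    """Draw a perturbation with unit Frobenius norm."""
    U = rng.standard_normal(shape)
    return U / np.linalg.norm(U)


def one_point_grad(cost, K, U, r):
    """One-point zeroth-order estimate: a single cost query."""
    return (K.size / r) * cost(K + r * U) * U


def two_point_grad(cost, K, U, r):
    """Two-point zeroth-order estimate: two cost queries, lower variance."""
    return (K.size / (2.0 * r)) * (cost(K + r * U) - cost(K - r * U)) * U


def svrpg_sketch(cost, K0, outer_iters=20, inner_iters=5,
                 n_two_point=10, r=0.05, step=1e-3, seed=0):
    """Dual-loop variance-reduced policy gradient (illustrative only).

    Outer loop: average two-point estimates at an anchor gain.
    Inner loop: one-point estimates corrected by the anchor (SVRG-style).
    """
    rng = np.random.default_rng(seed)
    K = K0.copy()
    for _ in range(outer_iters):
        K_anchor = K.copy()
        # Two-point queries are used only here, once per outer iteration.
        mu = np.mean(
            [two_point_grad(cost, K_anchor, unit_sphere(K.shape, rng), r)
             for _ in range(n_two_point)],
            axis=0,
        )
        for _ in range(inner_iters):
            U = unit_sphere(K.shape, rng)
            # The same perturbation U is used at K and at the anchor.
            v = (one_point_grad(cost, K, U, r)
                 - one_point_grad(cost, K_anchor, U, r)
                 + mu)
            K = K - step * v
    return K
```

A cost oracle for a specific system can be passed as a closure, for example `cost = lambda K: lqr_cost(K, A, B, Q, R, x0)`. The smoothing radius `r` and the step size `step` are problem dependent and would need tuning; the defaults above are placeholders rather than recommended values.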


research
02/25/2021

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √(T) Regret

We consider the task of learning to control a linear dynamical system un...
research
11/29/2020

Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods

In this paper, we study the global convergence of model-based and model-...
research
02/25/2023

Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient

We revisit in this paper the discrete-time linear quadratic regulator (L...
research
01/15/2018

Global Convergence of Policy Gradient Methods for Linearized Control Problems

Direct policy gradient methods for reinforcement learning and continuous...
research
11/15/2022

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

In this paper, we revisit and improve the convergence of policy gradient...
research
02/18/2020

D2C 2.0: Decoupled Data-Based Approach for Learning to Control Stochastic Nonlinear Systems via Model-Free ILQR

In this paper, we propose a structured linear parameterization of a feed...
research
01/28/2023

Stochastic Dimension-reduced Second-order Methods for Policy Optimization

In this paper, we propose several new stochastic second-order algorithms...
