Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems

11/01/2022
by Michael Giegrich, et al.

We study the global linear convergence of policy gradient (PG) methods for finite-horizon exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
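As a rough illustration of the policy class described above, the following Python sketch draws an action from a Gaussian policy whose mean is linear in the state and whose covariance does not depend on it, then applies a multiplicative covariance update of the form commonly associated with Bures-Wasserstein gradient descent. This is a sketch under stated assumptions, not the paper's algorithm: the gain `K`, the cost weight `R`, the regularisation weight `tau`, the step size, and the toy objective f(Sigma) = 0.5 tr(R Sigma) - 0.5 tau log det(Sigma) are all hypothetical, the policy is frozen at a single time point rather than time-dependent, and any constant relating the Euclidean gradient to the Bures-Wasserstein gradient is absorbed into the step size.

```python
import numpy as np


def sample_action(K, Sigma, x, rng):
    """Draw a ~ N(K x, Sigma): mean linear in the state x, covariance independent of x."""
    return rng.multivariate_normal(K @ x, Sigma)


def bures_wasserstein_step(Sigma, grad, step):
    """Update Sigma <- (I - step*grad) Sigma (I - step*grad) for a symmetric grad.

    The result is symmetric positive semidefinite whenever Sigma is.
    """
    M = np.eye(Sigma.shape[0]) - step * grad
    return M @ Sigma @ M


# Hypothetical toy problem (not taken from the paper).
rng = np.random.default_rng(0)
K = np.array([[-1.0, 0.5], [0.0, -0.5]])  # assumed feedback gain
R = np.diag([2.0, 4.0])                   # assumed control cost weight
tau = 0.5                                 # assumed entropy-regularisation weight
Sigma = np.eye(2)                         # initial exploration covariance
x = np.array([1.0, 2.0])                  # current state

action = sample_action(K, Sigma, x, rng)

# Minimise the toy objective f(Sigma) = 0.5*tr(R Sigma) - 0.5*tau*log det(Sigma).
# Its Euclidean gradient is 0.5*(R - tau*inv(Sigma)); the minimiser is tau*inv(R).
for _ in range(200):
    grad = 0.5 * (R - tau * np.linalg.inv(Sigma))
    Sigma = bures_wasserstein_step(Sigma, grad, step=0.1)

Sigma_star = tau * np.linalg.inv(R)
```

In this toy setting the iterates stay positive definite without any projection, which is the practical appeal of the multiplicative update compared with a plain Euclidean step on the covariance.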

research
11/20/2020

Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

We explore reinforcement learning methods for finding the optimal policy...
research
03/22/2022

Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems

Despite its popularity in the reinforcement learning community, a provab...
research
03/25/2021

On the Convexity of Discrete Time Covariance Steering in Stochastic Linear Systems with Wasserstein Terminal Cost

In this work, we analyze the properties of the solution to the covarianc...
research
03/29/2023

Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

This paper studies an infinite horizon optimal control problem for discr...
research
05/19/2020

Robust Policy Iteration for Continuous-time Linear Quadratic Regulation

This paper studies the robustness of policy iteration in the context of ...
research
05/31/2023

Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

We study the infinite-horizon restless bandit problem with the average r...
research
11/03/2022

Geometry and convergence of natural policy gradient methods

We study the convergence of several natural policy gradient (NPG) method...
