Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

11/20/2020
by   Ben Hambly, et al.
0

We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the setting of known and unknown parameters. We are able to produce a global linear convergence guarantee for this approach in the setting of finite time horizon and stochastic state dynamics under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle problems with constraints. We illustrate the performance of the algorithm with two examples. The first example is the optimal liquidation of a holding in an asset. We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the global optimal solution for a larger class of stochastic systems containing the LQR framework and that it is more robust with respect to model mis-specification when compared to a model-based approach. The second example is an LQR system in a higher dimensional setting with synthetic data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/27/2021

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

We consider a general-sum N-player linear-quadratic game with stochastic...
research
03/29/2023

Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

This paper studies an infinite horizon optimal control problem for discr...
research
11/01/2022

Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems

We study the global linear convergence of policy gradient (PG) methods f...
research
03/15/2023

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

Nonlinear control systems with partial information to the decision maker...
research
06/19/2019

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

Policy gradient (PG) methods are a widely used reinforcement learning me...
research
06/14/2022

How are policy gradient methods affected by the limits of control?

We study stochastic policy gradient methods from the perspective of cont...
research
02/10/2020

Convergence Guarantees of Policy Optimization Methods for Markovian Jump Linear Systems

Recently, policy optimization for control purposes has received renewed ...

Please sign up or login with your details

Forgot password? Click here to reset