Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence

11/24/2020
by   Joao Paulo Jansch-Porto, et al.
0

Recently, policy optimization for control purposes has received renewed attention due to the increasing interest in reinforcement learning. In this paper, we investigate the global convergence of gradient-based policy optimization methods for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state feedback controllers and quadratic performance costs. Despite the non-convexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and almost smoothness. Based on these properties, we show global convergence of three types of policy optimization methods: the gradient descent method; the Gauss-Newton method; and the natural policy gradient method. We prove that all three methods converge to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which is mean-square stabilizing. Some numerical examples are presented to support the theory. This work brings new insights for understanding the performance of policy gradient methods on the Markovian jump linear quadratic control problem.

READ FULL TEXT
research
02/10/2020

Convergence Guarantees of Policy Optimization Methods for Markovian Jump Linear Systems

Recently, policy optimization for control purposes has received renewed ...
research
11/30/2021

Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control

Owing to the growth of interest in Reinforcement Learning in the last fe...
research
10/10/2022

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Gradient-based methods have been widely used for system design and optim...
research
09/12/2022

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

The convergence of policy gradient algorithms in reinforcement learning ...
research
03/15/2023

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

Nonlinear control systems with partial information to the decision maker...
research
08/14/2023

Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Optimization for Few-shot Learning

Prompt-based pre-trained language models (PLMs) paradigm have succeeded ...
research
09/12/2023

Convergence of Gradient-based MAML in LQR

The main objective of this research paper is to investigate the local co...

Please sign up or login with your details

Forgot password? Click here to reset