Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs

09/09/2023
by   Xiangyuan Zhang, et al.

We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning the optimal linear estimator design, i.e., the Kalman filter (KF). Notably, the RHPG algorithm requires no prior knowledge of the system for initialization and does not require the target system to be open-loop stable. The key idea of RHPG is to integrate vanilla PG (or any other policy search direction) into a dynamic programming outer loop. This outer loop iteratively decomposes the infinite-horizon KF problem, which is constrained and non-convex in the policy parameter, into a sequence of static estimation problems that are unconstrained and strongly convex, thus enabling global convergence. We further provide fine-grained analyses of the optimization landscape under RHPG and detail the convergence and sample complexity guarantees of the algorithm. This work serves as an initial attempt to develop reinforcement learning algorithms specifically for control applications with performance guarantees, by utilizing classic control theory in both algorithmic design and theoretical analysis. Lastly, we validate our theory by deploying the RHPG algorithm to learn the Kalman filter design of a large-scale convection-diffusion model. We open-source the code repository at <https://github.com/xiangyuan-zhang/LearningKF>.
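
The decomposition described above can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, model-based version that assumes the system matrices are known and uses exact gradients, whereas the paper concerns sample-based policy search. The system `A`, `C`, the noise covariances `W`, `V`, and all function names are hypothetical choices made for illustration. At each stage the gain `L` minimizes the trace of the next error covariance, a quadratic and strongly convex function of `L` when `V` is positive definite, so plain gradient descent solves each stage, and the stages are chained through the covariance update.

```python
import numpy as np

def static_cost_grad(L, P, A, C, V):
    # Gradient in L of trace((A - L C) P (A - L C)^T + W + L V L^T).
    S = C @ P @ C.T + V
    return 2.0 * (L @ S - A @ P @ C.T)

def next_cov(L, P, A, C, W, V):
    # Error covariance after one step of the predictor with gain L.
    F = A - L @ C
    return F @ P @ F.T + W + L @ V @ L.T

def receding_horizon_gd(A, C, W, V, P0, horizon=50, inner_iters=200):
    # Outer loop: one static problem per stage. Inner loop: gradient descent.
    n, m = A.shape[0], C.shape[0]
    P = P0.copy()
    L = np.zeros((n, m))  # no stabilizing initial gain is assumed
    for _ in range(horizon):
        S = C @ P @ C.T + V
        step = 1.0 / (2.0 * np.linalg.eigvalsh(S).max())  # 1 / smoothness constant
        for _ in range(inner_iters):
            L = L - step * static_cost_grad(L, P, A, C, V)
        P = next_cov(L, P, A, C, W, V)
    return L, P

def riccati_reference(A, C, W, V, P0, horizon=50):
    # Closed-form stage-wise Kalman gains, used only as a reference.
    P = P0.copy()
    L = None
    for _ in range(horizon):
        S = C @ P @ C.T + V
        L = A @ P @ C.T @ np.linalg.inv(S)
        P = next_cov(L, P, A, C, W, V)
    return L, P

# Hypothetical open-loop unstable example (A has an eigenvalue of 1.1).
A = np.array([[1.1, 0.3], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
W = 0.1 * np.eye(2)
V = np.array([[0.5]])
P0 = np.eye(2)

L_gd, P_gd = receding_horizon_gd(A, C, W, V, P0)
L_kf, P_kf = riccati_reference(A, C, W, V, P0)
print("gain from gradient descent:", L_gd.ravel())
print("gain from Riccati recursion:", L_kf.ravel())
```

In this sketch each stage has a closed-form minimizer, so gradient descent is used only to mirror the structure of the method; in a model-free setting the gradient would instead have to be estimated from data.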

