Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

01/04/2021
by   Kaiqing Zhang, et al.

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning linear risk-sensitive and robust controllers. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian problem and the finite-horizon linear-quadratic disturbance attenuation problem. As a by-product, our results also provide the first sample complexity guarantees for the global convergence of PG methods on zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that, during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property, an essential requirement in safety-critical control systems.
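The derivative-free PG approach described in the abstract estimates policy gradients purely from sampled trajectory costs. As a rough illustration of the idea (not the paper's algorithm), the following is a minimal zeroth-order, two-point gradient sketch on a toy scalar finite-horizon LQR instance; all problem data (A, B, Q, R, the horizon, the smoothing radius r, and the step size) are illustrative choices:

```python
import numpy as np

# Toy scalar finite-horizon LQR instance (illustrative data, not from the paper):
# dynamics x_{t+1} = A x_t + B u_t, stage cost Q x^2 + R u^2, policy u_t = -K x_t.
rng = np.random.default_rng(0)
A, B = 1.0, 1.0
Q, R = 1.0, 1.0
T = 10          # horizon
x0 = 1.0        # initial state

def cost(K):
    """Roll out the linear policy u = -K x for T steps and return the total cost."""
    x, J = x0, 0.0
    for _ in range(T):
        u = -K * x
        J += Q * x**2 + R * u**2
        x = A * x + B * u
    return J

def zo_grad(K, r=0.05, n_samples=8):
    """Two-point zeroth-order gradient estimate: perturb the gain K along a
    random direction, difference the sampled rollout costs."""
    g = 0.0
    for _ in range(n_samples):
        d = rng.choice([-1.0, 1.0])   # random direction (the unit sphere in 1-D)
        g += (cost(K + r * d) - cost(K - r * d)) / (2.0 * r) * d
    return g / n_samples

K = 0.1                                # initial stabilizing gain
for _ in range(200):
    K -= 0.005 * zo_grad(K)            # gradient step on the sampled estimate
```

For this instance the iterates approach the optimal gain (about 0.62 for these problem data), using only rollout costs and never an analytic gradient, which is the sense in which such methods are "derivative-free."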


Related research:

- 09/08/2023, "Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity": Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control ...
- 10/21/2019, "Policy Optimization for H_2 Linear Control with H_∞ Robustness Guarantee: Implicit Regularization and Global Convergence": Policy optimization (PO) is a key ingredient for reinforcement learning ...
- 10/10/2022, "Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies": Gradient-based methods have been widely used for system design and optim...
- 12/19/2019, "Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach": This paper considers a distributed reinforcement learning problem for de...
- 05/30/2019, "Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator": We study the sample complexity of approximate policy iteration (PI) for ...
- 12/26/2019, "Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem": Model-free reinforcement learning attempts to find an optimal control ac...
- 09/09/2023, "Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs": We introduce the receding-horizon policy gradient (RHPG) algorithm, the ...
