Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

10/10/2022
by Bin Hu, et al.

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been renewed interest in studying the theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), ℋ_∞ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.

research
11/24/2020

Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence

Recently, policy optimization for control purposes has received renewed ...
research
01/04/2021

Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Direct policy search serves as one of the workhorses in modern reinforce...
research
06/25/2018

A Tour of Reinforcement Learning: The View from Continuous Control

This manuscript surveys reinforcement learning from the perspective of o...
research
01/31/2023

Toward Efficient Gradient-Based Value Estimation

Gradient-based methods for value estimation in reinforcement learning ha...
research
09/12/2022

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

The convergence of policy gradient algorithms in reinforcement learning ...
research
07/22/2021

Accelerating Quadratic Optimization with Reinforcement Learning

First-order methods for quadratic optimization such as OSQP are widely u...
