Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

12/20/2018
by Dhruv Malik, et al.

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial in the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals differences between the settings of additive driving noise and random initialization, as well as between the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.
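To make the two-point feedback setting with random initialization concrete, here is a minimal sketch of a zero-order policy update on a small linear-quadratic system. It is an illustration under assumptions, not the authors' algorithm: the function names, the finite rollout horizon, the system matrices, the step size, and the smoothing radius are all hypothetical choices made for this example.

```python
import numpy as np

def rollout_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon quadratic cost of the linear policy u_t = -K x_t (noiseless dynamics)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

def two_point_gradient(K, cost_fn, radius, rng):
    """Two-point zero-order gradient estimate along a random unit direction."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)  # uniform direction on the unit Frobenius sphere
    delta = cost_fn(K + radius * U) - cost_fn(K - radius * U)
    return (K.size / (2.0 * radius)) * delta * U

def optimize(K0, A, B, Q, R, steps=2000, step_size=2e-3, radius=0.05, seed=0):
    """Derivative-free descent; step_size and radius are assumed, untuned values."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    K = K0.copy()
    for _ in range(steps):
        # Random initial state on the sphere of radius sqrt(n), so E[x0 x0^T] = I.
        x0 = rng.standard_normal(n)
        x0 *= np.sqrt(n) / np.linalg.norm(x0)
        # Both evaluations share the same x0: this is the two-point setting.
        def cost_fn(M, x0=x0):
            return rollout_cost(M, A, B, Q, R, x0)
        K = K - step_size * two_point_gradient(K, cost_fn, radius, rng)
    return K

# Hypothetical toy system: stable open-loop dynamics, identity costs.
A = 0.5 * np.eye(2)
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
K = optimize(np.zeros((2, 2)), A, B, Q, R)
print(K)
```

In a one-point variant, each update would use a single cost evaluation, so the two perturbed policies could not be compared on the same initial state. For this toy system the scalar Riccati equation gives an optimal gain of roughly 0.2656 times the identity, which offers a reference point for checking the output.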
