Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem

by Hesameddin Mohammadi et al.

Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems as well as the lack of exact gradient computation. In this paper, we take a step towards demystifying the performance and efficiency of such methods by focusing on the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We establish exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and show that a similar result holds for the gradient descent method that arises from the forward Euler discretization of the corresponding ODE. We also provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that the required simulation time for achieving ϵ-accuracy in a model-free setup and the total number of function evaluations both scale as log (1/ϵ).
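The random search method analyzed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the system matrices A, B and all gains and step sizes below are invented for the example. In the model-free setting A and B would be unknown; they appear here only so that the cost can be simulated, and the search itself uses nothing but cost evaluations (two per iteration, one on each side of a random perturbation).

```python
import numpy as np

# Hypothetical 2-state, 1-input system, used only to *simulate* cost
# evaluations; the random-search loop never reads A or B directly.
A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

def lqr_cost(K, T=10.0, dt=1e-2, x0=np.array([1.0, 1.0])):
    """Finite-horizon approximation of the infinite-horizon LQR cost
    for state feedback u = -K x, via forward Euler simulation."""
    x, J = x0.copy(), 0.0
    for _ in range(int(T / dt)):
        u = -K @ x
        J += (x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt
    return J

def random_search(K, step=2e-3, smoothing=1e-2, iters=100, seed=0):
    """Two-point random search: draw a random direction U, estimate the
    directional derivative from cost evaluations at K ± r U, and take a
    gradient step along that direction."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        U = rng.standard_normal(K.shape)
        deriv = (lqr_cost(K + smoothing * U)
                 - lqr_cost(K - smoothing * U)) / (2 * smoothing)
        K = K - step * deriv * U
    return K

K0 = np.array([[1.0, 1.0]])   # a stabilizing initial gain (assumption)
K_opt = random_search(K0)
```

Per the paper's sample-complexity result, the number of such function evaluations needed for ϵ-accuracy scales as log(1/ϵ); the fixed iteration count here is purely illustrative.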



