Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem

by Hesameddin Mohammadi, et al.

Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems as well as the lack of exact gradient computation. In this paper, we take a step towards demystifying the performance and efficiency of such methods by focusing on the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We establish exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and show that a similar result holds for the gradient descent method that arises from the forward Euler discretization of the corresponding ODE. We also provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that the required simulation time for achieving ϵ-accuracy in a model-free setup and the total number of function evaluations both scale as log(1/ϵ).
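
As a rough illustration of the two iterations the abstract mentions, the following is a minimal sketch on a scalar continuous-time system, not the paper's algorithm or its experiments. Everything concrete here is an assumption made for illustration: the plant dx/dt = a x + b u with u = -k x, the numerical values of a, b, q, r, the initial-state variance omega, the step size, and all function names. For the scalar case the closed-loop Lyapunov equations have closed-form solutions, so the cost and its gradient can be written directly. The random search variant uses a two-point cost difference in place of the gradient; in a genuinely model-free setting those two cost values would come from finite-time simulations, whereas this sketch substitutes the exact cost.

```python
import math
import random

# Hypothetical scalar plant dx/dt = A*x + B*u with state feedback u = -k*x.
# Q and R are the state and input weights; OMEGA is the variance of x(0).
A, B, Q, R, OMEGA = 1.0, 1.0, 1.0, 1.0, 1.0


def is_stabilizing(k):
    """The closed loop dx/dt = (A - B*k)*x is stable iff A - B*k < 0."""
    return A - B * k < 0.0


def lqr_cost(k):
    """Infinite-horizon cost f(k) = p(k)*OMEGA, where p solves the scalar
    Lyapunov equation 2*(A - B*k)*p + Q + R*k**2 = 0."""
    if not is_stabilizing(k):
        return math.inf
    p = (Q + R * k * k) / (2.0 * (B * k - A))
    return p * OMEGA


def lqr_gradient(k):
    """Exact gradient 2*(R*k - B*p)*x, where x solves
    2*(A - B*k)*x + OMEGA = 0."""
    p = (Q + R * k * k) / (2.0 * (B * k - A))
    x = OMEGA / (2.0 * (B * k - A))
    return 2.0 * (R * k - B * p) * x


def gradient_descent(k0, step=0.1, iters=500):
    """Forward Euler discretization of the gradient flow dk/dt = -f'(k)."""
    k = k0
    for _ in range(iters):
        k_next = k - step * lqr_gradient(k)
        if not is_stabilizing(k_next):
            raise ValueError("step size too large: left the stabilizing set")
        k = k_next
    return k


def random_search(k0, step=0.1, radius=1e-3, iters=500, seed=0):
    """Gradient descent driven by a two-point estimate built from cost
    values only. In one dimension the random direction u is a sign."""
    rng = random.Random(seed)
    k = k0
    for _ in range(iters):
        u = rng.choice((-1.0, 1.0))
        diff = lqr_cost(k + radius * u) - lqr_cost(k - radius * u)
        k -= step * (diff / (2.0 * radius)) * u
    return k


def optimal_gain():
    """Closed-form optimum from the scalar algebraic Riccati equation."""
    return (A + math.sqrt(A * A + B * B * Q / R)) / B


if __name__ == "__main__":
    k_star = optimal_gain()
    print("Riccati gain:     ", k_star)
    print("gradient descent: ", gradient_descent(3.0))
    print("random search:    ", random_search(3.0))
```

Both iterations start from the stabilizing gain k = 3 and are compared against the Riccati solution. The stabilizing-set check in `gradient_descent` reflects the fact that the cost is only finite on the set of stabilizing gains, so the step size has to be small enough to remain inside it.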


Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control

Owing to the growth of interest in Reinforcement Learning in the last fe...

Simple random search provides a competitive approach to reinforcement learning

A common belief in model-free reinforcement learning is that methods bas...

Sample-efficient Model-based Reinforcement Learning for Quantum Control

We propose a model-based reinforcement learning (RL) approach for noisy ...

Accelerated Optimization Landscape of Linear-Quadratic Regulator

Linear-quadratic regulator (LQR) is a landmark problem in the field of o...

Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

This paper studies an infinite horizon optimal control problem for discr...

Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls

We study finite-time horizon continuous-time linear-convex reinforcement...
