Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

12/22/2017
by Stephen Tu et al.

Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite these impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function of a fixed static state-feedback policy to within ε-relative error. In the process of deriving our result, we give a general characterization of when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result of Koltchinskii and Mendelson in the independent-covariates setting. Finally, we provide experimental evidence indicating that our analysis correctly captures the qualitative behavior of LSTD on several LQR instances.
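To make the object of study concrete: for a fixed stabilizing state-feedback gain, the LQR value function is quadratic in the state, so it is linear in the features svec(xx^T) together with a constant, and LSTD estimates its parameters by solving an empirical fixed-point equation along a single trajectory. Below is a minimal sketch of this estimator in a discounted formulation; the function names, the gain K, and all numerical values are illustrative assumptions rather than details taken from the paper, which may differ in its exact problem setup.

```python
import numpy as np

def svec(M):
    """Stack the upper triangle of a symmetric matrix into a vector,
    scaling off-diagonal entries by sqrt(2) so svec(A) @ svec(B) = trace(A B)."""
    i, j = np.triu_indices(M.shape[0])
    scale = np.where(i == j, 1.0, np.sqrt(2.0))
    return scale * M[i, j]

def phi(x):
    """Quadratic features plus a constant: a quadratic value function
    V(x) = x' P x + q is linear in these features."""
    return np.append(svec(np.outer(x, x)), 1.0)

def lstd(xs, costs, gamma):
    """LSTD(0) estimate of the value-function parameters of a fixed
    policy, from a single trajectory xs[0..T] with costs[0..T-1]."""
    d = phi(xs[0]).shape[0]
    A_hat = np.zeros((d, d))
    b_hat = np.zeros(d)
    for t in range(len(costs)):
        f, f_next = phi(xs[t]), phi(xs[t + 1])
        A_hat += np.outer(f, f - gamma * f_next)
        b_hat += costs[t] * f
    # Assumes enough data has been collected that A_hat is invertible;
    # bounding when this happens along a fast-mixing trajectory is the
    # kind of question the paper's eigenvalue result addresses.
    return np.linalg.solve(A_hat, b_hat)

# Usage sketch on a toy LQR instance (all values illustrative):
# closed-loop dynamics x_{t+1} = (A + B K) x_t + w_t under a fixed gain K.
rng = np.random.default_rng(0)
A = np.array([[1.01, 0.1], [0.0, 0.99]])
B = np.eye(2)
Q, R, gamma = np.eye(2), np.eye(2), 0.95
K = -0.5 * np.eye(2)
xs, costs = [rng.normal(size=2)], []
for _ in range(5000):
    x = xs[-1]
    u = K @ x
    costs.append(x @ Q @ x + u @ R @ u)
    xs.append((A + B @ K) @ x + 0.1 * rng.normal(size=2))
theta = lstd(np.array(xs), np.array(costs), gamma)
```

The constant feature absorbs the offset that the process noise adds to the value function; without it, a purely quadratic featurization would be biased.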

Related research

12/09/2018
The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint
The effectiveness of model-based versus model-free methods is a long-sta...

04/22/2022
Analysis of Temporal Difference Learning: Linear System Approach
The goal of this technical note is to introduce a new finite-time conver...

03/15/2023
On the Benefits of Leveraging Structural Information in Planning Over the Learned Model
Model-based Reinforcement Learning (RL) integrates learning and planning...

05/30/2019
Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator
We study the sample complexity of approximate policy iteration (PI) for ...

12/02/2022
Expected Value of Matrix Quadratic Forms with Wishart distributed Random Matrices
To explore the limits of a stochastic gradient method, it may be useful ...

06/16/2020
The Teaching Dimension of Q-learning
In this paper, we initiate the study of sample complexity of teaching, t...

05/30/2020
MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning
There has been an increasing surge of interest on development of advance...
