Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator

05/30/2019
by Karl Krauth, et al.

We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity. Specifically, we show that to obtain a controller that is within ε of the optimal LQR controller, each step of policy evaluation requires at most (n+d)^3/ε^2 samples, where n is the dimension of the state vector and d is the dimension of the input vector. On the other hand, only log(1/ε) policy improvement steps suffice, resulting in an overall sample complexity of (n+d)^3 ε^-2 log(1/ε). We furthermore build on our analysis and construct a simple adaptive procedure based on ε-greedy exploration which relies on approximate PI as a sub-routine and obtains T^(2/3) regret, improving upon a recent result of Abbasi-Yadkori et al.
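To make the alternation between policy evaluation and policy improvement concrete, here is a minimal sketch of exact, model-based policy iteration for discrete-time LQR. It is not the approximate, sample-based procedure analyzed in the paper: it assumes the dynamics matrices A and B are known and solves each evaluation step exactly, whereas the paper's setting estimates that step from samples. The function names, the sign convention u = Kx, and the example matrices are all assumptions made for illustration.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K^T R K + L^T P L with L = A + B K.

    Assumes K is stabilizing (spectral radius of L below 1), so the
    linear system below has a unique solution.
    """
    L = A + B @ K
    M = Q + K.T @ R @ K
    n = A.shape[0]
    # Row-major vectorization: vec(L^T P L) = kron(L^T, L^T) vec(P).
    lhs = np.eye(n * n) - np.kron(L.T, L.T)
    return np.linalg.solve(lhs, M.flatten()).reshape(n, n)

def improve_policy(A, B, R, P):
    """Policy improvement: greedy gain with respect to the cost matrix P."""
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, steps=10):
    """Alternate exact evaluation and improvement, starting from K0."""
    K = K0
    for _ in range(steps):
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(A, B, R, P)
    return K, evaluate_policy(A, B, Q, R, K)

# Hypothetical example system: A is already stable, so K0 = 0 is stabilizing.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
K, P = policy_iteration(A, B, Q, R, K0=np.zeros((2, 2)))
```

In this exact setting the only cost is linear algebra. In the setting described above, each call to the evaluation step would instead consume samples, which is why the per-step sample requirement multiplied by the number of improvement steps gives the overall sample complexity.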

research
12/09/2018

The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint

The effectiveness of model-based versus model-free methods is a long-sta...
research
05/14/2008

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

Several approximate policy iteration schemes without value functions, wh...
research
01/04/2021

Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Direct policy search serves as one of the workhorses in modern reinforce...
research
08/05/2022

Sample Complexity of Policy-Based Methods under Off-Policy Sampling and Linear Function Approximation

In this work, we study policy-based methods for solving the reinforcemen...
research
12/22/2017

Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

Reinforcement learning (RL) has been successfully used to solve many con...
research
01/28/2022

Planning and Learning with Adaptive Lookahead

The classical Policy Iteration (PI) algorithm alternates between greedy ...
research
09/08/2023

Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity

Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control ...
