Incremental Truncated LSTD

11/26/2015
by Clement Gehring, et al.

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a linear time complexity in the number of features, whereas least-squares temporal difference (LSTD) algorithms are sample efficient but can be quadratic in the number of features. In this work, we develop an efficient incremental low-rank LSTD(λ) algorithm that progresses towards the goal of better balancing computation and sample efficiency. The algorithm reduces the computation and storage complexity to the number of features times the chosen rank parameter while summarizing past samples efficiently to nearly obtain the sample complexity of LSTD. We derive a simulation bound on the solution given by truncated low-rank approximation, illustrating a bias-variance trade-off dependent on the choice of rank. We demonstrate that the algorithm effectively balances computational complexity and sample efficiency for policy evaluation in a benchmark task and a high-dimensional energy allocation domain.
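To make the idea concrete, the sketch below shows standard batch LSTD(λ) followed by a rank-truncated solve of the resulting linear system. This is a simplified illustration, not the paper's algorithm: the incremental method described in the abstract maintains a low-rank factorization of the `A` matrix online in O(features × rank) storage, whereas this sketch forms `A` explicitly and truncates its SVD afterwards. All function and variable names here are our own.

```python
import numpy as np

def lstd_truncated(transitions, n_features, rank, gamma=0.95, lam=0.0):
    """Batch LSTD(lambda) with a rank-truncated solve (illustrative sketch).

    `transitions` is an iterable of (phi, reward, phi_next) tuples, where
    phi and phi_next are feature vectors of length `n_features`.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    z = np.zeros(n_features)  # eligibility trace
    for phi, r, phi_next in transitions:
        z = gamma * lam * z + phi
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
    # Keep only the top-`rank` singular triplets of A, then solve the
    # truncated system via the pseudo-inverse. Smaller ranks discard more
    # of the accumulated statistics (more bias, less variance).
    U, s, Vt = np.linalg.svd(A)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    theta = Vt.T @ ((U.T @ b) / s)
    return theta
```

With tabular (one-hot) features and `rank` equal to the number of features, this reduces to exact LSTD; lowering `rank` trades solution quality for a cheaper solve, mirroring the bias-variance trade-off discussed above.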


