Adaptive Lambda Least-Squares Temporal Difference Learning

12/30/2016
by Timothy A. Mann, et al.

Temporal Difference learning, or TD(λ), is a fundamental algorithm in the field of reinforcement learning. However, setting TD's λ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the λ selection problem as a bias-variance trade-off whose solution is the value of λ that leads to the smallest Mean Squared Value Error (MSVE). To solve this trade-off, we suggest applying Leave-One-Trajectory-Out Cross-Validation (LOTO-CV) to search the space of λ values. Unfortunately, implemented naïvely, this approach is too computationally expensive for most practical applications. For Least-Squares TD (LSTD), we show that LOTO-CV can be implemented efficiently to automatically tune λ, and we apply function optimization methods to efficiently search the space of λ values. The resulting algorithm, ALLSTD, is parameter free, and our experiments demonstrate that ALLSTD is significantly faster computationally than the naïve LOTO-CV implementation while achieving similar performance.
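To make the cost of the naïve approach concrete, here is a minimal plain-Python sketch of LSTD(λ) combined with brute-force LOTO-CV over a fixed grid of λ values. It is an illustration under stated assumptions, not the ALLSTD algorithm: it refits LSTD(λ) from scratch for every held-out trajectory and every candidate λ, which is exactly the expense ALLSTD is designed to avoid. The sketch assumes on-policy, episodic trajectories given as `(φ(s_t), r_t, φ(s_{t+1}))` tuples with all-zero features for terminal states. It uses the held-out trajectory's discounted Monte Carlo returns as the validation target, standing in for the unobservable true values in the MSVE, and it adds a small ridge term `reg`. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data format: (features of s_t, reward r_t, features of s_{t+1});
# a terminal next state is represented by all-zero features.
Transition = Tuple[List[float], float, List[float]]
Trajectory = List[Transition]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lstd_lambda(trajs: List[Trajectory], gamma: float, lam: float,
                reg: float = 1e-6) -> List[float]:
    """On-policy LSTD(lambda): accumulate A and b with an eligibility trace, then solve."""
    d = len(trajs[0][0][0])
    # Assumed small ridge term on the diagonal, meant to keep A invertible.
    A = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for traj in trajs:
        z = [0.0] * d  # eligibility trace, reset at the start of each trajectory
        for phi, r, phi_next in traj:
            z = [gamma * lam * zi + p for zi, p in zip(z, phi)]
            diff = [p - gamma * q for p, q in zip(phi, phi_next)]
            for i in range(d):
                b[i] += z[i] * r
                for j in range(d):
                    A[i][j] += z[i] * diff[j]
    return solve(A, b)


def returns(traj: Trajectory, gamma: float) -> List[float]:
    """Discounted Monte Carlo return from every time step of one trajectory."""
    g, out = 0.0, []
    for _, r, _ in reversed(traj):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def loto_cv_error(trajs: List[Trajectory], gamma: float, lam: float) -> float:
    """Naive LOTO-CV: refit without trajectory k, score squared error on trajectory k."""
    if len(trajs) < 2:
        raise ValueError("LOTO-CV needs at least two trajectories")
    err = 0.0
    for k, held_out in enumerate(trajs):
        theta = lstd_lambda(trajs[:k] + trajs[k + 1:], gamma, lam)
        for (phi, _, _), g in zip(held_out, returns(held_out, gamma)):
            v = sum(t * p for t, p in zip(theta, phi))
            err += (g - v) ** 2
    return err


def select_lambda(trajs: List[Trajectory], gamma: float, grid: List[float]) -> float:
    """Grid search over lambda (a stand-in for a function-optimization search)."""
    return min(grid, key=lambda lam: loto_cv_error(trajs, gamma, lam))
```

For example, `select_lambda(trajs, 0.9, [0.0, 0.25, 0.5, 0.75, 1.0])` performs one LSTD refit per held-out trajectory for each of the five candidates. This is why the naïve approach scales poorly as the number of trajectories and candidate λ values grows. The grid is used here only for simplicity and is not the search method the abstract describes.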
