Per-decision Multi-step Temporal Difference Learning with Control Variates

07/05/2018
by   Kristopher De Asis, et al.
0

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address a bias-variance trade off between reliance on current estimates, which could be poor, and incorporating longer sampled reward sequences into the updates. Especially in the off-policy setting, where the agent aims to learn about a policy different from the one generating its behaviour, the variance in the updates can cause learning to diverge as the number of sampled rewards used in the estimates increases. In this paper, we introduce per-decision control variates for multi-step TD algorithms, and compare them to existing methods. Our results show that including the control variates can greatly improve performance on both on and off-policy multi-step temporal difference learning tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/09/2018

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Recently, a new multi-step temporal learning algorithm, called Q(σ), uni...
research
10/07/2022

Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks

Deep Q-Networks algorithm (DQN) was the first reinforcement learning alg...
research
06/04/2022

Adaptive Tree Backup Algorithms for Temporal-Difference Reinforcement Learning

Q(σ) is a recently proposed temporal-difference learning method that int...
research
02/27/2023

Taylor TD-learning

Many reinforcement learning approaches rely on temporal-difference (TD) ...
research
12/28/2016

Efficient iterative policy optimization

We tackle the issue of finding a good policy when the number of policy u...
research
02/16/2016

Q(λ) with Off-Policy Corrections

We propose and analyze an alternate approach to off-policy multi-step te...
research
09/30/2019

Off-policy Multi-step Q-learning

In the past few years, off-policy reinforcement learning methods have sh...

Please sign up or login with your details

Forgot password? Click here to reset