Efficient Eligibility Traces for Deep Reinforcement Learning

10/23/2018
by Brett Daley, et al.

Eligibility traces are an effective technique to accelerate reinforcement learning by smoothly assigning credit to recently visited states. However, their online implementation is incompatible with modern deep reinforcement learning algorithms, which rely heavily on i.i.d. training data and offline learning. We utilize an efficient, recursive method for computing λ-returns offline that can provide the benefits of eligibility traces to any value-estimation or actor-critic method. We demonstrate how our method can be combined with DQN, DRQN, and A3C to greatly enhance the learning speed of these algorithms when playing Atari 2600 games, even under partial observability. Our results indicate several-fold improvements to sample efficiency on Seaquest and Q*bert. We expect similar results for other algorithms and domains not considered here, including those with continuous actions.
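The offline λ-return computation the abstract refers to can be sketched with a single backward pass over a stored trajectory, using the standard recursion G^λ_t = r_t + γ[(1−λ)V(s_{t+1}) + λG^λ_{t+1}]. The function name and argument layout below are illustrative, not taken from the paper's code:

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Compute lambda-returns for a trajectory in one backward pass.

    rewards[t]     -- reward r_t received at step t
    next_values[t] -- value estimate V(s_{t+1}); the last entry
                      bootstraps the tail of the trajectory
    """
    T = len(rewards)
    returns = [0.0] * T
    g = next_values[-1]  # bootstrap: G at the trajectory boundary is V(s_T)
    for t in reversed(range(T)):
        # Recursive lambda-return: blend the one-step TD target with
        # the already-computed return from step t+1 onward.
        g = rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns
```

Setting `lam=0` recovers one-step TD targets and `lam=1` recovers discounted Monte Carlo returns with a bootstrapped tail, which is why a single λ parameter interpolates between the two; the backward recursion makes the whole sequence O(T) rather than O(T²).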


research
01/31/2018

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Pretraining with expert demonstrations has been found useful in speedin...
research
04/18/2017

Investigating Recurrence and Eligibility Traces in Deep Q-Networks

Eligibility traces in reinforcement learning are used as a bias-variance...
research
10/28/2019

Neural Architecture Evolution in Deep Reinforcement Learning for Continuous Control

Current Deep Reinforcement Learning algorithms still heavily rely on han...
research
07/08/2018

Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure

Model-based compression is an effective, facilitating, and expanded mode...
research
02/12/2021

Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Sample efficiency and performance in the offline setting have emerged as...
research
05/24/2022

Concurrent Credit Assignment for Data-efficient Reinforcement Learning

The capability to widely sample the state and action spaces is a key ing...
research
04/03/2009

Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning

A mechanism called Eligibility Propagation is proposed to speed up the T...
