Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms

11/05/2017
by Markus Dumke, et al.

Temporal-difference (TD) learning is an important class of methods in reinforcement learning. Sarsa and Q-learning are among the most widely used TD control algorithms, and the Q(σ) algorithm (Sutton and Barto, 2017) unifies both. This paper extends Q(σ) to an online multi-step algorithm, Q(σ, λ), using eligibility traces, and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments suggest that the new Q(σ, λ) algorithm can outperform the classical TD control methods Sarsa(λ), Q(λ) and Q(σ).
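To make the unification concrete, here is a minimal tabular sketch of the one-step Q(σ) backup the abstract refers to. It is illustrative only and not taken from the paper; the helper names (e.g. `qsigma_backup`, `epsilon_greedy_probs`) and the ε-greedy target policy are assumptions for the example.

```python
import numpy as np

def qsigma_backup(Q, r, s_next, a_next, sigma, gamma, pi_probs):
    """One-step Q(sigma) target: interpolates between the Sarsa sample
    backup (sigma = 1) and the Expected Sarsa full backup (sigma = 0)."""
    sarsa_term = Q[s_next, a_next]               # value of the sampled next action
    expected_term = np.dot(pi_probs, Q[s_next])  # expectation under the target policy
    return r + gamma * (sigma * sarsa_term + (1.0 - sigma) * expected_term)

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy over one state's Q-values."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs

# Tabular update for a single transition (alpha = step size):
#   target = qsigma_backup(Q, r, s_next, a_next, sigma, gamma,
#                          epsilon_greedy_probs(Q[s_next], epsilon))
#   Q[s, a] += alpha * (target - Q[s, a])
```

With σ = 1 this reduces to the Sarsa backup, while σ = 0 gives the Expected Sarsa backup (and Q-learning when the target policy is greedy); intermediate values blend the two. The paper's Q(σ, λ) variant spreads this backup over multiple steps via eligibility traces, and Double Q(σ) maintains two value tables, using one to evaluate the action selected by the other, in the spirit of Double Q-learning.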
