Generalized Speedy Q-learning

11/01/2019
by Indu John, et al.

In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm, which was proposed in the Reinforcement Learning (RL) literature to address the slow convergence of Watkins' Q-learning. In most RL algorithms, such as Q-learning, the Bellman equation and the Bellman operator play an important role. The Bellman operator can be generalized using the technique of successive relaxation. We use the generalized Bellman operator to derive a simple and efficient family of algorithms called Generalized Speedy Q-learning (GSQL-w) and analyze its finite-time performance. We show that GSQL-w has an improved finite-time performance bound compared to SQL when the relaxation parameter w is greater than 1. This improvement is a consequence of the contraction factor of the generalized Bellman operator being smaller than that of the standard Bellman operator. Numerical experiments are provided to demonstrate the empirical performance of the GSQL-w algorithm.
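The abstract combines two ingredients: a Bellman operator generalized by successive relaxation, and a Speedy Q-learning update built on it. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes the relaxed operator has the form (T_w Q)(s,a) = w(r(s,a) + γ Σ_y p(y|s,a) max_b Q(y,b)) + (1 - w) max_c Q(s,c), modeled on successive over-relaxation Q-learning, and that GSQL-w is the synchronous SQL update of Azar et al. with a one-sample version of T_w substituted for the empirical Bellman operator. Under that assumed form, the sup-norm contraction factor is 1 - w + wγ for 0 < w ≤ w* = 1/(1 - γ min_{s,a} p(s|s,a)). This factor drops below γ whenever w > 1, which requires every self-transition probability to be positive. All function names, the step size α_k = 1/(k+1), and the toy MDP are hypothetical choices made for illustration.

```python
import numpy as np


def generalized_bellman(Q, P, R, gamma, w):
    """Model-based successive-relaxation Bellman operator (ASSUMED form).

    (T_w Q)(s, a) = w * (R[s, a] + gamma * sum_y P[s, a, y] * max_b Q[y, b])
                    + (1 - w) * max_c Q[s, c]
    With w = 1 this is the standard Bellman optimality operator.
    """
    v = Q.max(axis=1)                        # max_b Q(y, b) for every state y
    target = R + gamma * (P @ v)             # (S, A, S) @ (S,) -> (S, A)
    return w * target + (1.0 - w) * v[:, None]


def contraction_factor(gamma, w):
    """Sup-norm contraction factor 1 - w + w*gamma, valid for 0 < w <= w*."""
    return 1.0 - w + w * gamma


def max_relaxation(P, gamma):
    """w* = 1 / (1 - gamma * min_{s,a} P[s, a, s]); exceeds 1 only with self-loops."""
    S = P.shape[0]
    p_self = P[np.arange(S), :, np.arange(S)]  # P[s, a, s], shape (S, A)
    return 1.0 / (1.0 - gamma * p_self.min())


def sample_next_states(P, rng):
    """Draw one successor y ~ P(. | s, a) for every (s, a) pair."""
    S, A, _ = P.shape
    cdf = np.cumsum(P, axis=2)
    u = rng.random((S, A, 1))
    idx = (u > cdf).sum(axis=2)              # first index with cdf >= u
    return np.minimum(idx, S - 1)            # guard against cdf round-off


def empirical_generalized_bellman(Q, R, next_states, gamma, w):
    """One-sample version of T_w: the expectation over y is replaced by a draw."""
    v = Q.max(axis=1)
    target = R + gamma * v[next_states]
    return w * target + (1.0 - w) * v[:, None]


def gsql_w(P, R, gamma, w, num_iters, seed=0):
    """Synchronous SQL update with the sampled T_w (hypothetical GSQL-w sketch)."""
    rng = np.random.default_rng(seed)
    Q_prev = np.zeros_like(R, dtype=float)   # Q_{k-1}, with Q_{-1} = Q_0
    Q = np.zeros_like(R, dtype=float)        # Q_k
    for k in range(num_iters):
        alpha = 1.0 / (k + 1)
        y = sample_next_states(P, rng)       # same samples for both applications
        TQ_prev = empirical_generalized_bellman(Q_prev, R, y, gamma, w)
        TQ = empirical_generalized_bellman(Q, R, y, gamma, w)
        # SQL: Q_{k+1} = Q_k + a_k (T Q_{k-1} - Q_k) + (1 - a_k)(T Q_k - T Q_{k-1})
        Q_next = Q + alpha * (TQ_prev - Q) + (1.0 - alpha) * (TQ - TQ_prev)
        Q_prev, Q = Q, Q_next
    return Q


# Toy 2-state, 2-action MDP (made up); all self-loop probabilities are >= 0.4.
P = np.array([[[0.5, 0.5], [0.8, 0.2]],
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0], [0.0, 0.5]])
gamma = 0.5
w = 1.2  # chosen so that 1 < w <= max_relaxation(P, gamma) = 1.25
Q = gsql_w(P, R, gamma, w, num_iters=5000)
print("contraction factor:", contraction_factor(gamma, w), "vs gamma:", gamma)
print("greedy policy:", Q.argmax(axis=1))
```

Under this assumed operator, the fixed point of T_w for w ≠ 1 is Q_w(s,a) = wQ*(s,a) + (1 - w)V*(s) rather than Q* itself. Its row-wise maximum still equals V*, and because w > 0 it ranks actions the same way as Q*, so the greedy policy is unchanged. Setting w = 1 recovers plain SQL.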

