Successive Over Relaxation Q-Learning

03/09/2019
by Chandramouli Kamanchi, et al.

In a discounted-reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed-point iteration scheme known as value iteration is used to obtain the solution. In [1], a successive over-relaxation value iteration scheme is proposed to speed up the computation of the optimal value function. That work proposes a modified Bellman equation and proves faster convergence to the optimal value function. However, in many practical applications the model information is not known, and we resort to Reinforcement Learning (RL) algorithms to obtain an optimal policy and the optimal value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed-point iteration for the optimal Q-values based on [1] and then use stochastic approximation to derive a learning algorithm that computes the optimal value function and an optimal policy. We then prove that SOR Q-Learning converges to the optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning converges faster than the standard Q-Learning algorithm.
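
The abstract describes the relaxed Bellman equation and the SOR Q-Learning update only in words. The sketch below is a minimal NumPy illustration under stated assumptions: it takes the relaxed operator to be V(i) <- max_a [w (r(i,a) + gamma * sum_j p(j|i,a) V(j)) + (1 - w) V(i)], caps w at w* = min over (i, a) of 1 / (1 - gamma p(i|i,a)) (which exceeds 1 only when every state-action pair has a positive self-transition probability), and obtains an SOR Q-Learning step by plugging the same relaxation into the usual Q-Learning target. The helper names (`relaxed_value_iteration`, `max_relaxation`, `sor_q_learning`), the toy MDP, the step-size schedule, and the uniform sampling of state-action pairs are hypothetical choices for illustration, not details taken from the paper or from [1].

```python
import numpy as np


def relaxed_value_iteration(P, R, gamma, w=1.0, tol=1e-10, max_iter=10_000):
    """Value iteration on the relaxed Bellman operator (hypothetical helper).

    P[a, i, j] = p(j | i, a); R[i, a] = expected one-step reward r(i, a).
    Update: V(i) <- max_a [w * (r(i, a) + gamma * sum_j p(j | i, a) V(j)) + (1 - w) * V(i)].
    With w = 1 this is standard value iteration. Returns (V, iterations used).
    """
    V = np.zeros(R.shape[0])
    for n in range(1, max_iter + 1):
        q = R + gamma * np.einsum("aij,j->ia", P, V)  # one-step lookahead, shape (S, A)
        V_new = np.max(w * q + (1.0 - w) * V[:, None], axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, n
        V = V_new
    return V, max_iter


def max_relaxation(P, gamma):
    """w* = min over (i, a) of 1 / (1 - gamma * p(i | i, a)); > 1 only if all self-loops are positive."""
    self_loops = np.diagonal(P, axis1=1, axis2=2)  # shape (A, S), entry [a, i] = p(i | i, a)
    return float(np.min(1.0 / (1.0 - gamma * self_loops)))


def sor_q_learning(P, R, gamma, w, n_steps, seed=0):
    """Tabular Q-Learning with an SOR-relaxed target (hypothetical sketch).

    State-action pairs are sampled uniformly at random and next states are
    drawn from P, so P is used only as a simulator, never in the update itself.
    """
    n_actions, n_states, _ = P.shape
    rng = np.random.default_rng(seed)
    cum_P = np.cumsum(P, axis=2)  # CDF over next states for each (a, i)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    pairs = rng.integers(0, n_states * n_actions, size=n_steps)
    draws = rng.random(n_steps)
    for k, u in zip(pairs, draws):
        i, a = divmod(int(k), n_actions)
        # Inverse-CDF sampling of the next state j ~ p(. | i, a); min() guards round-off.
        j = min(int(np.searchsorted(cum_P[a, i], u, side="right")), n_states - 1)
        visits[i, a] += 1
        alpha = visits[i, a] ** -0.6  # per-pair step size: sum = inf, sum of squares < inf
        target = w * (R[i, a] + gamma * Q[j].max()) + (1.0 - w) * Q[i].max()
        Q[i, a] += alpha * (target - Q[i, a])
    return Q


# Toy 3-state, 2-action MDP (made up for illustration); every p(i | i, a) > 0.
P = np.array([
    [[0.5, 0.5, 0.0], [0.2, 0.5, 0.3], [0.0, 0.5, 0.5]],  # action 0
    [[0.3, 0.0, 0.7], [0.0, 0.4, 0.6], [0.6, 0.0, 0.4]],  # action 1
])
R = np.array([[0.0, 0.2], [0.5, 0.0], [1.0, 0.3]])
gamma = 0.8

w_star = max_relaxation(P, gamma)  # 1 / (1 - 0.8 * 0.3), about 1.316
V_std, n_std = relaxed_value_iteration(P, R, gamma, w=1.0)
V_sor, n_sor = relaxed_value_iteration(P, R, gamma, w=w_star)
Q_sor = sor_q_learning(P, R, gamma, w_star, n_steps=50_000)

print("w* =", w_star)
print("value iteration sweeps: standard", n_std, "vs SOR", n_sor)
print("max |V_sor - V_std| =", np.max(np.abs(V_sor - V_std)))
print("max_a Q_sor(i, a) =", Q_sor.max(axis=1), " V* =", V_std)
```

Setting w = 1 recovers standard value iteration and standard Q-Learning. Because the relaxed equation has the same fixed point V*, `V_sor` and `V_std` agree; for w in (1, w*] the sup-norm contraction factor drops from gamma to 1 - w(1 - gamma), which is why the SOR run needs fewer sweeps on this toy MDP. The learned SOR Q-values themselves are not Q*: at the fixed point of this form they equal w Q*(i,a) + (1 - w) V*(i), so only their row maxima and greedy actions match the optimal ones.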

Related research

05/10/2019
Second Order Value Iteration in Reinforcement Learning
Value iteration is a fixed point iteration technique utilized to obtain ...

06/16/2019
Solution of Two-Player Zero-Sum Game by Successive Relaxation
We consider the problem of two-player zero-sum game. In this setting, th...

04/25/2019
Zap Q-Learning for Optimal Stopping Time Problems
We propose a novel reinforcement learning algorithm that approximates so...

11/01/2019
Generalized Speedy Q-learning
In this paper, we derive a generalization of the Speedy Q-learning (SQL)...

07/29/2018
Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
In this paper, we address the problem of setting the tap positions of vo...

01/31/2018
An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path
In this paper, we consider a modified version of the control problem in ...

03/22/2021
Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability
In this paper, for POMDPs, we provide the convergence of a Q learning al...
