Second Order Value Iteration in Reinforcement Learning

05/10/2019
by Chandramouli Kamanchi, et al.

Value iteration is a fixed-point iteration technique used to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). A contraction operator is constructed and applied repeatedly until the iterates converge to the optimal solution. Because value iteration is a first-order method, it may need a large number of iterations to converge. In this work, we propose a novel second-order value iteration procedure based on the Newton-Raphson method. We first construct a modified contraction operator and then apply the Newton-Raphson method to it to arrive at our algorithm. We prove that the algorithm converges globally to the optimal solution and that its rate of convergence is second order. Through experiments, we demonstrate the effectiveness of the proposed approach.
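
For orientation, the first-order baseline described in the abstract can be sketched as follows. This is a minimal illustration of standard value iteration only, not the second-order algorithm proposed in the paper, whose modified operator is not specified here. The toy MDP (`P_TOY`, `R_TOY`), the function names, and the stopping tolerance are all assumptions made for the example.

```python
# Standard (first-order) value iteration on a small tabular MDP.
# P[a][s][t] is the probability of moving from state s to state t under
# action a; R[s][a] is the expected one-step reward. Both are hypothetical.

P_TOY = [
    [[0.8, 0.2], [0.1, 0.9]],  # transitions under action 0
    [[0.5, 0.5], [0.6, 0.4]],  # transitions under action 1
]
R_TOY = [
    [1.0, 0.0],  # rewards in state 0 for actions 0 and 1
    [0.0, 2.0],  # rewards in state 1 for actions 0 and 1
]


def bellman_operator(V, P, R, gamma):
    """Apply the Bellman optimality operator to V once."""
    n_states = len(V)
    n_actions = len(P)
    return [
        max(
            R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Iterate the operator from V = 0 until successive iterates differ by < tol.

    Returns the final value estimate and the number of sweeps performed.
    """
    V = [0.0] * len(R)
    for k in range(1, max_iter + 1):
        V_new = bellman_operator(V, P, R, gamma)
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            return V, k
    return V, max_iter


def greedy_policy(V, P, R, gamma):
    """Pick, in each state, an action that maximizes the one-step lookahead."""
    n_states = len(V)
    n_actions = len(P)
    return [
        max(
            range(n_actions),
            key=lambda a, s=s: R[s][a]
            + gamma * sum(P[a][s][t] * V[t] for t in range(n_states)),
        )
        for s in range(n_states)
    ]
```

Calling `value_iteration(P_TOY, R_TOY, gamma=0.9)` returns an approximate fixed point together with the sweep count. Because the operator is a contraction with modulus `gamma` in the max norm, the error shrinks by at most a factor of `gamma` per sweep, which is why the sweep count grows as `gamma` approaches 1 and why a faster-converging scheme is of interest.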

research
03/09/2019

Successive Over Relaxation Q-Learning

In a discounted reward Markov Decision Process (MDP) the objective is to...
research
04/21/2023

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

We consider the problem of control in the setting of reinforcement learn...
research
06/01/2011

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) have recently be...
research
01/28/2022

Planning and Learning with Adaptive Lookahead

The classical Policy Iteration (PI) algorithm alternates between greedy ...
research
04/24/2019

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

Recursive stochastic algorithms have gained significant attention in the...
research
01/22/2013

Properties of the Least Squares Temporal Difference learning algorithm

This paper presents four different ways of looking at the well-known Lea...
research
06/05/2019

A neural network based policy iteration algorithm with global H^2-superlinear convergence for stochastic games on domains

In this work, we propose a class of numerical schemes for solving semili...
