Critic Algorithms using Cooperative Networks

01/19/2022
by Debangshu Banerjee, et al.

An algorithm is proposed for policy evaluation in Markov Decision Processes which gives good empirical results with respect to convergence rates. The algorithm tracks the Projected Bellman Error and is implemented as a true gradient-based algorithm; in this respect it differs from the TD(λ) class of algorithms. Because it tracks the Projected Bellman Error rather than the plain Bellman Error, it also differs from the class of residual algorithms. Further, its convergence is empirically much faster than that of the GTD2 class of algorithms, which likewise aim at tracking the Projected Bellman Error. We implemented the proposed algorithm in the DQN and DDPG frameworks and found that it achieves comparable results in both of these experiments.
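For context, the "Projected Bellman Error" objective referenced above is the standard Mean Squared Projected Bellman Error (MSPBE). As a brief sketch, assuming linear value functions V_θ = Φθ (an assumption; the abstract does not state the function class):

MSPBE(θ) = ‖V_θ − Π T_π V_θ‖²_D

where T_π is the Bellman operator of the evaluated policy π, Π is the projection onto the span of the features, and D weights states by their visitation distribution. TD(λ) performs a semi-gradient update rather than descending a fixed objective, while residual-gradient methods minimize the unprojected error ‖V_θ − T_π V_θ‖²_D, which is why tracking the projected error distinguishes the proposed method from both families.

The sketch below shows the standard GTD2 update (Sutton et al., 2009), the baseline family the abstract compares against. It is included only to illustrate that comparison point; it is not the paper's cooperative-network algorithm, whose update rules are not given in the abstract, and the feature dimensions and step sizes in the usage example are illustrative.

import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    # Temporal-difference error for the current transition.
    td_error = reward + gamma * phi_next @ theta - phi @ theta
    # Secondary weights w track a least-squares estimate of E[delta * phi].
    w = w + beta * (td_error - phi @ w) * phi
    # Primary weights move along the gradient-correction direction of the MSPBE.
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    return theta, w

# Example usage with random 8-dimensional features (illustrative only).
d = 8
theta, w = np.zeros(d), np.zeros(d)
phi, phi_next = np.random.rand(d), np.random.rand(d)
theta, w = gtd2_step(theta, w, phi, phi_next, reward=1.0,
                     gamma=0.99, alpha=0.01, beta=0.05)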


Related research

12/27/2017  On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
09/05/2023  Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
08/14/2015  Emphatic TD Bellman Operator is a Contraction
10/04/2022  Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
11/03/2022  Geometry and convergence of natural policy gradient methods
09/22/2019  Faster saddle-point optimization for solving large-scale Markov decision processes
