Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

06/20/2017
by Philip S. Thomas et al.

We show how an action-dependent baseline can be used with the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).
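
As a brief refresher on the setting (a sketch of the standard result, not the paper's derivation): the policy gradient theorem allows a state-dependent baseline b(s) to be subtracted from the action value without biasing the gradient,

    \nabla_\theta J(\theta) = \sum_s d^\pi(s) \sum_a \nabla_\theta \pi(a \mid s; \theta) \, \big( Q^\pi(s, a) - b(s) \big),

because the baseline term vanishes in expectation:

    \sum_a \nabla_\theta \pi(a \mid s; \theta) \, b(s) = b(s) \, \nabla_\theta \sum_a \pi(a \mid s; \theta) = b(s) \, \nabla_\theta 1 = 0.

An action-dependent baseline b(s, a) does not cancel this way in general, which is what makes extending the theorem to such baselines nontrivial.

The following minimal numerical sketch (an illustration of that point, not code from the paper; the single-state softmax bandit and all names in it are my own) checks both facts in closed form: a constant baseline leaves the exact expected gradient unchanged, while an arbitrary action-dependent baseline shifts it.

    import numpy as np

    rng = np.random.default_rng(0)
    n_actions = 4
    theta = rng.normal(size=n_actions)   # softmax policy parameters
    q = rng.normal(size=n_actions)       # true action values Q(a)
    pi = np.exp(theta) / np.exp(theta).sum()

    def expected_gradient(baseline):
        # Closed-form E_a[ grad_theta log pi(a) * (Q(a) - baseline(a)) ].
        g = np.zeros(n_actions)
        for a in range(n_actions):
            grad_log = -pi.copy()
            grad_log[a] += 1.0           # gradient of log-softmax w.r.t. theta
            g += pi[a] * grad_log * (q[a] - baseline[a])
        return g

    no_baseline = expected_gradient(np.zeros(n_actions))
    constant_baseline = expected_gradient(np.full(n_actions, q.mean()))
    action_dependent = expected_gradient(rng.normal(size=n_actions))

    print(np.allclose(no_baseline, constant_baseline))   # True: b(s) adds no bias
    print(np.allclose(no_baseline, action_dependent))    # False: arbitrary b(s, a) does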

Related research

12/28/2022
On the Convergence of Discounted Policy Gradient Methods
Many popular policy gradient methods for reinforcement learning follow a...

02/27/2018
The Mirage of Action-Dependent Baselines in Reinforcement Learning
Policy gradient methods are a widely used class of model-free reinforcem...

06/25/2020
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
In recent years, Deep Reinforcement Learning (DRL) algorithms have achie...

01/28/2020
Parameter Sharing in Coagent Networks
In this paper, we aim to prove the theorem that generalizes the Coagent ...

04/21/2021
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
Image-text matching is an important multi-modal task with massive applic...

10/21/2019
All-Action Policy Gradient Methods: A Numerical Integration Approach
While often stated as an instance of the likelihood ratio trick [Rubinst...

07/19/2021
Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach
This paper presents a constrained policy gradient algorithm. We introduc...
