Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

06/20/2017
by   Philip S. Thomas, et al.

We show how an action-dependent baseline can be incorporated into the policy gradient theorem with function approximation, which Sutton et al. (2000) originally presented with action-independent baselines.
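The idea can be illustrated with a minimal sketch (not the paper's implementation; all names, rewards, and baseline values below are hypothetical): for a softmax policy over a single state, subtracting an arbitrary action-dependent baseline b(a) from the likelihood-ratio estimator introduces a bias, and adding back the analytic term grad of E_pi[b(a)] restores an unbiased gradient estimate, in the spirit of the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 3
theta = np.zeros(n_actions)  # logits of a softmax policy for a single state

def policy(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

# Hypothetical per-action values, standing in for Q(s, a).
q = np.array([1.0, 2.0, 0.5])

# Arbitrary action-dependent baseline b(a); unbiasedness does not depend
# on these values because the correction term below is added back.
b = np.array([0.9, 1.8, 0.4])

def grad_estimate(theta, n_samples=200_000):
    pi = policy(theta)
    actions = rng.choice(n_actions, size=n_samples, p=pi)
    # Score function for a softmax policy: grad log pi(a) = e_a - pi.
    score = np.eye(n_actions)[actions] - pi
    adv = (q - b)[actions]
    g = (score * adv[:, None]).mean(axis=0)
    # Correction term: grad_theta of sum_a pi(a) b(a) = pi * (b - E_pi[b]).
    correction = pi * (b - pi @ b)
    return g + correction

# Exact policy gradient of E_pi[q] for a softmax: pi * (q - E_pi[q]).
pi = policy(theta)
exact = pi * (q - pi @ q)
est = grad_estimate(theta)
```

Because b is close to q, the sampled advantage q(a) - b(a) has small magnitude, so the estimator's variance is low, while the correction term keeps it centered on the exact gradient.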


Related research

03/20/2018

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

Policy gradient methods have enjoyed great success in deep reinforcement...
02/27/2018

The Mirage of Action-Dependent Baselines in Reinforcement Learning

Policy gradient methods are a widely used class of model-free reinforcem...
06/25/2020

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

In recent years, Deep Reinforcement Learning (DRL) algorithms have achie...
01/28/2020

Parameter Sharing in Coagent Networks

In this paper, we aim to prove the theorem that generalizes the Coagent ...
04/21/2021

Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching

Image-text matching is an important multi-modal task with massive applic...
11/26/2018

A Policy Gradient Method with Variance Reduction for Uplift Modeling

Uplift modeling aims to directly model the incremental impact of a treat...
10/21/2019

All-Action Policy Gradient Methods: A Numerical Integration Approach

While often stated as an instance of the likelihood ratio trick [Rubinst...