Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

06/20/2017 ∙ by Philip S. Thomas, et al.

We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which Sutton et al. (2000) originally presented with action-independent baselines.
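One generic way to see why an action-dependent baseline needs extra care is the add-and-subtract identity Σ_a ∇π(s,a) Q(s,a) = Σ_a ∇π(s,a) (Q(s,a) − b(s,a)) + Σ_a ∇π(s,a) b(s,a). For an action-independent b(s), the last term is b(s) ∇ Σ_a π(s,a) = 0, so it can be dropped. For an action-dependent b(s,a), it is generally nonzero and must be kept. The sketch below checks this numerically for a single state with a tabular softmax policy. All parameters, action values, and baselines are made-up illustrative numbers, and the helper names (`weighted_grad`, `close`) are hypothetical. This illustrates the identity only; it is not a reproduction of the paper's theorem or its function-approximation setting.

```python
import math


def softmax(logits):
    """Softmax policy pi(a) over the actions of a single state."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_pi(logits):
    """Jacobian of a softmax policy: d pi(a) / d theta_k = pi(a) * (1[a == k] - pi(k))."""
    pi = softmax(logits)
    n = len(pi)
    return [[pi[a] * ((1.0 if a == k else 0.0) - pi[k]) for k in range(n)]
            for a in range(n)]


def weighted_grad(logits, values):
    """Hypothetical helper: exact sum over actions of grad pi(a) * values[a]."""
    jac = grad_pi(logits)
    n = len(logits)
    return [sum(jac[a][k] * values[a] for a in range(n)) for k in range(n)]


def close(u, v, tol=1e-9):
    """True if two gradient vectors agree elementwise within tol."""
    return all(abs(x - y) < tol for x, y in zip(u, v))


logits = [0.5, -1.0, 2.0]      # hypothetical policy parameters for one state
q = [1.0, 3.0, -2.0]           # hypothetical action values Q(s, a)
b_state = [0.7, 0.7, 0.7]      # action-independent baseline b(s)
b_action = [0.9, 2.5, -1.0]    # hypothetical action-dependent baseline b(s, a)

# Exact policy gradient for this state, with no baseline.
true_grad = weighted_grad(logits, q)

# Action-independent baseline: subtracting b(s) leaves the gradient unchanged.
grad_state = weighted_grad(logits, [qa - ba for qa, ba in zip(q, b_state)])

# Action-dependent baseline with the correction term dropped: biased in general.
grad_naive = weighted_grad(logits, [qa - ba for qa, ba in zip(q, b_action)])

# Action-dependent baseline plus the correction term sum_a grad pi(a) b(s, a).
correction = weighted_grad(logits, b_action)
grad_corrected = [g + c for g, c in zip(grad_naive, correction)]

print("b(s) matches exact gradient:      ", close(true_grad, grad_state))
print("b(s,a) without correction matches:", close(true_grad, grad_naive))
print("b(s,a) with correction matches:   ", close(true_grad, grad_corrected))
```

The sums over actions are computed exactly rather than from sampled actions, so the check is deterministic. In a sampled estimator, the baseline affects variance, and the correction term is what keeps an action-dependent baseline from biasing the expected gradient.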
