Classical Policy Gradient: Preserving Bellman's Principle of Optimality

06/06/2019
by Philip S. Thomas, et al.

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
