Revisiting stochastic off-policy action-value gradients

03/06/2017
by Yemi Okesanjo, et al.

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient to derive an optimal policy. An optimal policy can also be derived by instead approximating the action-value gradient. Action-value gradients are desirable because policy improvement then proceeds along the direction of steepest ascent. This has been studied extensively in the context of natural-gradient actor-critic algorithms and, more recently, deterministic policy gradients. In this paper we briefly discuss the off-policy stochastic counterpart to deterministic action-value gradients, as well as an incremental approach for following the policy gradient in lieu of the natural gradient.
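
For reference, a minimal sketch of the two gradient forms the abstract contrasts, written in standard (assumed) notation with behaviour policy \beta, target policy \pi_\theta, deterministic policy \mu_\theta, and action-value function Q; the precise estimators analysed in the paper may differ. The off-policy stochastic policy gradient (Degris et al., 2012) is approximated as

\[
\nabla_\theta J_\beta(\theta) \approx \mathbb{E}_{s \sim d^{\beta},\, a \sim \beta}\!\left[\frac{\pi_\theta(a \mid s)}{\beta(a \mid s)}\, \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a)\right],
\]

whereas the deterministic policy gradient (Silver et al., 2014) follows the action-value gradient directly:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\mu}}\!\left[\nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a = \mu_\theta(s)}\right].
\]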


Related research

04/29/2020 · How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization
Deterministic-policy actor-critic algorithms for continuous control impr...

10/19/2021 · Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
Learning optimal behavior from existing data is one of the most importan...

02/20/2021 · Causal Policy Gradients
Policy gradient methods can solve complex tasks but often fail when the ...

07/25/2018 · Backprop-Q: Generalized Backpropagation for Stochastic Computation Graphs
In real-world scenarios, it is appealing to learn a model carrying out s...

06/15/2017 · Expected Policy Gradients
We propose expected policy gradients (EPG), which unify stochastic polic...

10/24/2022 · On All-Action Policy Gradients
In this paper, we analyze the variance of stochastic policy gradient wit...

09/15/2018 · Sampled Policy Gradient for Learning to Play the Game Agar.io
In this paper, a new offline actor-critic learning algorithm is introduc...
