How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

04/29/2020
by Pierluca D'Oro, et al.

Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the gradient of the critic with respect to input actions. However, the critic is typically trained only to predict expected returns accurately, not their gradients, and return predictions on their own are useless for policy optimization. In this paper, we propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients, which explicitly learns the action-value gradient. MAGE backpropagates through the learned dynamics to compute gradient targets in temporal difference learning, leading to a critic tailored for policy improvement. On a set of MuJoCo continuous-control tasks, we demonstrate the efficiency of the algorithm with respect to model-free and model-based state-of-the-art baselines.
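
The two gradients mentioned in the abstract can be made concrete with a toy scalar example. The sketch below is not the paper's implementation: the linear dynamics model, quadratic reward, quadratic critic, linear policy, and all constants (A, B, K, W, GAMMA) are hypothetical stand-ins chosen so that every derivative can be written by hand. It shows (1) the actor gradient as the actor's Jacobian chained with the critic's action gradient, and (2) the action gradient of a model-based temporal difference error, obtained by differentiating through the dynamics model. How MAGE turns that second quantity into a training loss is not specified in the abstract and is not reproduced here.

```python
# Toy scalar sketch. All models and constants are hypothetical stand-ins,
# not the networks or hyperparameters used in the paper.
GAMMA = 0.99            # discount factor (assumed)
A, B = 0.9, 0.5         # dynamics coefficients (assumed)
K = -0.3                # policy gain (assumed)
W = (-1.0, 0.2, -0.5)   # critic weights (assumed)

def dynamics(s, a):
    """Stand-in for a learned dynamics model: s' = A*s + B*a."""
    return A * s + B * a

def reward(s, a):
    """Stand-in reward: quadratic cost on state and action."""
    return -(s * s + a * a)

def policy(s, k=K):
    """Deterministic linear policy: a = k*s."""
    return k * s

def critic(s, a):
    """Quadratic critic: Q(s, a) = W0*s^2 + W1*s*a + W2*a^2."""
    return W[0] * s * s + W[1] * s * a + W[2] * a * a

def critic_grads(s, a):
    """Analytic partial derivatives (dQ/ds, dQ/da) of the critic."""
    return 2 * W[0] * s + W[1] * a, W[1] * s + 2 * W[2] * a

def actor_gradient(s):
    """dQ(s, policy(s))/dk: actor Jacobian (d a/d k = s) times dQ/da."""
    _, dq_da = critic_grads(s, policy(s))
    return s * dq_da

def td_error(s, a):
    """Model-based TD error: r(s, a) + GAMMA * Q(s', policy(s')) - Q(s, a)."""
    s_next = dynamics(s, a)
    return reward(s, a) + GAMMA * critic(s_next, policy(s_next)) - critic(s, a)

def td_error_action_grad(s, a):
    """d(td_error)/da, differentiating through the dynamics model."""
    s_next = dynamics(s, a)
    dq_ds_next, dq_da_next = critic_grads(s_next, policy(s_next))
    _, dq_da = critic_grads(s, a)
    dr_da = -2.0 * a
    # Chain rule: a -> s' = dynamics(s, a) -> a' = policy(s') -> Q(s', a')
    dnext_da = (dq_ds_next + dq_da_next * K) * B
    return dr_da + GAMMA * dnext_da - dq_da

def finite_difference(f, x, eps=1e-6):
    """Central finite difference, used only to check the analytic gradients."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

if __name__ == "__main__":
    s, a = 1.0, 0.5
    print(td_error_action_grad(s, a),
          finite_difference(lambda x: td_error(s, x), a))
    print(actor_gradient(s),
          finite_difference(lambda k: critic(s, policy(s, k)), K))
```

In this toy setting a critic can have a small TD error at a point while the action gradient of that error is still far from zero, which is the gap between predicting returns and predicting their gradients that the abstract points to. With neural-network models, the hand-written derivatives would normally be replaced by automatic differentiation.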


