Model-Augmented Actor-Critic: Backpropagating through Paths

05/16/2020
by Ignasi Clavera, et al.
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning. In this paper, we show how to make more effective use of the model by exploiting its differentiability. We construct a policy optimization algorithm that uses the pathwise derivative of the learned model and policy across future timesteps. A terminal value function prevents the instabilities that arise when learning across many timesteps, so the policy is learned in an actor-critic fashion. Furthermore, we derive the monotonic improvement of our objective in terms of the gradient error in the model and value function. We show that our approach (i) is consistently more sample efficient than existing state-of-the-art model-based algorithms, (ii) matches the asymptotic performance of model-free algorithms, and (iii) scales to long horizons, a regime where past model-based approaches have typically struggled.
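To make the idea of a pathwise derivative with a terminal value function concrete, here is a minimal sketch on a toy scalar problem. It is not the paper's implementation: the linear model `s' = a*s + b*u`, the linear policy `u = theta*s`, the quadratic reward, the quadratic terminal value `-p*s^2`, and every name and constant below are assumptions chosen for illustration. The derivative of the H-step objective with respect to the policy parameter is propagated by hand through both the model and the policy at every step.

```python
def pathwise_objective_and_grad(theta, s0, horizon,
                                a=0.9, b=0.5, p=1.0, gamma=0.99):
    """Return (J, dJ/dtheta) for an H-step rollout plus a terminal value.

    Toy assumptions (not from the paper):
      model:          s_next = a * s + b * u
      policy:         u = theta * s
      reward:         r = -(s^2 + u^2)
      terminal value: V(s) = -p * s^2
    """
    s, ds = s0, 0.0        # state and its derivative w.r.t. theta
    J, dJ = 0.0, 0.0       # objective and its derivative w.r.t. theta
    discount = 1.0
    for _ in range(horizon):
        # Policy: the action depends on theta directly and through the state.
        u = theta * s
        du = s + theta * ds
        # Reward and its derivative via the chain rule.
        J += discount * -(s * s + u * u)
        dJ += discount * -(2.0 * s * ds + 2.0 * u * du)
        # Model step: the derivative flows through the dynamics.
        s, ds = a * s + b * u, a * ds + b * du
        discount *= gamma
    # Terminal value function closes the truncated rollout.
    J += discount * -(p * s * s)
    dJ += discount * -(2.0 * p * s * ds)
    return J, dJ


def finite_difference_grad(theta, s0, horizon, eps=1e-6):
    """Central finite-difference estimate of dJ/dtheta, used as a check."""
    j_plus, _ = pathwise_objective_and_grad(theta + eps, s0, horizon)
    j_minus, _ = pathwise_objective_and_grad(theta - eps, s0, horizon)
    return (j_plus - j_minus) / (2.0 * eps)
```

Calling `pathwise_objective_and_grad(-0.3, 1.0, 5)` returns the objective and its analytic gradient, which can be compared against `finite_difference_grad(-0.3, 1.0, 5)`. In a learned setting the model, policy, and value function would be neural networks, and an automatic differentiation library would replace the hand-written chain rule.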

research
04/29/2020

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Deterministic-policy actor-critic algorithms for continuous control impr...
research
08/23/2020

Learning Off-Policy with Online Planning

We propose Learning Off-Policy with Online Planning (LOOP), combining th...
research
10/10/2020

Trust the Model When It Is Confident: Masked Model-based Actor-Critic

It is a popular belief that model-based Reinforcement Learning (RL) is m...
research
03/28/2022

Revisiting Model-based Value Expansion

Model-based value expansion methods promise to improve the quality of va...
research
08/28/2020

On the model-based stochastic value gradient for continuous reinforcement learning

Model-based reinforcement learning approaches add explicit domain knowle...
research
04/05/2022

Model Based Meta Learning of Critics for Policy Gradients

Being able to seamlessly generalize across different tasks is fundamenta...
research
10/08/2019

Deep Value Model Predictive Control

In this paper, we introduce an actor-critic algorithm called Deep Value ...
