Policy-Aware Model Learning for Policy Gradient Methods

02/28/2020
by   Romina Abachi, et al.
0

This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should incorporate the way the planner is going to use the model. This is in contrast to conventional model learning approaches, such as those based on maximum likelihood estimate, that learn a predictive model of the environment without explicitly considering the interaction of the model and the planner. We focus on policy gradient type of planning algorithms and derive new loss functions for model learning that incorporate how the planner uses the model. We call this approach Policy-Aware Model Learning (PAML). We theoretically analyze a generic model-based policy gradient algorithm and provide a convergence guarantee for the optimized policy. We also empirically evaluate PAML on some benchmark problems, showing promising results.

READ FULL TEXT
research
09/09/2019

Gradient-Aware Model-based Policy Search

Traditional model-based reinforcement learning approaches learn a model ...
research
06/01/2017

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previous...
research
12/03/2018

Mitigating Planner Overfitting in Model-Based Reinforcement Learning

An agent with an inaccurate model of its environment faces a difficult c...
research
02/07/2019

Multimodal Conditional Learning with Fast Thinking Policy-like Model and Slow Thinking Planner-like Model

This paper studies the supervised learning of the conditional distributi...
research
10/23/2018

Learning Classical Planning Strategies with Policy Gradient

A common paradigm in classical planning is heuristic forward search. For...
research
04/04/2022

Value Gradient weighted Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) is a sample efficient techniqu...
research
02/05/2019

Total stochastic gradient algorithms and applications in reinforcement learning

Backpropagation and the chain rule of derivatives have been prominent; h...

Please sign up or login with your details

Forgot password? Click here to reset