Value Gradient weighted Model-Based Reinforcement Learning

04/04/2022
by   Claas Voelcker, et al.

Model-based reinforcement learning (MBRL) is a sample-efficient technique for obtaining control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often fitted solely to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function employed in practice, future state prediction. Naive intuition would suggest that value-aware model learning would fix this problem and, indeed, several solutions to this objective mismatch problem have been proposed based on theoretical analysis. However, they tend to be inferior in practice to commonly used maximum likelihood estimation (MLE) based approaches. In this paper we propose Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions. We analyze both MLE and value-aware approaches and demonstrate how they fail to account for exploration and for the behavior of function approximation when learning value-aware models. We also highlight the additional goals that must be met to stabilize optimization in the deep learning setting. We verify our analysis by showing that our loss function achieves high returns on the MuJoCo benchmark suite while being more robust than maximum likelihood based approaches.
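The contrast between a state-prediction loss and a value-gradient weighted loss can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical diagonal weighting in which each dimension's prediction error is scaled by the gradient of a toy value function at the true next state. The function names, the toy value function V(s) = s[0]^2, and the numbers are all illustrative assumptions.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mse_loss(pred_next: Vector, true_next: Vector) -> float:
    """Plain squared-error loss: every state dimension counts equally."""
    return sum((p - t) ** 2 for p, t in zip(pred_next, true_next))


def value_gradient_weighted_loss(
    pred_next: Vector,
    true_next: Vector,
    value_grad: Callable[[Vector], Vector],
) -> float:
    """Per-dimension error scaled by the value gradient at the true next
    state, then squared and summed (assumed diagonal weighting)."""
    grad = value_grad(true_next)
    return sum((g * (p - t)) ** 2 for g, p, t in zip(grad, pred_next, true_next))


def toy_value_grad(state: Vector) -> Vector:
    """Gradient of the toy value V(s) = s[0] ** 2; s[1] is a distractor."""
    return [2.0 * state[0], 0.0]


true_next = [1.0, 5.0]
pred_bad_distractor = [1.0, 3.0]  # error only in the distracting dimension
pred_bad_relevant = [0.5, 5.0]    # error only in the value-relevant dimension

print(mse_loss(pred_bad_distractor, true_next))
print(value_gradient_weighted_loss(pred_bad_distractor, true_next, toy_value_grad))
print(value_gradient_weighted_loss(pred_bad_relevant, true_next, toy_value_grad))
```

In this toy setting the squared-error loss penalizes a mistake in the distracting dimension, while the weighted loss ignores it and penalizes only the error that would change the value estimate.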

research
06/26/2021

Model-Advantage Optimization for Model-Based Reinforcement Learning

Model-based Reinforcement Learning (MBRL) algorithms have been tradition...
research
05/22/2023

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

Standard model-based reinforcement learning (MBRL) approaches fit a tran...
research
03/02/2021

Minimax Model Learning

We present a novel off-policy loss function for learning a transition mo...
research
02/11/2020

Objective Mismatch in Model-based Reinforcement Learning

Model-based reinforcement learning (MBRL) has been shown to be a powerfu...
research
06/06/2021

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

The shortcomings of maximum likelihood estimation in the context of mode...
research
06/01/2018

Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning

Learning a generative model is a key component of model-based reinforcem...
research
02/28/2020

Policy-Aware Model Learning for Policy Gradient Methods

This paper considers the problem of learning a model in model-based rein...
