Model-Advantage Optimization for Model-Based Reinforcement Learning

06/26/2021
by Nirbhay Modhe, et al.

Model-based reinforcement learning (MBRL) algorithms have traditionally been designed to learn accurate dynamics of the environment. This creates a mismatch between the objective of model learning and the overall goal of finding an optimal policy. Value-aware model learning, an alternative to maximum likelihood estimation (MLE), proposes to inform model learning through the value function of the learnt policy. While this paradigm is theoretically sound, it does not scale beyond toy settings. In this work, we propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models. Further, we propose a general-purpose algorithm that modifies the standard MBRL pipeline to enable learning with value-aware objectives. Our proposed objective, in conjunction with this algorithm, is the first successful instantiation of value-aware MBRL on challenging continuous control environments, outperforming previous value-aware objectives while remaining competitive with MLE-based MBRL approaches.
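
The abstract does not spell out the proposed objective, so the following is only a rough illustration of what "an upper bound on the absolute performance difference of a policy across two models" can look like in the simplest tabular case. It uses a standard simulation-lemma-style identity, not necessarily the paper's bound. The MDP, the policy, and every name here (`P`, `P_HAT`, `PI`, `policy_value`, `occupancy`) are hypothetical and chosen only for this sketch; both models are assumed to share the same reward.

```python
# Toy 3-state, 2-action MDP. All numbers are made up for illustration.
GAMMA = 0.9
N_S, N_A = 3, 2

# P[s][a] is the next-state distribution of the "true" model,
# P_HAT[s][a] that of a hypothetical learned model.
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]],
    [[0.0, 0.9, 0.1], [0.5, 0.0, 0.5]],
    [[0.3, 0.3, 0.4], [0.0, 0.2, 0.8]],
]
P_HAT = [
    [[0.7, 0.3, 0.0], [0.2, 0.5, 0.3]],
    [[0.1, 0.8, 0.1], [0.4, 0.1, 0.5]],
    [[0.3, 0.2, 0.5], [0.1, 0.2, 0.7]],
]
R = [[0.0, 1.0], [0.5, 0.0], [1.0, 2.0]]    # shared reward r(s, a)
PI = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]   # policy pi(a | s)
MU0 = [1.0, 0.0, 0.0]                       # start-state distribution


def policy_value(model, iters=2000):
    """State values of PI under `model`, by iterating the Bellman equation."""
    v = [0.0] * N_S
    for _ in range(iters):
        v = [
            sum(
                PI[s][a]
                * (R[s][a] + GAMMA * sum(model[s][a][t] * v[t] for t in range(N_S)))
                for a in range(N_A)
            )
            for s in range(N_S)
        ]
    return v


def occupancy(model, iters=2000):
    """Normalized discounted state-action occupancy of PI under `model`."""
    mu = list(MU0)
    d_s = [0.0] * N_S
    weight = 1.0 - GAMMA
    for _ in range(iters):
        for s in range(N_S):
            d_s[s] += weight * mu[s]
        # Push the state distribution one step forward under PI and `model`.
        mu = [
            sum(mu[s] * PI[s][a] * model[s][a][t]
                for s in range(N_S) for a in range(N_A))
            for t in range(N_S)
        ]
        weight *= GAMMA
    return [[d_s[s] * PI[s][a] for a in range(N_A)] for s in range(N_S)]


def expected_return(model):
    v = policy_value(model)
    return sum(MU0[s] * v[s] for s in range(N_S))


v_hat = policy_value(P_HAT)   # value function in the learned model
d_true = occupancy(P)         # visitation under the true model


def value_gap(s, a):
    """Difference in expected next-state value between the two models."""
    return sum((P[s][a][t] - P_HAT[s][a][t]) * v_hat[t] for t in range(N_S))


scale = GAMMA / (1.0 - GAMMA)
gap_direct = expected_return(P) - expected_return(P_HAT)
gap_identity = scale * sum(
    d_true[s][a] * value_gap(s, a) for s in range(N_S) for a in range(N_A)
)
# Moving the absolute value inside the expectation gives an upper bound.
bound = scale * sum(
    d_true[s][a] * abs(value_gap(s, a)) for s in range(N_S) for a in range(N_A)
)

print("performance difference:", gap_direct)
print("via identity:          ", gap_identity)
print("upper bound:           ", bound)
```

The bound depends on the learned model only through how it changes the expected next-state value, which is the sense in which such an objective is value-aware: model errors that leave the value unchanged incur no penalty, unlike under a likelihood loss.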

research
04/04/2022

Value Gradient weighted Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) is a sample efficient techniqu...
research
03/01/2023

The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

We propose a novel approach to addressing two fundamental challenges in ...
research
06/01/2018

Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning

Learning a generative model is a key component of model-based reinforcem...
research
02/11/2020

Objective Mismatch in Model-based Reinforcement Learning

Model-based reinforcement learning (MBRL) has been shown to be a powerfu...
research
05/22/2023

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

Standard model-based reinforcement learning (MBRL) approaches fit a tran...
research
06/30/2023

λ-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces

The idea of decision-aware model learning, that models should be accurat...
research
06/06/2021

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

The shortcomings of maximum likelihood estimation in the context of mode...
