Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

06/06/2021
by Evgenii Nikishin, et al.

A growing number of papers have highlighted the shortcomings of maximum likelihood estimation in model-based reinforcement learning. When the model class is misspecified or has limited representational capacity, model parameters with high likelihood do not necessarily yield high performance of the agent on a downstream control task. To alleviate this problem, we propose an end-to-end approach to model learning that directly optimizes the expected returns using implicit differentiation. We treat a value function that satisfies the Bellman optimality equation induced by the model as an implicit function of the model parameters and show how to differentiate this function with respect to them. We provide theoretical and empirical evidence highlighting the benefits of our approach over likelihood-based methods in the model misspecification regime.
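To make the idea of differentiating a value function through its fixed-point condition concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical 2-state, 2-action model in which a single scalar `theta` controls one transition probability, and it swaps the hard max in the Bellman optimality operator for a log-sum-exp (a "soft" operator) so that the fixed point is smooth. All names (`transitions`, `soft_bellman`, `implicit_gradient`) and all numbers are illustrative assumptions. With the operator written as T(V, theta), the implicit function theorem gives dV*/dtheta = (I - dT/dV)^{-1} dT/dtheta at the fixed point V* = T(V*, theta).

```python
import math

GAMMA = 0.9
# Hypothetical rewards REWARD[s][a] for a 2-state, 2-action problem (assumption).
REWARD = [[0.0, 0.5], [1.0, 0.2]]


def transitions(theta):
    """Model P[s][a][s']; only P[0][0] depends on theta, via p = sigmoid(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return [[[1.0 - p, p], [0.9, 0.1]],
            [[0.2, 0.8], [0.7, 0.3]]]


def q_values(v, theta):
    """One-step lookahead Q[s][a] under the model."""
    P = transitions(theta)
    return [[REWARD[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(2))
             for a in range(2)] for s in range(2)]


def soft_bellman(v, theta):
    """Soft Bellman operator: log-sum-exp over actions, a smooth stand-in for max."""
    return [math.log(sum(math.exp(q) for q in row)) for row in q_values(v, theta)]


def solve_fixed_point(theta, iters=2000):
    """Iterate the operator; it is a GAMMA-contraction, so this converges."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = soft_bellman(v, theta)
    return v


def implicit_gradient(theta):
    """dV*/dtheta via the implicit function theorem, without unrolling the solver."""
    v = solve_fixed_point(theta)
    P = transitions(theta)
    qs = q_values(v, theta)
    # Softmax weights are the derivative of log-sum-exp with respect to each Q.
    pi = [[math.exp(q - max(row)) for q in row] for row in qs]
    pi = [[w / sum(row) for w in row] for row in pi]
    # Jacobian of the operator in V: J[s][t] = GAMMA * sum_a pi[s][a] * P[s][a][t].
    J = [[GAMMA * sum(pi[s][a] * P[s][a][t] for a in range(2))
          for t in range(2)] for s in range(2)]
    # Partial derivative of the operator in theta; only Q[0][0] depends on theta.
    p = P[0][0][1]
    dp = p * (1.0 - p)  # derivative of the sigmoid
    b = [pi[0][0] * GAMMA * dp * (v[1] - v[0]), 0.0]
    # Solve the 2x2 system (I - J) g = b with Cramer's rule.
    a11, a12 = 1.0 - J[0][0], -J[0][1]
    a21, a22 = -J[1][0], 1.0 - J[1][1]
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]
```

Calling `implicit_gradient(0.3)` returns the sensitivity of each state's value to the model parameter. In an end-to-end setup, this gradient would be chained with the derivative of the return with respect to the value function. That chaining step is omitted here.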

research
09/24/2018

Implicit Maximum Likelihood Estimation

Implicit probabilistic models are models defined naturally in terms of a...
research
05/28/2019

A Control-Model-Based Approach for Reinforcement Learning

We consider a new form of model-based reinforcement learning methods tha...
research
04/04/2022

Value Gradient weighted Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) is a sample efficient techniqu...
research
06/03/2021

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Integrating discrete probability distributions and combinatorial optimiz...
research
12/08/2015

Minimum Risk Training for Neural Machine Translation

We propose minimum risk training for end-to-end neural machine translati...
research
06/26/2021

Model-Advantage Optimization for Model-Based Reinforcement Learning

Model-based Reinforcement Learning (MBRL) algorithms have been tradition...
research
11/23/2021

Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation

Most set prediction models in deep learning use set-equivariant operatio...
