Minimax Model Learning

03/02/2021
by Cameron Voloshin, et al.

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
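The abstract does not state the loss itself. As a minimal sketch, assuming a tabular setting and the minimax form in which a candidate model P is scored by the worst case, over a weight function w(s, a) and a value function V(s), of the weighted gap between the model's expected next-state value and the value at the logged next state, the idea can be illustrated as follows. The function names, the finite classes `W` and `Vs`, and the toy data are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def mml_loss(P, w, V, data):
    """Empirical mean of w[s, a] * (E_{x ~ P(.|s, a)}[V(x)] - V[s'])."""
    s, a, s_next = data[:, 0], data[:, 1], data[:, 2]
    predicted = P[s, a] @ V   # expected V under the candidate model, shape (n,)
    observed = V[s_next]      # V at the logged next state, shape (n,)
    return float(np.mean(w[s, a] * (predicted - observed)))

def worst_case_loss(P, W, Vs, data):
    """Inner maximization over (assumed finite) classes of w and V."""
    return max(abs(mml_loss(P, w, V, data)) for w in W for V in Vs)

def select_model(models, W, Vs, data):
    """Outer minimization: pick the candidate with the smallest worst-case loss."""
    scores = [worst_case_loss(P, W, Vs, data) for P in models]
    return int(np.argmin(scores)), scores

# Toy problem (hypothetical): 2 states, 2 actions, deterministic dynamics
# s' = (s + a) % 2, with one logged transition per (s, a) pair.
n_states, n_actions = 2, 2
P_true = np.zeros((n_states, n_actions, n_states))
P_wrong = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    for a in range(n_actions):
        P_true[s, a, (s + a) % n_states] = 1.0
        P_wrong[s, a, s] = 1.0  # misspecified: the state never changes

data = np.array([[s, a, (s + a) % n_states]
                 for s in range(n_states) for a in range(n_actions)])

# Small illustrative function classes for w and V.
W = [np.ones((n_states, n_actions)),
     np.array([[1.0, 1.0], [0.0, 0.0]])]  # weight only transitions from state 0
Vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

best, scores = select_model([P_true, P_wrong], W, Vs, data)
print(best, scores)
```

In this toy setup the uniform weight alone cannot tell the two models apart, because the misspecified model's errors cancel on average; the second weight function exposes them. This is one way to see why the loss is taken as a worst case over a class of weights rather than under a single fixed weighting.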


research
06/04/2020

Meta-Model-Based Meta-Policy Optimization

Model-based reinforcement learning (MBRL) has been applied to meta-learn...
research
06/16/2020

Model Embedding Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) has shown its advantages in sa...
research
04/05/2023

Conformal Off-Policy Evaluation in Markov Decision Processes

Reinforcement Learning aims at identifying and evaluating efficient cont...
research
04/04/2022

Value Gradient weighted Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) is a sample efficient techniqu...
research
09/09/2019

Gradient-Aware Model-based Policy Search

Traditional model-based reinforcement learning approaches learn a model ...
research
06/04/2023

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

Reinforcement learning from human feedback (RLHF) has emerged as a relia...
research
02/19/2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

In this paper we study a model-based approach to calculating approximate...
