The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

03/01/2023
by Anirudh Vemula, et al.

We propose a novel approach to two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a unified objective, Performance Difference via Advantage in Model, which captures the performance difference between the learned policy and an expert policy under the true dynamics. This objective shows that optimizing the expected policy advantage in the learned model under an exploration distribution is sufficient for policy computation, which is significantly more computationally efficient than traditional planning methods. Additionally, the unified objective fits the model with a value moment matching term, which aligns model fitting with how the model is used during policy computation. We present two no-regret algorithms to optimize the proposed objective and demonstrate their statistical and computational gains over existing MBRL methods on simulated benchmarks.
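The abstract names these ingredients without giving formulas. The tabular sketch below (pure standard-library Python) is a hedged illustration only. It numerically checks the classical performance difference lemma, which the objective's name alludes to, inside a learned model. The lemma says the return gap between two policies equals the expected advantage under the new policy's discounted state occupancy, divided by (1 - γ). The abstract's claim is that an exploration distribution can stand in for that occupancy during policy computation, so `expected_advantage` accepts any state distribution `nu`. The sketch also computes a value moment matching loss: the gap between next-state values predicted by the learned model and by the true dynamics. The random MDP, every function name, the uniform weights `nu_sa`, and the squared-error form of the loss are assumptions made for illustration. This is not the authors' objective or algorithm; the exact terms and constants are in the full paper.

```python
import random

# Assumed tabular conventions for this sketch (not from the paper):
#   P[s][a][s2] = transition probability, r[s][a] = reward, pi[s][a] = action probability.

def evaluate(P, r, pi, gamma, iters=2000):
    """Iterative policy evaluation of pi under dynamics P; returns V^pi."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [sum(pi[s][a] * (r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V)))
                 for a in range(len(pi[s])))
             for s in range(n)]
    return V

def advantage(P, r, pi, gamma):
    """A^pi(s, a) = Q^pi(s, a) - V^pi(s), computed in the dynamics P."""
    V = evaluate(P, r, pi, gamma)
    return [[r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V)) - V[s]
             for a in range(len(P[s]))]
            for s in range(len(P))]

def occupancy(P, pi, mu, gamma, iters=2000):
    """Normalized discounted state occupancy: fixed point of d = (1-gamma)*mu + gamma*P_pi^T d."""
    n = len(P)
    d = list(mu)
    for _ in range(iters):
        nxt = [(1 - gamma) * m for m in mu]
        for s in range(n):
            for a, pa in enumerate(pi[s]):
                for s2, p in enumerate(P[s][a]):
                    nxt[s2] += gamma * d[s] * pa * p
        d = nxt
    return d

def expected_advantage(A, nu, pi_new):
    """E_{s~nu} E_{a~pi_new(.|s)} [A(s, a)] for an arbitrary state distribution nu."""
    return sum(nu[s] * sum(pa * A[s][a] for a, pa in enumerate(pi_new[s]))
               for s in range(len(A)))

def value_moment_matching_loss(P_hat, P_true, V, nu_sa):
    """Weighted sum of (E_{s'~P_hat}[V(s')] - E_{s'~P_true}[V(s')])^2; squared error is assumed."""
    loss = 0.0
    for (s, a), w in nu_sa.items():
        gap = sum((ph - pt) * v for ph, pt, v in zip(P_hat[s][a], P_true[s][a], V))
        loss += w * gap ** 2
    return loss

def random_dist(rng, k):
    """Random probability vector of length k with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

# Usage on a synthetic 3-state, 2-action MDP (all numbers are random, not results).
rng = random.Random(0)
n_s, n_a, gamma = 3, 2, 0.9
P_true = [[random_dist(rng, n_s) for _ in range(n_a)] for _ in range(n_s)]
P_hat = [[random_dist(rng, n_s) for _ in range(n_a)] for _ in range(n_s)]  # stand-in "learned" model
r = [[rng.random() for _ in range(n_a)] for _ in range(n_s)]
mu = random_dist(rng, n_s)
pi_old = [random_dist(rng, n_a) for _ in range(n_s)]
pi_new = [random_dist(rng, n_a) for _ in range(n_s)]

def J(pi):
    """Expected discounted return of pi in the learned model from start distribution mu."""
    return sum(m * v for m, v in zip(mu, evaluate(P_hat, r, pi, gamma)))

# Performance difference lemma in P_hat:
#   J(pi_new) - J(pi_old) = E_{s~d^{pi_new}, a~pi_new}[A^{pi_old}(s, a)] / (1 - gamma)
A_old = advantage(P_hat, r, pi_old, gamma)
lhs = J(pi_new) - J(pi_old)
rhs = expected_advantage(A_old, occupancy(P_hat, pi_new, mu, gamma), pi_new) / (1 - gamma)

# Value moment matching under uniform (hypothetical) exploration weights over (s, a).
nu_sa = {(s, a): 1.0 / (n_s * n_a) for s in range(n_s) for a in range(n_a)}
V_old = evaluate(P_hat, r, pi_old, gamma)
mm_loss = value_moment_matching_loss(P_hat, P_true, V_old, nu_sa)
```

`lhs` and `rhs` agree to floating-point precision, and `mm_loss` is zero only when the model predicts the same expected next-state value as the true dynamics at every weighted (s, a). The loss penalizes errors only in the directions that matter for V, rather than requiring the full next-state distribution to match.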

Related research
06/26/2021

Model-Advantage Optimization for Model-Based Reinforcement Learning

Model-based Reinforcement Learning (MBRL) algorithms have been tradition...
10/12/2022

A Unified Framework for Alternating Offline Model Training and Policy Learning

In offline model-based reinforcement learning (offline MBRL), we learn a...
03/24/2021

Discriminator Augmented Model-Based Reinforcement Learning

By planning through a learned dynamics model, model-based reinforcement ...
06/25/2018

Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

The most data-efficient algorithms for reinforcement learning in robotic...
06/01/2023

What model does MuZero learn?

Model-based reinforcement learning has drawn considerable interest in re...
05/29/2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration

In online reinforcement learning (online RL), balancing exploration and ...
02/11/2020

Objective Mismatch in Model-based Reinforcement Learning

Model-based reinforcement learning (MBRL) has been shown to be a powerfu...
