Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function

02/02/2023
by   Ruijie Zheng, et al.

Probabilistic dynamics model ensembles are widely used in existing model-based reinforcement learning methods because they outperform a single dynamics model in both asymptotic performance and sample efficiency. In this paper, we provide both practical and theoretical insights into the empirical success of the probabilistic dynamics model ensemble through the lens of Lipschitz continuity. We find that, for a value function, the stronger the Lipschitz condition is, the smaller the gap between the Bellman operators induced by the true dynamics and by the learned dynamics, which enables the converged value function to be closer to the optimal value function. Hence, we hypothesize that the key function of the probabilistic dynamics model ensemble is to regularize the Lipschitz condition of the value function using generated samples. To test this hypothesis, we devise two practical robust training mechanisms that directly regularize the Lipschitz condition of the value function: computing adversarial noise and regularizing the value network's spectral norm. Empirical results show that, combined with our mechanisms, model-based RL algorithms with a single dynamics model outperform those with an ensemble of probabilistic dynamics models. These findings not only support the theoretical insight but also provide a practical solution for developing computationally efficient model-based RL algorithms.
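To make the spectral-norm idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It relies on one standard fact: for a bias-free MLP with ReLU activations, the product of the layers' spectral norms is an upper bound on the network's Lipschitz constant. The function names (`spectral_norm`, `lipschitz_upper_bound`, `spectral_penalty`, `value_fn`), the squared hinge form of the penalty, and the `target` threshold are all assumptions made for illustration.

```python
import numpy as np


def spectral_norm(weight, n_iters=50, seed=0):
    """Estimate the largest singular value of `weight` by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(weight.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = weight @ v
        u /= np.linalg.norm(u)
        v = weight.T @ u
        v /= np.linalg.norm(v)
    return float(u @ weight @ v)


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms: an upper bound for a ReLU MLP."""
    bound = 1.0
    for w in weights:
        bound *= spectral_norm(w)
    return bound


def spectral_penalty(weights, target=1.0):
    """Hypothetical regularizer: penalize layers whose norm exceeds `target`."""
    return sum(max(spectral_norm(w) - target, 0.0) ** 2 for w in weights)


def value_fn(weights, state):
    """Toy bias-free ReLU value network with a scalar output."""
    h = state
    for w in weights[:-1]:
        h = np.maximum(w @ h, 0.0)
    return float((weights[-1] @ h)[0])


# Two-layer toy value network with hand-picked weights.
weights = [np.diag([2.0, 0.5]), np.array([[1.0, -1.0]])]
bound = lipschitz_upper_bound(weights)
s1, s2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
gap = abs(value_fn(weights, s1) - value_fn(weights, s2))
print(gap <= bound * np.linalg.norm(s1 - s2))
```

In a training loop, a term like `spectral_penalty` would be added to the value loss with a small coefficient. The bound is usually loose, so it is a conservative handle on the value function's Lipschitz constant, not an exact measurement.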


research
04/19/2018

Lipschitz Continuity in Model-based Reinforcement Learning

Model-based reinforcement-learning methods learn transition and reward m...
research
03/28/2022

Revisiting Model-based Value Expansion

Model-based value expansion methods promise to improve the quality of va...
research
06/01/2018

Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning

Learning a generative model is a key component of model-based reinforcem...
research
04/07/2021

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

In recent years, there are great interests as well as challenges in appl...
research
07/06/2020

Fast Adaptation via Policy-Dynamics Value Functions

Standard RL algorithms assume fixed environment dynamics and require a s...
research
11/04/2022

The Benefits of Model-Based Generalization in Reinforcement Learning

Model-Based Reinforcement Learning (RL) is widely believed to have the p...
research
05/11/2021

Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective

Most of the recent deep reinforcement learning advances take an RL-centr...
