Planning with Uncertainty: Deep Exploration in Model-Based Reinforcement Learning

10/21/2022
by Yaniv Oren, et al.
Delft University of Technology

Deep model-based Reinforcement Learning (RL) has shown super-human performance in many challenging domains. However, low sample efficiency and limited exploration remain leading obstacles in the field. In this paper, we demonstrate deep exploration in model-based RL by incorporating epistemic uncertainty into planning trees, circumventing the standard approach of propagating uncertainty through value learning. We evaluate this approach with the state-of-the-art model-based RL algorithm MuZero, and extend its training process to stabilize learning from explicitly exploratory trajectories. In our experiments, planning with uncertainty demonstrates effective deep exploration with standard uncertainty estimation mechanisms, and with it significant gains in sample efficiency.
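
To make the idea of carrying epistemic uncertainty through a planning tree concrete, the following is a minimal illustrative sketch and not the paper's algorithm or MuZero's search. It assumes each tree node already holds a mean and an epistemic variance for its reward and value (how these are estimated is left open), that the uncertainties are independent so variances add along a path, and that actions are chosen by an optimistic score, mean plus `beta` standard deviations. The names `Node`, `q_estimate`, `backup`, `select_action`, and `beta` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Reward received on the transition into this node (mean and epistemic variance).
    reward_mean: float = 0.0
    reward_var: float = 0.0
    # Value estimate used when this node is a leaf (mean and epistemic variance).
    value_mean: float = 0.0
    value_var: float = 0.0
    # Maps an action label to the child node it leads to.
    children: dict = field(default_factory=dict)


def q_estimate(child, gamma, beta):
    """Mean and variance of the return for the action leading to `child`."""
    mean, var = backup(child, gamma, beta)
    # Assumption: independent uncertainties, so variances add (discounted by gamma^2).
    return child.reward_mean + gamma * mean, child.reward_var + gamma ** 2 * var


def backup(node, gamma, beta):
    """Return (mean, variance) of the node's value under optimistic action choice."""
    if not node.children:
        return node.value_mean, node.value_var
    estimates = (q_estimate(c, gamma, beta) for c in node.children.values())
    # Pick the child with the highest optimistic score: mean + beta * std.
    return max(estimates, key=lambda mv: mv[0] + beta * math.sqrt(mv[1]))


def select_action(root, gamma=0.99, beta=1.0):
    """Choose the root action with the highest optimistic score."""
    def score(action):
        mean, var = q_estimate(root.children[action], gamma, beta)
        return mean + beta * math.sqrt(var)
    return max(root.children, key=score)


# Toy tree: one action with a certain reward of 1, one with an uncertain reward.
root = Node(children={
    "known": Node(reward_mean=1.0, reward_var=0.0),
    "unknown": Node(reward_mean=0.0, reward_var=4.0),
})
greedy_choice = select_action(root, beta=0.0)
optimistic_choice = select_action(root, beta=1.0)
```

In this toy tree, setting `beta=0.0` reduces the search to greedy planning on mean estimates, while a positive `beta` lets uncertainty deep in the tree raise the score of the actions that lead toward it.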

