Value-Distributional Model-Based Reinforcement Learning

08/12/2023
by   Carlos E. Luis, et al.
0

Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed-point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.

READ FULL TEXT

page 15

page 26

research
02/24/2023

Model-Based Uncertainty in Value Functions

We consider the problem of quantifying uncertainty over expected cumulat...
research
11/18/2018

Policy Optimization with Model-based Explorations

Model-free reinforcement learning methods such as the Proximal Policy Op...
research
04/05/2013

Model-based Bayesian Reinforcement Learning for Dialogue Management

Reinforcement learning methods are increasingly used to optimise dialogu...
research
02/08/2020

Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions

Bayesian reinforcement learning (BRL) offers a decision-theoretic soluti...
research
02/11/2023

Distributional GFlowNets with Quantile Flows

Generative Flow Networks (GFlowNets) are a new family of probabilistic s...
research
01/29/2022

Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

We consider a sequential decision making problem where the agent faces t...
research
07/06/2021

Bayesian Nonparametric Modelling for Model-Free Reinforcement Learning in LTE-LAA and Wi-Fi Coexistence

With the arrival of next generation wireless communication, a growing nu...

Please sign up or login with your details

Forgot password? Click here to reset