Efficient exploration with Double Uncertain Value Networks

11/29/2017
by Thomas M. Moerland et al.

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration: the first originates from limited data (parametric uncertainty), while the second originates from the inherent stochasticity of the returns (return uncertainty). We present methods to learn both distributions with deep neural networks, estimating parametric uncertainty with Bayesian dropout and propagating return uncertainty through the Bellman equation as a Gaussian distribution. We then show that both can be jointly estimated in a single network, which we call the Double Uncertain Value Network. The policy is derived directly from the learned distributions via Thompson sampling. Experimental results show that both types of uncertainty can substantially improve learning in domains with a strong exploration challenge.
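As a concrete illustration, here is a minimal sketch in PyTorch (not the authors' released code; all class and function names are illustrative) of how the two uncertainties could be combined at action-selection time: dropout is kept active so each forward pass samples one set of network weights (parametric uncertainty), a Gaussian output head with a mean and log-variance per action captures the return distribution (return uncertainty), and acting greedily on one joint sample implements Thompson sampling.

```python
# Hypothetical sketch, not the paper's official implementation.
import torch
import torch.nn as nn

class DoubleUncertainQNet(nn.Module):
    """Q-network with dropout (parametric uncertainty) and a Gaussian
    head per action (return uncertainty). Names are illustrative."""
    def __init__(self, state_dim, n_actions, hidden=128, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.mean = nn.Linear(hidden, n_actions)     # E[return | s, a]
        self.log_var = nn.Linear(hidden, n_actions)  # log Var[return | s, a]

    def forward(self, state):
        h = self.body(state)
        return self.mean(h), self.log_var(h)

def thompson_action(net, state):
    """One stochastic forward pass (dropout left ON) samples network
    weights; sampling from the Gaussian head then adds return
    uncertainty. Acting greedily on the sample is Thompson sampling."""
    net.train()  # keep dropout active at decision time
    with torch.no_grad():
        mu, log_var = net(state)
        sample = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
    return int(sample.argmax(dim=-1))

# Usage: pick an exploratory action for a 4-dimensional toy state.
net = DoubleUncertainQNet(state_dim=4, n_actions=2)
action = thompson_action(net, torch.randn(1, 4))
```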


Related research

06/11/2018 · The Potential of the Return Distribution for Exploration in RL
This paper studies the potential of the return distribution for explorat...

11/17/2020 · Leveraging the Variance of Return Sequences for Exploration Policy
This paper introduces a method for constructing an upper bound for explo...

01/23/2013 · Model-Based Bayesian Exploration
Reinforcement learning systems are often concerned with balancing explor...

06/12/2023 · Diverse Projection Ensembles for Distributional Reinforcement Learning
In contrast to classical reinforcement learning, distributional reinforc...

10/27/2020 · Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
Learning complex behaviors through interaction requires coordinated long...

05/27/2020 · Assumed Density Filtering Q-learning
While off-policy temporal difference (TD) methods have widely been used ...

12/09/2017 · Bayesian Q-learning with Assumed Density Filtering
While off-policy temporal difference methods have been broadly used in r...
