Successor Uncertainties: exploration and uncertainty in temporal difference learning

10/15/2018

We consider the problem of balancing exploration and exploitation in sequential decision making. To explore efficiently, it is vital to consider the uncertainty over all consequences of a decision, and not just those that follow immediately; the uncertainties involved need to be propagated according to the dynamics of the problem. To this end, we develop Successor Uncertainties, a probabilistic model for the state-action value function of a Markov Decision Process that propagates uncertainties in a coherent and scalable way. We relate our approach to other classical and contemporary methods for exploration and present an empirical analysis.
