The Uncertainty Bellman Equation and Exploration

09/15/2017
by Brendan O'Donoghue, et al.

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the estimated value of any fixed policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for ϵ-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
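To make the recursion described in the abstract concrete, here is a minimal sketch of an uncertainty backup in a small tabular, finite-horizon setting. The abstract does not give the equation's exact form, so everything specific here is an assumption for illustration: the function name `solve_ube`, the array layout, the undiscounted backup, and the per-step "local uncertainty" term `nu` (treated as a variance-like quantity supplied by the caller).

```python
import numpy as np

def solve_ube(P, pi, nu, horizon):
    """Backward recursion for an uncertainty estimate u[h, s, a].

    Assumed (hypothetical) inputs, not taken from the abstract:
      P  : (S, A, S) transition probabilities P[s, a, s']
      pi : (S, A)    fixed policy probabilities pi[s, a]
      nu : (S, A)    local, variance-like uncertainty at each (s, a)
    """
    S, A, _ = P.shape
    u = np.zeros((horizon + 1, S, A))  # u[horizon] = 0: nothing left to be uncertain about
    for h in range(horizon - 1, -1, -1):
        # Expected next-step uncertainty under the policy, one value per next state.
        next_u = (pi * u[h + 1]).sum(axis=1)  # shape (S,)
        # Local uncertainty plus expected uncertainty at subsequent time-steps.
        u[h] = nu + P @ next_u                # shape (S, A)
    return u

# Toy example: two states, one action; state 0 moves to state 1, state 1 loops.
P = np.zeros((2, 1, 2))
P[0, 0, 1] = 1.0
P[1, 0, 1] = 1.0
pi = np.ones((2, 1))
nu = np.array([[1.0], [0.5]])

u = solve_ube(P, pi, nu, horizon=3)
variance_style_bonus = np.sqrt(u[0, 0, 0])        # sqrt(1.0 + 0.5 + 0.5)
stddev_style_bonus = 1.0 + 2 * np.sqrt(0.5)       # sum of per-step standard deviations
```

In this toy example the propagated uncertainty at the first step from state 0 is 2.0, so a bonus based on its square root is about 1.41, whereas summing per-step standard deviations along the same path gives about 2.41. This is only meant to illustrate the abstract's remark about compounding variance rather than standard deviation; it is not the paper's algorithm or its DQN-scale implementation.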


research
02/24/2023

Model-Based Uncertainty in Value Functions

We consider the problem of quantifying uncertainty over expected cumulat...
research
01/23/2013

Model-Based Bayesian Exploration

Reinforcement learning systems are often concerned with balancing explor...
research
11/17/2020

Leveraging the Variance of Return Sequences for Exploration Policy

This paper introduces a method for constructing an upper bound for explo...
research
07/25/2018

Variational Bayesian Reinforcement Learning with Regret Bounds

We consider the exploration-exploitation trade-off in reinforcement lear...
research
05/31/2023

Representation-Driven Reinforcement Learning

We present a representation-driven framework for reinforcement learning....
research
10/18/2019

Autonomous exploration for navigating in non-stationary CMPs

We consider a setting in which the objective is to learn to navigate in ...
research
07/31/2018

Incentives and Coordination in Bottleneck Models

We study a variant of Vickrey's classic bottleneck model. In our model t...
