Exploration via Epistemic Value Estimation

03/07/2023
by Simon Schmitt, et al.

How to explore efficiently in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions – for instance, to compute an exploration bonus or upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters, from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
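To make the idea of an upper confidence bound from a parameter posterior concrete, here is a minimal sketch. It is not the EVE recipe itself: it assumes a linear value function q(s, a) = φ(s, a)·θ and a Gaussian posterior θ ~ N(μ, Σ), for which the epistemic variance of q is φᵀΣφ. The function names (`q_mean_and_std`, `ucb_action`), the features, the posterior values, and the scale `beta` are all hypothetical and chosen only for illustration.

```python
import math


def q_mean_and_std(features, theta_mean, theta_cov):
    """Mean and epistemic std of q = features . theta for theta ~ N(theta_mean, theta_cov).

    Assumes a linear value function; this is an illustrative simplification.
    """
    mean = sum(f * m for f, m in zip(features, theta_mean))
    # Variance of a linear function of a Gaussian vector: phi^T Sigma phi.
    cov_phi = [sum(row[j] * features[j] for j in range(len(features)))
               for row in theta_cov]
    var = sum(f * c for f, c in zip(features, cov_phi))
    return mean, math.sqrt(max(var, 0.0))


def ucb_action(action_features, theta_mean, theta_cov, beta=1.0):
    """Pick the action maximising mean + beta * std (an upper confidence bound)."""
    scores = []
    for phi in action_features:
        mean, std = q_mean_and_std(phi, theta_mean, theta_cov)
        scores.append(mean + beta * std)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical two-action example with one-hot features per action.
action_features = [[1.0, 0.0], [0.0, 1.0]]
theta_mean = [1.0, 0.9]                  # posterior mean of the parameters
theta_cov = [[0.01, 0.0], [0.0, 1.0]]    # second action is far more uncertain

greedy, _ = ucb_action(action_features, theta_mean, theta_cov, beta=0.0)
optimistic, scores = ucb_action(action_features, theta_mean, theta_cov, beta=1.0)
print(greedy, optimistic, scores)
```

In this toy setting the greedy choice (`beta=0.0`) is the first action, while the optimistic choice (`beta=1.0`) is the second, whose value estimate is lower but far more uncertain. With neural network function approximators the posterior and the resulting variance are no longer available in this closed form, which is the difficulty the abstract refers to.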


