Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics

08/22/2022
by   Jun Jet Tai, et al.
0

An inherent problem in reinforcement learning is coping with policies that are uncertain about what action to take (or the value of a state). Model uncertainty, more formally known as epistemic uncertainty, refers to the expected prediction error of a model beyond the sampling noise. In this paper, we propose a metric for epistemic uncertainty estimation in Q-value functions, which we term pathwise epistemic uncertainty. We further develop a method to compute its approximate upper bound, which we call F -value. We experimentally apply the latter to Deep Q-Networks (DQN) and show that uncertainty estimation in reinforcement learning serves as a useful indication of learning progress. We then propose a new approach to improving sample efficiency in actor-critic algorithms by learning from an existing (previously learned or hard-coded) oracle policy while uncertainty is high, aiming to avoid unproductive random actions during training. We term this Critic Confidence Guided Exploration (CCGE). We implement CCGE on Soft Actor-Critic (SAC) using our F-value metric, which we apply to a handful of popular Gym environments and show that it achieves better sample efficiency and total episodic reward than vanilla SAC in limited contexts.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/04/2023

Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

Uncertainty quantification has been extensively used as a means to achie...
research
10/28/2019

Better Exploration with Optimistic Actor-Critic

Actor-critic methods, a type of model-free Reinforcement Learning, have ...
research
03/07/2023

Exploration via Epistemic Value Estimation

How to efficiently explore in reinforcement learning is an open problem....
research
02/18/2023

Efficient exploration via epistemic-risk-seeking policy optimization

Exploration remains a key challenge in deep reinforcement learning (RL)....
research
07/27/2022

Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control

Uncertainty quantification is one of the central challenges for machine ...
research
10/07/2021

Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning

Popular off-policy deep reinforcement learning algorithms compensate for...
research
10/05/2022

Query The Agent: Improving sample efficiency through epistemic uncertainty estimation

Curricula for goal-conditioned reinforcement learning agents typically r...

Please sign up or login with your details

Forgot password? Click here to reset