Efficient exploration via epistemic-risk-seeking policy optimization

02/18/2023
by Brendan O'Donoghue, et al.

Exploration remains a key challenge in deep reinforcement learning (RL). Optimism in the face of uncertainty is a well-known heuristic with theoretical guarantees in the tabular setting, but how best to translate the principle to deep RL, which involves online stochastic gradients and deep network function approximators, is not fully understood. In this paper we propose a new, differentiable optimistic objective that, when optimized, yields a policy that provably explores efficiently, with guarantees even under function approximation. The objective is a zero-sum two-player game derived from endowing the agent with an epistemic-risk-seeking utility function, which converts uncertainty into value and encourages the agent to explore uncertain states. We show that the solution to this game minimizes an upper bound on the regret, with the 'players' each attempting to minimize one component of a particular regret decomposition. We derive a new model-free algorithm, which we call 'epistemic-risk-seeking actor-critic', that is simply an application of simultaneous stochastic gradient ascent-descent to the game. We conclude with results for a deep RL agent using the technique: on the challenging 'DeepSea' environment it shows significant performance improvements even over other efficient exploration techniques, and we also report results on the Atari benchmark.
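To make the phrase "simultaneous gradient ascent-descent on a zero-sum game" concrete, here is a minimal sketch on a toy saddle-point problem. It is not the paper's objective or algorithm: the function f(x, y) = 0.5*x^2 + x*y - 0.5*y^2, the name `simultaneous_gda`, the step size, and the use of exact rather than stochastic gradients are all assumptions made purely for illustration.

```python
def simultaneous_gda(x0, y0, lr=0.1, steps=500):
    """Simultaneous gradient descent-ascent on a toy zero-sum game.

    Hypothetical objective (an assumption, not taken from the paper):
        f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
    One player minimizes f over x, the other maximizes f over y.
    The unique saddle point is (0, 0).
    """
    x, y = x0, y0
    for _ in range(steps):
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        # Both players step from the same (x, y): x descends, y ascends.
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


x_final, y_final = simultaneous_gda(1.0, -1.0)
print(x_final, y_final)  # both values end up very close to the saddle point
```

The two updates are computed from the same iterate before either is applied, which is what "simultaneous" means here, as opposed to alternating updates. On this strongly convex-concave toy problem the iterates contract toward the saddle point; general games need not behave so well.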
