Optimistic Exploration even with a Pessimistic Initialisation

02/26/2020
by Tabish Rashid, et al.

Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use optimistic initialisation despite taking inspiration from these provably efficient tabular algorithms. In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values due to commonly used network initialisation schemes, a pessimistic initialisation. Merely initialising the network to output optimistic Q-values is not enough, since we cannot ensure that they remain optimistic for novel state-action pairs, which is crucial for exploration. We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network. We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting. Our algorithm, Optimistic Pessimistically Initialised Q-Learning (OPIQ), augments the Q-value estimates of a DQN-based agent with count-derived bonuses to ensure optimism during both action selection and bootstrapping. We show that OPIQ outperforms non-optimistic DQN variants that utilise a pseudocount-based intrinsic motivation in hard exploration tasks, and that it predicts optimistic estimates for novel state-action pairs.
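To make the idea concrete, the count-based augmentation described above can be sketched in a tabular setting as follows. This is a minimal illustration assuming a bonus of the form C / (N(s, a) + 1)^M added to the learned Q-values both when selecting actions and when computing bootstrap targets; the class name, hyperparameter values, and the exact bonus shape are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np
from collections import defaultdict


class OptimisticQAgent:
    """Tabular sketch of count-based optimistic augmentation of
    pessimistically initialised Q-values (illustrative, not the
    paper's exact algorithm)."""

    def __init__(self, n_actions, gamma=0.99, lr=0.1,
                 c_action=1.0, c_bootstrap=1.0, m=2.0):
        self.gamma = gamma
        self.lr = lr
        self.c_action = c_action        # optimism scale for action selection (assumed value)
        self.c_bootstrap = c_bootstrap  # optimism scale for bootstrapping (assumed value)
        self.m = m                      # decay exponent of the count bonus (assumed value)
        # Pessimistic initialisation: all Q-values start at zero.
        self.q = defaultdict(lambda: np.zeros(n_actions))
        self.counts = defaultdict(lambda: np.zeros(n_actions))

    def _optimistic_q(self, state, c):
        # Q^+(s, a) = Q(s, a) + c / (N(s, a) + 1)^m
        bonus = c / (self.counts[state] + 1.0) ** self.m
        return self.q[state] + bonus

    def act(self, state):
        # Greedy action selection w.r.t. the optimistically augmented Q-values,
        # so novel (low-count) actions look attractive even though Q starts at 0.
        return int(np.argmax(self._optimistic_q(state, self.c_action)))

    def update(self, state, action, reward, next_state, done):
        self.counts[state][action] += 1
        # Bootstrap from the optimistically augmented next-state values.
        target = reward
        if not done:
            target += self.gamma * np.max(
                self._optimistic_q(next_state, self.c_bootstrap))
        td_error = target - self.q[state][action]
        self.q[state][action] += self.lr * td_error
```

In the deep RL version described in the abstract, the tabular counts would be replaced by pseudocounts over state-action pairs and the Q-table by a DQN, but the separation of the optimism source (the count bonus) from the learned estimator is the same.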

