
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

by Haoran Tang, et al.

Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals two important properties of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
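
The pipeline described above (hash a state, count the code in a hash table, derive a reward bonus from the count) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `HashCountBonus`, its parameters, the sign-of-random-projection hash, and the bonus form `beta / sqrt(count)` are all choices made for this sketch.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Count-based exploration bonus over hashed states (illustrative sketch)."""

    def __init__(self, state_dim, code_bits=16, beta=0.01, seed=0):
        rng = random.Random(seed)
        # Assumption: a fixed random Gaussian projection, one row per code bit.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(code_bits)
        ]
        self.beta = beta
        self.counts = defaultdict(int)  # hash code -> number of visits

    def hash_code(self, state):
        # The sign of each projection yields one bit of the code.
        return tuple(
            1 if sum(w * x for w, x in zip(row, state)) >= 0.0 else 0
            for row in self.projection
        )

    def bonus(self, state):
        # Increment the count for this state's code, then return
        # beta / sqrt(count) (assumed bonus form for this sketch).
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])


explorer = HashCountBonus(state_dim=3, code_bits=8, beta=1.0)
first = explorer.bonus([0.5, -1.2, 3.0])   # first visit to this code
second = explorer.bonus([0.5, -1.2, 3.0])  # second visit, smaller bonus
```

In this sketch, `code_bits` plays the role of the granularity the abstract mentions: fewer bits merge more states into one bucket, while more bits make most codes unique, so the counts stop generalizing across similar states. The bonus would typically be added to the environment reward before the RL update.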




Exploration in Feature Space for Reinforcement Learning

The infamous exploration-exploitation dilemma is one of the oldest and m...

Count-Based Exploration in Feature Space for Reinforcement Learning

We introduce a new count-based optimistic exploration algorithm for Rein...

VIME: Variational Information Maximizing Exploration

Scalable and effective exploration remains a key challenge in reinforcem...

LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward

Episodic count has been widely used to design a simple yet effective int...

Count-Based Exploration with the Successor Representation

The problem of exploration in reinforcement learning is well-understood ...

A Max-Min Entropy Framework for Reinforcement Learning

In this paper, we propose a max-min entropy framework for reinforcement ...

Hashing Over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning

In reinforcement learning (RL) tasks, an efficient exploration mechanism...