Never Give Up: Learning Directed Exploration Strategies

We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbours over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment. A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbour lookup, biasing the novelty signal towards what the agent can control. We employ the framework of Universal Value Function Approximators (UVFA) to simultaneously learn many directed exploration policies with the same neural network, each with a different trade-off between exploration and exploitation. By using the same neural network for different degrees of exploration/exploitation, we demonstrate transfer from predominantly exploratory policies to effective exploitative policies. The proposed method can be incorporated into modern distributed RL agents that collect large amounts of experience from many actors running in parallel on separate environment instances. Our method doubles the performance of the base agent in all hard exploration games in the Atari-57 suite while maintaining a very high score across the remaining games, obtaining a median human normalised score of 1344.0%. Notably, the proposed method is the first algorithm to achieve non-zero rewards (with a mean score of 8,400) in the game of Pitfall! without using demonstrations or hand-crafted features.
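The episodic novelty bonus described above can be sketched in a few lines: embeddings of visited states are stored in an episodic memory, and the bonus for a new state is computed from a kernel over its k nearest neighbours in that memory. The following is a minimal illustrative sketch, not the paper's implementation; the constants, the per-query distance normalisation, and the function name are hypothetical simplifications.

```python
import numpy as np

def episodic_intrinsic_reward(memory, query, k=10, eps=1e-3, c=1e-3):
    """Hypothetical sketch of an episodic k-NN novelty bonus.

    memory: list of embedding vectors seen earlier in the episode.
    query:  embedding of the current state.
    """
    if not memory:
        return 1.0  # an empty memory makes every state maximally novel
    # Squared Euclidean distances from the query to all stored embeddings.
    dists = np.sum((np.stack(memory) - query) ** 2, axis=1)
    knn = np.sort(dists)[:k]
    # Normalise distances by their mean so the bonus is roughly
    # scale-invariant (the paper uses a running average over time;
    # a per-query mean is a simplification).
    mean_d = knn.mean() + 1e-8
    # Kernel is ~1 for very close neighbours, ~0 for distant ones,
    # so a familiar state accumulates a large kernel sum.
    kernel = eps / (knn / mean_d + eps)
    return float(1.0 / np.sqrt(kernel.sum() + c))
```

Under the UVFA scheme described in the abstract, each policy in the family would then be trained on a mixed reward of the form r = r_extrinsic + beta * r_intrinsic, with a different beta per policy controlling the exploration/exploitation trade-off.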

Related research

04/21/2023  DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards
Exploration is a fundamental aspect of reinforcement learning (RL), and ...

05/29/2018  Playing hard exploration games by watching YouTube
Deep reinforcement learning methods traditionally struggle with tasks wh...

03/02/2022  Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning
Exploration versus exploitation dilemma is a significant problem in rein...

04/09/2021  Inverse Reinforcement Learning: a Control Lyapunov Approach
Inferring the intent of an intelligent agent from demonstrations and sub...

05/30/2022  SEREN: Knowing When to Explore and When to Exploit
Efficient reinforcement learning (RL) involves a trade-off between "expl...

12/08/2018  Learning Montezuma's Revenge from a Single Demonstration
We propose a new method for learning from a single demonstration to solv...

06/05/2023  A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
Exploration in environments which differ across episodes has received in...
