Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems

06/01/2018
by Christopher Stanton, et al.

Traditional exploration methods in RL require agents to perform random actions to find rewards, but these approaches struggle in sparse-reward domains like Montezuma's Revenge, where the probability that any random action sequence leads to reward is extremely low. Recent algorithms have performed well on such tasks by encouraging agents to visit new states or perform new actions relative to all prior training episodes (which we call across-training novelty). Such algorithms, however, do not consider whether an agent exhibits intra-life novelty: doing something new within the current episode, regardless of whether those behaviors have been performed in previous episodes. We hypothesize that across-training novelty might discourage agents from revisiting initially non-rewarding states that could become important stepping stones later in training. We introduce Deep Curiosity Search (DeepCS), which encourages intra-life exploration by rewarding agents for visiting as many different states as possible within each episode, and show that DeepCS matches the performance of current state-of-the-art methods on Montezuma's Revenge. We further show that DeepCS improves exploration on Gravitar (another difficult, sparse-reward game) and performs well on the dense-reward game Amidar. Surprisingly, DeepCS doubles A2C performance on Seaquest, a game we would not have expected to benefit from intra-life exploration because the arena is small and already easily navigated by naive exploration techniques. In one run, DeepCS achieves a maximum training score of 80,000 points on Seaquest, higher than any method other than Ape-X. The strong performance of DeepCS on these sparse- and dense-reward tasks suggests that encouraging intra-life novelty is an interesting new approach for improving performance in Deep RL, and it motivates further research into hybridizing across-training and intra-life exploration methods.
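
The core mechanism is simple enough to sketch in a few lines. Below is a minimal, illustrative Python sketch of an intra-life novelty bonus, assuming a Gymnasium-style environment with image observations; the discretize helper, the bonus scale, and the policy interface are assumptions made for illustration, not the paper's actual state abstraction or reward shaping.

    import numpy as np

    def discretize(observation, quant=16):
        """Illustrative state abstraction (an assumption, not the paper's):
        coarsely downsample and quantize an image observation into a
        small, hashable key so distinct states can be counted."""
        small = np.asarray(observation)[::14, ::14]  # coarse spatial downsample
        return tuple((small // quant).flatten().tolist())

    def run_episode(env, policy, bonus=0.1):
        """Roll out one episode, adding an intra-life novelty bonus for
        every state not yet seen *in this episode*. The visited set is
        cleared each episode, so states explored in earlier episodes
        can be rewarded again."""
        visited = set()                              # reset every episode
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            key = discretize(obs)
            if key not in visited:                   # first visit this episode
                visited.add(key)
                reward += bonus                      # intra-life exploration bonus
            total += reward
        return total

Because the visited set is cleared at the start of every episode, the bonus never penalizes revisiting states that were explored in earlier episodes, which is the property the abstract argues distinguishes intra-life novelty from across-training novelty.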
