Long-Term Exploration in Persistent MDPs

09/21/2021
by Leonid Ugadiarov, et al.

Exploration is an essential part of reinforcement learning, and its quality limits the quality of the learned policy. Hard-exploration environments are characterized by a huge state space and sparse rewards. Under such conditions, exhaustive exploration of the environment is often impossible, and successfully training an agent requires many interaction steps. In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which builds on the concept of a persistent Markov decision process: an MDP in which the agent can roll back to previously visited states during training. We test our algorithm in the hard-exploration game Prince of Persia, without using rewards or domain knowledge. On all game levels used, our agent outperforms or performs comparably to the state-of-the-art curiosity methods with knowledge-based intrinsic motivation, ICM and RND. An implementation of RbExplore is available at https://github.com/cds-mipt/RbExplore.
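The benefit of rolling back to visited states can be illustrated with a toy sketch. It is not RbExplore itself. It assumes an environment that exposes hypothetical `get_state`/`set_state` snapshot methods, uses a 1-D corridor in place of a game, and restarts from an archived state chosen uniformly at random. That selection rule is an illustrative assumption, not the method's actual rule. The sketch compares this rollback loop with random exploration that can only reset to the initial state.

```python
import random
from dataclasses import dataclass


@dataclass
class Corridor:
    """Toy 1-D corridor: positions 0..length-1, actions -1 (left) / +1 (right).

    Hypothetical stand-in for an emulator whose state can be saved and
    restored, which is what makes the MDP "persistent" in this sketch.
    """
    length: int = 50
    pos: int = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def get_state(self):
        # Hypothetical snapshot API: the full state is just the position here.
        return self.pos

    def set_state(self, state):
        # Hypothetical rollback API: jump back to a previously saved state.
        self.pos = state

    def step(self, action):
        # Move and clamp to the corridor bounds.
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        return self.pos


def random_walk(env, horizon, rng, visited):
    """Take `horizon` random actions from the current state, recording every state reached."""
    for _ in range(horizon):
        env.step(rng.choice((-1, 1)))
        visited.add(env.get_state())


def explore_reset_only(env, iterations, horizon, seed=0):
    """Baseline: every episode restarts from the initial state."""
    rng = random.Random(seed)
    visited = {env.reset()}
    for _ in range(iterations):
        env.reset()
        random_walk(env, horizon, rng, visited)
    return visited


def explore_with_rollback(env, iterations, horizon, seed=0):
    """Each episode restarts from a previously visited (archived) state.

    Assumption: the restart state is sampled uniformly from the archive.
    """
    rng = random.Random(seed)
    archive = {env.reset()}
    for _ in range(iterations):
        env.set_state(rng.choice(sorted(archive)))  # roll back
        random_walk(env, horizon, rng, archive)
    return archive


if __name__ == "__main__":
    horizon, iterations = 10, 3000
    base = explore_reset_only(Corridor(), iterations, horizon)
    roll = explore_with_rollback(Corridor(), iterations, horizon)
    print("reset-only max position:", max(base))
    print("rollback max position:  ", max(roll))
```

Every reset-only episode starts at position 0 and lasts `horizon` steps, so that baseline can never get further than `horizon` cells from the start. The rollback variant chains episodes from states reached earlier, so its coverage is not bounded by the episode length.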
