Cyclophobic Reinforcement Learning

08/30/2023
by Stefan Sylvius Wagner, et al.

In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they do not always explore the whole state space systematically, akin to depth-first search versus breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty but punishes redundancy by avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample-efficient than other state-of-the-art methods in a variety of tasks.
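To make the core idea concrete, the following is a minimal sketch of one plausible reading of a cyclophobic intrinsic reward: rather than rewarding novel states, it assigns a negative reward whenever the agent returns to a previously visited state within an episode, i.e., whenever it closes a cycle. The observation hashing and the fixed penalty value are illustrative assumptions, not the paper's exact formulation.

```python
class CyclophobicReward:
    """Illustrative sketch of a cycle-punishing intrinsic reward.

    Assumption: observations are hashable (e.g., tuples of grid cells).
    The constant penalty is a placeholder, not the paper's formulation.
    """

    def __init__(self, penalty: float = -1.0):
        self.penalty = penalty
        self.visited = set()

    def reset(self) -> None:
        """Clear the visit memory at the start of each episode."""
        self.visited.clear()

    def intrinsic_reward(self, observation) -> float:
        """Return a penalty if the observation closes a cycle, else 0."""
        key = hash(observation)
        if key in self.visited:
            return self.penalty  # revisit detected: punish redundancy
        self.visited.add(key)
        return 0.0
```

In practice this signal would be added to the environment's extrinsic reward at each step; the hierarchical variant described in the abstract could apply the same test to several centered crops of the observation, so that cycles are penalized at multiple spatial scales.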


