Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

12/26/2020
by Susan Amin, et al.

A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment but also on the agent's trajectory so far, and (2) the agent should use a measure of spread in the state space to avoid getting stuck in a small region. Our method draws on concepts from statistical physics, where they are used to explain the behavior of simplified (polymer) chains, to generate persistent (locally self-avoiding) trajectories in state space. We discuss the theoretical properties of locally self-avoiding walks and their ability to provide a kind of short-term memory through a decaying temporal correlation within the trajectory. We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as in higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.
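To make the idea of persistence concrete, here is a minimal sketch of a generic 2D persistent random walk. It is not the method proposed in the paper, which builds locally self-avoiding walks in state space. The function names `persistent_walk` and `mean_squared_displacement` and the parameter `angle_std` are hypothetical, chosen only for illustration. The sketch assumes the heading changes by a small Gaussian angle at each step, so the correlation between step directions decays with the time lag.

```python
import math
import random


def persistent_walk(n_steps, angle_std, step_size=1.0, seed=0):
    """Illustrative 2D walk whose heading drifts by a Gaussian angle each step.

    A small angle_std keeps consecutive steps correlated (persistent);
    a very large angle_std behaves like an uncorrelated random walk.
    """
    rng = random.Random(seed)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        # Perturb the previous heading instead of sampling a fresh direction.
        heading += rng.gauss(0.0, angle_std)
        x += step_size * math.cos(heading)
        y += step_size * math.sin(heading)
        path.append((x, y))
    return path


def mean_squared_displacement(n_walks, n_steps, angle_std):
    """Average squared distance from the origin after n_steps, over n_walks seeds."""
    total = 0.0
    for seed in range(n_walks):
        x, y = persistent_walk(n_steps, angle_std, seed=seed)[-1]
        total += x * x + y * y
    return total / n_walks


if __name__ == "__main__":
    # Compare a persistent walk with a nearly uncorrelated one.
    print("persistent  :", mean_squared_displacement(200, 100, angle_std=0.2))
    print("uncorrelated:", mean_squared_displacement(200, 100, angle_std=10.0))
```

Under these assumptions the persistent walk ends much farther from its starting point on average than the nearly uncorrelated one. This is the kind of state-space coverage that a short-term memory of the trajectory is meant to provide.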

research
09/21/2021

Long-Term Exploration in Persistent MDPs

Exploration is an essential part of reinforcement learning, which restri...
research
03/02/2022

Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Exploration versus exploitation dilemma is a significant problem in rein...
research
10/21/2019

Exploration via Sample-Efficient Subgoal Design

The problem of exploration in unknown environments continues to pose a c...
research
01/01/2020

Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

Reinforcement learning with sparse rewards is still an open challenge. C...
research
09/04/2019

Learning sparse representations in reinforcement learning

Reinforcement learning (RL) algorithms allow artificial agents to improv...
research
05/23/2022

Learning Long-Horizon Robot Exploration Strategies for Multi-Object Search in Continuous Action Spaces

Recent advances in vision-based navigation and exploration have shown im...
research
10/02/2018

EMI: Exploration with Mutual Information Maximizing State and Action Embeddings

Policy optimization struggles when the reward feedback signal is very sp...
