Mapping State Space using Landmarks for Universal Goal Reaching

08/15/2019
by   Zhiao Huang, et al.
2

An agent that has well understood the environment should be able to apply its skills for any given goals, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging. Our method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions. We use farthest point sampling to select landmark states from past experience, which has improved exploration compared with simple uniform sampling. Experimentally we showed that our method enables the agent to reach long-range goals at the early training stage, and achieve better performance than standard RL algorithms for a number of challenging tasks.

READ FULL TEXT

page 8

page 10

research
10/26/2021

Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning

Goal-conditioned hierarchical reinforcement learning (HRL) has shown pro...
research
07/22/2023

HIQL: Offline Goal-Conditioned RL with Latent States as Actions

Unsupervised pre-training has recently become the bedrock for computer v...
research
05/18/2022

World Value Functions: Knowledge Representation for Multitask Reinforcement Learning

An open problem in artificial intelligence is how to learn and represent...
research
01/18/2021

Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

In reinforcement learning, temporal difference-based algorithms can be s...
research
06/08/2022

Deep Hierarchical Planning from Pixels

Intelligent agents need to select long sequences of actions to solve com...
research
05/21/2020

LEAF: Latent Exploration Along the Frontier

Self-supervised goal proposal and reaching is a key component for explor...
research
11/07/2022

C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

Given a particular embodiment, we propose a novel method (C3PO) that lea...

Please sign up or login with your details

Forgot password? Click here to reset