Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization

10/17/2019
by Huazhe Xu, et al.

Humans can learn task-agnostic priors from interactive experience and apply them to novel tasks without any fine-tuning. In this paper, we propose Scoring-Aggregating-Planning (SAP), a framework that learns task-agnostic semantics and dynamics priors from interactions of arbitrary quality under sparse rewards and then plans on unseen tasks in a zero-shot setting. The framework learns a neural score function over local regional state-action pairs; these local scores can be aggregated to approximate the quality of a full trajectory. In addition, a dynamics model learned with self-supervision can be incorporated for planning. Many previous works that leverage interactive data for policy learning either require massive on-policy environment interaction or assume access to expert data, whereas we achieve a similar goal with purely off-policy, imperfect data. Instantiating our framework yields a policy that generalizes to unseen tasks. Experiments demonstrate that the proposed method can outperform baseline methods across a wide range of applications, including gridworld, robotics tasks, and video games.
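
The score-aggregate-plan loop described above can be illustrated with a minimal toy sketch. It is not the authors' implementation. Everything in it is a hypothetical stand-in: `score` is a hand-written rule in place of the learned neural score function, `dynamics` is a fixed rule in place of the self-supervised dynamics model, aggregation is assumed to be a plain sum, and planning is an exhaustive search over short action sequences in a 1-D corridor.

```python
import itertools

# Hypothetical toy world: a 1-D corridor with positions 0..4 and actions -1 / +1.
ACTIONS = (-1, 1)


def dynamics(state, action):
    """Stand-in for a learned dynamics model: step and clip to the corridor."""
    return max(0, min(4, state + action))


def score(state, action):
    """Stand-in for a learned local score: favors stepping right when possible."""
    return 1.0 if action == 1 and state < 4 else 0.0


def aggregate(states, actions):
    """Approximate trajectory quality as the sum of local scores (assumed)."""
    return sum(score(s, a) for s, a in zip(states, actions))


def plan(state, horizon=3):
    """Roll out every action sequence with the model; keep the best-scoring one."""
    best_actions, best_value = None, float("-inf")
    for actions in itertools.product(ACTIONS, repeat=horizon):
        s, visited = state, []
        for a in actions:
            visited.append(s)       # state in which action a is taken
            s = dynamics(s, a)      # predicted next state
        value = aggregate(visited, actions)
        if value > best_value:
            best_actions, best_value = actions, value
    # Return only the first action, as in receding-horizon planning.
    return best_actions[0], best_value


first_action, value = plan(0)
```

Because the score depends only on local state-action pairs rather than on a task-specific reward, the same `score` and `dynamics` stand-ins could in principle be reused in a differently laid-out corridor without retraining, which is the intuition behind the zero-shot claim.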

research
10/01/2022

Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

Humans are capable of abstracting various tasks as different combination...
research
04/23/2018

Zero-Shot Visual Imitation

The current dominant paradigm for imitation learning relies on strong su...
research
06/17/2021

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Generalization has been a long-standing challenge for reinforcement lear...
research
03/31/2021

Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos

We are motivated by the goal of generalist robots that can complete a wi...
research
06/05/2023

Explore to Generalize in Zero-Shot RL

We study zero-shot generalization in reinforcement learning - optimizing...
research
02/27/2020

Hallucinative Topological Memory for Zero-Shot Visual Planning

In visual planning (VP), an agent learns to plan goal-directed behavior ...
research
05/13/2019

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

While model-based deep reinforcement learning (RL) holds great promise f...
