Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

10/01/2022
by   Zheng Wu, et al.

Humans are capable of abstracting various tasks as different combinations of multiple attributes. This perspective of compositionality is vital for rapid learning and adaptation, since experience from related tasks can be combined to generalize across novel compositional settings. In this work, we aim to achieve zero-shot policy generalization of Reinforcement Learning (RL) agents by leveraging task compositionality. Our proposed method is a meta-RL algorithm with a disentangled task representation that explicitly encodes different aspects of the tasks. Policy generalization is then performed by inferring the representations of unseen compositional tasks via the obtained disentanglement, without extra exploration. The evaluation is conducted on three simulated tasks and a challenging real-world robotic insertion task. Experimental results demonstrate that the proposed method achieves zero-shot policy generalization to unseen compositional tasks.
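To make the idea concrete, the sketch below shows one plausible way such a method could be structured; it is not the authors' implementation, and all module names, dimensions, and the simple factor-swapping scheme are illustrative assumptions. A context encoder outputs a task latent split into per-attribute factors, a policy is conditioned on that latent, and a representation for an unseen compositional task is assembled by recombining factors inferred from training tasks.

```python
# Hypothetical sketch (not the paper's code): disentangled task embedding
# z = [z_1, ..., z_K], one factor per task attribute, conditioning a policy.
import torch
import torch.nn as nn


class DisentangledTaskEncoder(nn.Module):
    """Maps a task context (e.g. flattened transitions) to K attribute factors."""

    def __init__(self, context_dim: int, num_attributes: int, factor_dim: int):
        super().__init__()
        self.num_attributes = num_attributes
        self.factor_dim = factor_dim
        self.net = nn.Sequential(
            nn.Linear(context_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_attributes * factor_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        z = self.net(context)                                     # (B, K * D)
        return z.view(-1, self.num_attributes, self.factor_dim)   # (B, K, D)


class TaskConditionedPolicy(nn.Module):
    """Policy pi(a | s, z): state and flattened task embedding are concatenated."""

    def __init__(self, state_dim: int, action_dim: int,
                 num_attributes: int, factor_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_attributes * factor_dim, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
            nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        z_flat = z.view(z.size(0), -1)
        return self.net(torch.cat([state, z_flat], dim=-1))


# Zero-shot composition: reuse attribute factors inferred from two training
# tasks to represent an unseen compositional task, then act without further
# exploration. Dimensions below are arbitrary placeholders.
encoder = DisentangledTaskEncoder(context_dim=32, num_attributes=2, factor_dim=8)
policy = TaskConditionedPolicy(state_dim=10, action_dim=4,
                               num_attributes=2, factor_dim=8)

ctx_a = torch.randn(1, 32)          # context collected on training task A
ctx_b = torch.randn(1, 32)          # context collected on training task B
z_a, z_b = encoder(ctx_a), encoder(ctx_b)

# Unseen task = attribute 0 of task A combined with attribute 1 of task B.
z_new = torch.stack([z_a[:, 0], z_b[:, 1]], dim=1)

state = torch.randn(1, 10)
action = policy(state, z_new)
```

Under this assumed setup, generalization hinges on the encoder actually disentangling attributes during meta-training so that recombined factors remain valid inputs to the policy.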


Related research

08/22/2022  Reference-Limited Compositional Zero-Shot Learning
Compositional zero-shot learning (CZSL) refers to recognizing unseen com...

06/17/2021  SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Generalization has been a long-standing challenge for reinforcement lear...

11/28/2022  Hypernetworks for Zero-shot Transfer in Reinforcement Learning
In this paper, hypernetworks are trained to generate behaviors across a ...

06/01/2020  Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas
We demonstrate a reinforcement learning agent which uses a compositional...

06/05/2023  Explore to Generalize in Zero-Shot RL
We study zero-shot generalization in reinforcement learning - optimizing...

03/13/2021  Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments
Simulation provides a safe and efficient way to generate useful data for...

10/17/2019  Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization
Humans can learn task-agnostic priors from interactive experience and ut...
