Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

12/18/2022
by Minghuan Liu, et al.

In reinforcement learning applications such as robotics, agents often have to handle different input/output features because their developers or physical constraints give them different state and action spaces. As a result, each agent is typically re-trained from scratch, which is unnecessary and sample-inefficient, especially when the agents follow similar solution steps to accomplish their tasks. In this paper, we aim to transfer this shared high-level goal-transition knowledge to alleviate the problem. Specifically, we propose PILoT (Planning Immediate Landmarks of Targets). PILoT first uses universal decoupled policy optimization to learn a goal-conditioned state planner; it then distills a goal planner that plans immediate landmarks in a model-free style and can be shared among different agents. In our experiments, we demonstrate PILoT on a range of transfer challenges, including few-shot transfer across action spaces and dynamics, from low-dimensional vector states to image inputs, and from a simple robot to a more complicated morphology. We also illustrate a zero-shot transfer solution from a simple 2D navigation task to the harder Ant-Maze task.
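To make the idea of a shared planner concrete, the following is a minimal toy sketch in Python, not the authors' implementation. It assumes a 2D point-navigation setting, a hand-written straight-line planner standing in for the learned goal planner, and two hypothetical agents (`CartesianAgent`, `PolarAgent`) with different action spaces. All names, the `max_step` parameter, and the hand-coded inverse dynamics are illustrative assumptions; in the paper these components are learned.

```python
import math


def plan_landmark(state, goal, max_step=0.5):
    """Toy stand-in for a goal planner (an assumption, not the learned model).

    Proposes the next state to reach (an "immediate landmark") on the
    straight line from state to goal, at most max_step away.
    """
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return tuple(goal)
    scale = max_step / dist
    return (state[0] + dx * scale, state[1] + dy * scale)


class CartesianAgent:
    """Hypothetical agent whose action is a displacement (dx, dy)."""

    def inverse_dynamics(self, state, landmark):
        # Action that moves the agent from state to landmark.
        return (landmark[0] - state[0], landmark[1] - state[1])

    def step(self, state, action):
        return (state[0] + action[0], state[1] + action[1])


class PolarAgent:
    """Hypothetical agent whose action is (heading in radians, distance)."""

    def inverse_dynamics(self, state, landmark):
        dx, dy = landmark[0] - state[0], landmark[1] - state[1]
        return (math.atan2(dy, dx), math.hypot(dx, dy))

    def step(self, state, action):
        heading, distance = action
        return (state[0] + distance * math.cos(heading),
                state[1] + distance * math.sin(heading))


def rollout(agent, start, goal, max_steps=50, tol=1e-6):
    """Follow landmarks from the shared planner using agent-specific actions."""
    state = tuple(start)
    for _ in range(max_steps):
        if math.dist(state, goal) <= tol:
            break
        landmark = plan_landmark(state, goal)               # shared across agents
        action = agent.inverse_dynamics(state, landmark)    # agent-specific
        state = agent.step(state, action)
    return state


if __name__ == "__main__":
    goal = (3.0, 4.0)
    for agent in (CartesianAgent(), PolarAgent()):
        final = rollout(agent, (0.0, 0.0), goal)
        print(type(agent).__name__, math.dist(final, goal) <= 1e-6)
```

Both agents call the same `plan_landmark` function and differ only in how they turn a landmark into an action, which is the separation the abstract describes between shareable goal-transition knowledge and agent-specific control.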
