Policy Transfer via Enhanced Action Space

12/07/2022
by Zheng Zhang, et al.

Although transfer learning promises to improve learning efficiency, existing methods still struggle with long-horizon tasks, especially when expert policies are sub-optimal and only partially useful. This paper therefore proposes a novel algorithm named EASpace (Enhanced Action Space) to transfer knowledge from multiple sub-optimal expert policies. EASpace formulates each expert policy as multiple macro actions with different execution durations, then integrates all macro actions directly into the primitive action space. Through this formulation, EASpace can learn when to execute which expert policy and for how long. An intra-macro-action learning rule, which adjusts the temporal difference target of macro actions, is proposed to improve data efficiency and alleviate the non-stationarity issue in multi-agent settings. Furthermore, an additional reward proportional to the execution time of macro actions is introduced to encourage exploration of the environment via macro actions, which is important for learning long-horizon tasks. Theoretical analysis shows the convergence of the proposed algorithm. Its efficiency is illustrated on a grid-based game and a multi-agent pursuit problem. The algorithm is also implemented on real physical systems to demonstrate its effectiveness.
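
The enhanced action space described in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `EnhancedAction`, `build_enhanced_action_space`, and `macro_bonus`, the candidate durations, and the bonus coefficient are all hypothetical. The sketch only shows how primitive actions and (expert, duration) macro actions could share one discrete action set, and how a reward term proportional to execution time could be computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EnhancedAction:
    """One entry of the enhanced action space (hypothetical structure)."""
    primitive: Optional[int]  # primitive action id, or None for a macro action
    expert: Optional[int]     # expert policy id, or None for a primitive action
    duration: int             # steps the action runs (1 for primitives)


def build_enhanced_action_space(
    n_primitive: int, n_experts: int, durations: List[int]
) -> List[EnhancedAction]:
    """Primitive actions first, then one macro action per (expert, duration)."""
    actions = [EnhancedAction(a, None, 1) for a in range(n_primitive)]
    for k in range(n_experts):
        for d in durations:
            actions.append(EnhancedAction(None, k, d))
    return actions


def macro_bonus(action: EnhancedAction, coeff: float) -> float:
    """Extra reward proportional to execution time; zero for primitives.

    The coefficient is an assumed hyperparameter, not a value from the paper.
    """
    if action.expert is None:
        return 0.0
    return coeff * action.duration


if __name__ == "__main__":
    # Assumed sizes: 4 primitive actions, 2 experts, 3 candidate durations.
    space = build_enhanced_action_space(4, 2, [2, 4, 8])
    print(len(space))
    print(space[4])
    print(macro_bonus(space[9], 0.1))
```

With this layout, a standard value-based learner can select over the single enlarged action set, so choosing an index implicitly chooses which expert to follow and for how many steps.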


research
08/05/2019

Construction of Macro Actions for Deep Reinforcement Learning

Conventional deep reinforcement learning typically determines an appropr...
research
04/18/2020

Macro-Action-Based Deep Multi-Agent Reinforcement Learning

In real-world multi-robot systems, performing high-quality, collaborativ...
research
09/20/2022

Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability

The state-of-the-art multi-agent reinforcement learning (MARL) methods h...
research
11/07/2020

MAGIC: Learning Macro-Actions for Online POMDP Planning using Generator-Critic

When robots operate in the real-world, they need to handle uncertainties...
research
12/31/2020

Robust Asymmetric Learning in POMDPs

Policies for partially observed Markov decision processes can be efficie...
research
02/22/2020

Nonmyopic Gaussian Process Optimization with Macro-Actions

This paper presents a multi-staged approach to nonmyopic adaptive Gaussi...
research
06/18/2021

Learning to Plan via a Multi-Step Policy Regression Method

We propose a new approach to increase inference performance in environme...
