REPAINT: Knowledge Transfer in Deep Actor-Critic Reinforcement Learning

11/24/2020
by Yunzhe Tao, et al.

Accelerating the learning process for complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between source and target tasks is low or unknown. In this work, we propose a REPresentation-And-INstance Transfer algorithm (REPAINT) for the deep actor-critic reinforcement learning paradigm. In representation transfer, we adopt a kickstarted training method that uses a pre-trained teacher policy by introducing an auxiliary cross-entropy loss. In instance transfer, we develop a sampling approach, advantage-based experience replay, applied to transitions collected by following the teacher policy, in which only the samples with high advantage estimates are retained for the policy update. We consider both learning an unseen target task by transferring from previously learned teacher tasks and learning a partially unseen task composed of multiple sub-tasks by transferring from a pre-learned teacher sub-task. In several benchmark experiments, REPAINT significantly reduces the total training time and improves the asymptotic performance compared with training without prior knowledge and with other baselines.
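The two transfer mechanisms described above can be sketched in a few lines. This is a minimal Python sketch under our own assumptions, not the authors' implementation: `kickstart_cross_entropy`, `representation_transfer_loss`, `advantage_based_replay`, the `Transition` record, the weight `beta`, and the threshold `zeta` are hypothetical names. Representation transfer adds `beta` times the cross-entropy between the teacher's and the student's action distributions to the ordinary actor-critic loss; instance transfer keeps only teacher-collected transitions whose advantage estimate exceeds `zeta`. For brevity the advantage here is a one-step temporal-difference estimate built from stored critic values; another estimator (for example generalized advantage estimation) could be substituted without changing the filtering step.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one state's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kickstart_cross_entropy(teacher_probs: Sequence[float],
                            student_logits: Sequence[float]) -> float:
    """Cross-entropy H(pi_teacher, pi_student) for one state:
    -sum_a pi_teacher(a|s) * log pi_student(a|s)."""
    student_probs = softmax(student_logits)
    return -sum(p * math.log(q)
                for p, q in zip(teacher_probs, student_probs) if p > 0.0)


def representation_transfer_loss(rl_loss: float,
                                 teacher_probs_batch: Sequence[Sequence[float]],
                                 student_logits_batch: Sequence[Sequence[float]],
                                 beta: float) -> float:
    """Actor-critic loss plus the auxiliary cross-entropy term, averaged over
    the batch of states and weighted by the hypothetical coefficient `beta`."""
    if (len(teacher_probs_batch) == 0
            or len(teacher_probs_batch) != len(student_logits_batch)):
        raise ValueError("need equal, non-empty batches of teacher and student outputs")
    ce = sum(kickstart_cross_entropy(t, s)
             for t, s in zip(teacher_probs_batch, student_logits_batch))
    return rl_loss + beta * ce / len(teacher_probs_batch)


@dataclass
class Transition:
    """Hypothetical record for one step collected by following the teacher policy."""
    state: object
    action: int
    reward: float
    value: float       # critic estimate V(s)
    next_value: float  # critic estimate V(s')
    done: bool


def one_step_advantage(t: Transition, gamma: float) -> float:
    """Simplified TD advantage r + gamma * V(s') - V(s) (an assumption, not
    necessarily the paper's estimator); V(s') is dropped at episode end."""
    bootstrap = 0.0 if t.done else t.next_value
    return t.reward + gamma * bootstrap - t.value


def advantage_based_replay(transitions: Sequence[Transition],
                           gamma: float, zeta: float) -> List[Transition]:
    """Keep only teacher transitions whose advantage estimate exceeds `zeta`."""
    return [t for t in transitions if one_step_advantage(t, gamma) > zeta]
```

In a training loop built on this sketch, the combined loss would be minimized on the student's own rollouts, while the transitions returned by `advantage_based_replay` would be the only teacher samples passed on to the policy update.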

06/21/2023

Introspective Action Advising for Interpretable Transfer Learning

Transfer learning can be applied in deep reinforcement learning to accel...
07/22/2018

Asynchronous Advantage Actor-Critic Agent for Starcraft II

Deep reinforcement learning, and especially the Asynchronous Advantage A...
01/23/2019

Distillation Strategies for Proximal Policy Optimization

Vision-based deep reinforcement learning (RL), similar to deep learning,...
08/14/2023

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Humans have the ability to reuse previously learned policies to solve ne...
10/14/2019

Actor Critic with Differentially Private Critic

Reinforcement learning algorithms are known to be sample inefficient, an...
05/05/2023

Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

Standard approaches to sequential decision-making exploit an agent's abi...
09/09/2019

AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers

The exploration mechanism used by a Deep Reinforcement Learning (RL) age...
