Continual Task Allocation in Meta-Policy Network via Sparse Prompting

05/29/2023
by   Yijun Yang, et al.

How can we train a generalizable meta-policy by continually learning a sequence of tasks? This is a natural human skill, yet it remains challenging for current reinforcement learning: the agent must quickly adapt to new tasks (plasticity) while retaining the common knowledge learned from previous tasks (stability). We address this with "Continual Task Allocation via Sparse Prompting (CoTASP)", which learns over-complete dictionaries that produce sparse masks as prompts, each extracting a task-specific sub-network from a meta-policy network. By alternately optimizing the sub-network and the prompts, CoTASP updates the meta-policy while training a task-specific policy. The dictionaries are then updated to align the optimized prompts with the tasks' embeddings, thereby capturing the semantic correlations among tasks. As a result, relevant tasks share more neurons in the meta-policy network via similar prompts, while cross-task interference, the main cause of forgetting, is effectively restrained. Given the trained meta-policy and updated dictionaries, adapting to a new task reduces to highly efficient sparse prompting and sub-network finetuning. In experiments, CoTASP achieves a promising plasticity-stability trade-off without storing or replaying any past tasks' experiences, and it outperforms existing continual and multi-task RL methods in performance on all seen tasks, forgetting reduction, and generalization to unseen tasks.
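To make the sparse-prompting idea more concrete, the following is a minimal, self-contained sketch rather than the authors' implementation: a task embedding is sparse-coded against an over-complete dictionary with a simple ISTA (iterative soft-thresholding) solver, and the support of the resulting code is binarized into a mask that gates one hidden layer of a shared meta-policy network. The function names, toy dimensions, and the choice of ISTA are illustrative assumptions.

```python
# Illustrative sketch of sparse prompting (not the paper's code).
# Assumption: each task has an embedding e; a sparse code alpha solves
#   min_alpha 0.5 * ||e - D @ alpha||^2 + lam * ||alpha||_1
# via ISTA, and the support of alpha is used as a binary mask over the
# hidden units of one meta-policy layer.
import numpy as np

def ista(D, e, lam=0.1, n_iters=200, step=None):
    """Iterative soft-thresholding to compute the sparse code alpha."""
    n_features, n_atoms = D.shape
    if step is None:
        # 1 / Lipschitz constant of the quadratic term's gradient
        step = 1.0 / np.linalg.norm(D, 2) ** 2
    alpha = np.zeros(n_atoms)
    for _ in range(n_iters):
        grad = D.T @ (D @ alpha - e)                 # gradient of the data-fit term
        alpha = alpha - step * grad
        alpha = np.sign(alpha) * np.maximum(np.abs(alpha) - step * lam, 0.0)
    return alpha

def sparse_prompt_mask(alpha):
    """Binarize the sparse code: non-zero atoms mark the task's active units."""
    return (np.abs(alpha) > 1e-8).astype(np.float32)

# Toy example: 16-dim task embedding, 64 dictionary atoms (one per hidden unit).
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64)) / np.sqrt(16)      # over-complete dictionary
e = rng.standard_normal(16)                          # task embedding

alpha = ista(D, e, lam=0.2)
mask = sparse_prompt_mask(alpha)                     # which hidden units this task uses

# Gate one hidden layer of the shared meta-policy with the task-specific mask.
W = rng.standard_normal((64, 16)) * 0.1              # shared meta-policy weights
x = rng.standard_normal(16)                          # state features
hidden = np.maximum(W @ x, 0.0) * mask               # only the task's sub-network fires
print(f"{int(mask.sum())}/64 units active for this task")
```

In the paper, the dictionaries themselves are also updated so that the optimized prompts align with the tasks' embeddings, which is what lets semantically related tasks reuse overlapping sub-networks; the sketch above covers only the prompting step for a fixed dictionary.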


Related research

12/08/2021  CoMPS: Continual Meta Policy Search
We develop a new continual meta-learning method to address challenges in...

02/01/2019  Policy Consolidation for Continual Reinforcement Learning
We propose a method for tackling catastrophic forgetting in deep reinfor...

02/11/2020  Hyper-Meta Reinforcement Learning with Sparse Reward
Despite their success, existing meta reinforcement learning methods stil...

03/12/2023  Predictive Experience Replay for Continual Visual Control and Forecasting
Learning physical dynamics in a series of non-stationary environments is...

06/05/2021  Same State, Different Task: Continual Reinforcement Learning without Interference
Continual Learning (CL) considers the problem of training an agent seque...

10/21/2022  Continual Reinforcement Learning with Group Symmetries
Continual reinforcement learning (RL) aims to learn a sequence of tasks ...

10/15/2021  Towards Better Plasticity-Stability Trade-off in Incremental Learning: A simple Linear Connector
Plasticity-stability dilemma is a main problem for incremental learning,...
