Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

04/16/2022
by Shashank Gupta, et al.

Traditional multi-task learning (MTL) methods rely on dense networks that apply the same set of shared weights across several different tasks. This often creates interference, where two or more tasks compete to pull model parameters in different directions. In this work, we study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning by specializing some weights for learning shared representations and using the others for learning task-specific information. To this end, we devise task-aware gating functions to route examples from different tasks to specialized experts which share subsets of network weights conditioned on the task. This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model. We demonstrate that such sparse networks improve multi-task learning along three key dimensions: (i) transfer to low-resource tasks from related tasks in the training mixture; (ii) sample-efficient generalization to tasks not seen during training by making use of task-aware routing from seen related tasks; (iii) robustness to the addition of unrelated tasks by avoiding catastrophic forgetting of existing tasks.
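The key mechanism described above is the task-aware gating function: the router scores experts using both the input representation and the identity of the task, then activates only a few experts per example, so adding experts grows the parameter count without growing per-example compute. The sketch below is a minimal illustration of this idea in PyTorch; the class name TaskAwareMoELayer, the top-k routing, and the choice to add a learned task embedding to the gate input are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskAwareMoELayer(nn.Module):
    """Minimal sketch of a sparsely activated MoE layer with task-aware routing.

    Each expert is a small feed-forward block. The gate scores experts from the
    input representation plus a learned task embedding, and only the top-k
    experts are run for each example (illustrative design, not the paper's
    exact formulation).
    """

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, num_tasks: int, k: int = 2):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        self.task_embed = nn.Embedding(num_tasks, d_model)  # conditions routing on the task
        self.gate = nn.Linear(d_model, num_experts)         # one score per expert

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); task_id: (batch,) integer task identifiers
        gate_input = x + self.task_embed(task_id)
        scores = F.softmax(self.gate(gate_input), dim=-1)    # (batch, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # keep only k experts per example
        topk_scores = topk_scores / topk_scores.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                          # expert chosen in this slot
            weight = topk_scores[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] = out[mask] + weight[mask] * expert(x[mask])
        return out


# Example: 4 tasks routed through 8 experts, 2 active per example.
layer = TaskAwareMoELayer(d_model=64, d_hidden=128, num_experts=8, num_tasks=4, k=2)
x = torch.randn(16, 64)
task_id = torch.randint(0, 4, (16,))
y = layer(x, task_id)  # shape (16, 64)
```

Because only k of the num_experts feed-forward blocks run per example, the layer's parameter count scales with the number of experts while per-example compute stays close to that of a dense layer of the same width, which is the cost argument made in the abstract.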


Related research

06/07/2023  Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks
Multi-task learning (MTL) can improve the generalization performance of ...

12/15/2022  Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
Optimization in multi-task learning (MTL) is more challenging than singl...

05/25/2022  Eliciting Transferability in Multi-task Learning with Task-level Mixture-of-Experts
Recent work suggests that transformer models are capable of multi-task l...

07/24/2020  Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference
Multi-task networks are commonly utilized to alleviate the need for a la...

02/13/2023  SubTuning: Efficient Finetuning for Multi-Task Learning
Finetuning a pretrained model has become a standard approach for trainin...

07/17/2018  A Modulation Module for Multi-task Learning with Applications in Image Retrieval
Multi-task learning has been widely adopted in many computer vision task...

12/20/2022  RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction
In subcellular biological research, fluorescence staining is a key techn...
