SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills

06/28/2023
by Zhangyin Feng, et al.

Traditional multitask learning methods can exploit shared knowledge only task-wise or language-wise, and so lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks in different languages. To this end, we define several language-specific skills and task-specific skills, each of which corresponds to a skill module. SkillNet-X sparsely activates the skill modules that are relevant to either the target task or the target language. Acting as knowledge transit hubs, skill modules are capable of absorbing task-related knowledge and language-related knowledge consecutively. Based on Transformer, we modify the multi-head attention layer and the feed-forward network layer to accommodate skill modules. We evaluate SkillNet-X on eleven natural language understanding datasets in four languages. Results show that SkillNet-X performs better than task-specific baselines and two multitask learning baselines (i.e., a dense joint model and a Mixture-of-Experts model). In addition, skill pre-training improves the performance of SkillNet-X on almost all datasets. To investigate the generalization of our model, we conduct experiments on two new tasks and find that SkillNet-X significantly outperforms the baselines.
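The sparse activation described above can be illustrated with a small sketch. This is not the authors' implementation: the skill names, the task-to-skill and language-to-skill mappings, the plain linear modules, and the averaging of active module outputs are all assumptions made for illustration. The only point it aims to show is that each input runs through the modules tied to its task and its language, while the other modules stay idle.

```python
import random

# Hypothetical skill inventories; the names are illustrative, not from the paper.
TASK_SKILLS = {"sentiment": ["task_classify"], "ner": ["task_tagging"]}
LANGUAGE_SKILLS = {"en": ["lang_en"], "zh": ["lang_zh"]}


def make_linear(dim, rng):
    """Return a dim x dim weight matrix filled with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_linear(weights, x):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SkillLayer:
    """Holds one small module per skill; only the selected modules are run."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        names = [skill
                 for group in (TASK_SKILLS, LANGUAGE_SKILLS)
                 for skills in group.values()
                 for skill in skills]
        self.modules = {name: make_linear(dim, rng) for name in names}

    def active_skills(self, task, language):
        """Select the modules relevant to the target task and language."""
        return TASK_SKILLS[task] + LANGUAGE_SKILLS[language]

    def forward(self, x, task, language):
        active = self.active_skills(task, language)
        outputs = [apply_linear(self.modules[name], x) for name in active]
        # Assumption: combine active modules by averaging their outputs.
        return [sum(values) / len(outputs) for values in zip(*outputs)]


layer = SkillLayer(dim=4)
x = [1.0, 0.5, -0.5, 2.0]
y = layer.forward(x, task="ner", language="zh")
print(layer.active_skills("ner", "zh"))
```

In this sketch, switching the language from "zh" to "en" swaps only the language module, so the task module's parameters are shared across languages. That is one plausible reading of how a skill module could act as a hub for both kinds of knowledge.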

