Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

06/09/2022
by   Jinguo Zhu, et al.

To build an artificial neural network resembling the biological intelligence system, recent works have unified numerous tasks into a generalist model that processes various tasks with shared parameters and has no task-specific modules. While generalist models achieve promising results on various benchmarks, their performance degrades on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main factor behind this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) into generalist models. Routing strategies under different levels of conditions are proposed, taking both the training/inference cost and the generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver effectively mitigates the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs still preserves the generalization ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video captioning. Code and pre-trained generalist models shall be released.

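The abstract describes replacing dense feed-forward blocks with Conditional MoE layers whose router is driven by a condition attached to the input (for example, its modality or task) rather than by each token's hidden state alone, so that tokens sharing a condition are dispatched to the same experts and cross-task interference is reduced. The following is a minimal PyTorch sketch of that idea only; the class name ConditionalMoE, the top-2 gating, and the way the condition embedding is supplied are illustrative assumptions and are not taken from the paper or its released code.

```python
# Minimal sketch of a conditional Mixture-of-Experts feed-forward layer.
# Illustrative only: names, top-k gating, and the condition interface are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalMoE(nn.Module):
    """A sparse FFN whose gate is conditioned on a routing embedding
    (e.g., a modality or task embedding) instead of the token itself."""

    def __init__(self, d_model: int, d_ffn: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ffn), nn.GELU(), nn.Linear(d_ffn, d_model))
            for _ in range(num_experts)
        ])
        # Gate reads the routing condition, not (only) the token representation.
        self.gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq, d_model) token representations
        # condition: (batch, seq, d_model) routing condition, e.g. a modality/task
        #            embedding broadcast over the sequence
        logits = self.gate(condition)                         # (B, S, num_experts)
        weights = F.softmax(logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)   # sparse expert selection
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)    # renormalize selected weights

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]                         # (B, S) chosen expert per token
            w = topk_w[..., slot].unsqueeze(-1)               # (B, S, 1) mixing weight
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out
```

Because the routing decision depends only on the condition embedding, all tokens from the same modality or task follow the same expert assignment, which is one simple way to realize the "routing under different levels of conditions" described in the abstract.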

