An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training

06/29/2023
by Zitian Chen, et al.

We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label datasets are rare, small, and expensive. We use the term heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few efforts have explored training on such heterogeneous datasets. General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies. To address these challenges, we propose to modify and scale up mixture-of-experts (MoE) vision transformers, so that they can simultaneously learn classification, detection, and segmentation on diverse mainstream vision datasets including ImageNet, COCO, and ADE20K. Our approach achieves results comparable to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks. Due to its emergent modularity, this general-purpose model decomposes into high-performing components, efficiently adapting to downstream tasks. We can fine-tune it with fewer trainable parameters, fewer model parameters, and less computation. Additionally, its modularity allows for easy expansion in continual-learning-without-forgetting scenarios. Finally, these components can be controlled and combined to meet the varied demands of downstream tasks.
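The abstract centers on scaling MoE vision transformers so one backbone can serve classification, detection, and segmentation. As a rough illustration of the kind of layer involved, the sketch below shows a transformer MLP block replaced by a sparse mixture-of-experts layer with per-token top-1 routing. This is a minimal assumption-laden sketch, not the authors' implementation: the class name `MoEMLP`, the expert count, and the top-1 routing scheme are illustrative choices only.

```python
# Minimal MoE-MLP sketch (illustrative; not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    """Transformer MLP block replaced by a sparse mixture-of-experts layer."""
    def __init__(self, dim, hidden_dim, num_experts=8):
        super().__init__()
        # Router produces one gating score per expert for every token.
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                            # x: (batch, tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)     # (batch, tokens, num_experts)
        top_w, top_idx = gate.max(dim=-1)            # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                      # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Hypothetical usage: ViT patch tokens pass through the shared MoE backbone;
# each task (classification, detection, segmentation) would attach its own head.
tokens = torch.randn(2, 196, 768)
print(MoEMLP(dim=768, hidden_dim=3072)(tokens).shape)  # torch.Size([2, 196, 768])
```

Because only a subset of experts fires per token, such a layer can in principle be decomposed into smaller task-specific sub-models, which is consistent with the modularity and efficient-adaptation claims in the abstract.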

