Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer

02/22/2021
by Ronghang Hu, et al.

We propose UniT, a Unified Transformer model that simultaneously learns the most prominent tasks across different domains, ranging from object detection to language understanding and multimodal reasoning. Based on the transformer encoder-decoder architecture, our UniT model encodes each input modality with an encoder and makes predictions on each task with a shared decoder over the encoded input representations, followed by task-specific output heads. The entire model is jointly trained end-to-end with losses from each task. Compared to previous efforts on multi-task learning with transformers, we share the same model parameters across all tasks instead of separately fine-tuning task-specific models, and we handle a much greater variety of tasks across different domains. In our experiments, we learn 7 tasks jointly over 8 datasets, achieving comparable performance to well-established prior work on each domain under the same supervision with a compact set of model parameters. Code will be released in MMF at https://mmf.sh.
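To make the encoder-decoder description above concrete, here is a minimal PyTorch sketch of the architecture: one encoder per input modality, a shared decoder driven by learned per-task query embeddings, and task-specific output heads, trained jointly by summing per-task losses. The class name UniTSketch, the dimensions, the feature shapes, and the two toy tasks ("detection" and "vqa") are illustrative assumptions, not the released MMF implementation.

import torch
import torch.nn as nn

class UniTSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_queries=16,
                 vocab_size=1000, n_det_classes=10, n_vqa_answers=100):
        super().__init__()
        # Per-modality encoders: one for image region features, one for text tokens.
        self.visual_proj = nn.Linear(2048, d_model)  # e.g. CNN backbone features (assumed)
        self.text_embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.visual_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.text_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Shared decoder: attends to the concatenated encoder outputs through
        # a set of learned task-query embeddings (one table per task).
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.task_queries = nn.ParameterDict({
            "detection": nn.Parameter(torch.randn(n_queries, d_model)),
            "vqa": nn.Parameter(torch.randn(1, d_model)),
        })
        # Task-specific output heads on top of the shared decoder states.
        self.heads = nn.ModuleDict({
            "detection": nn.Linear(d_model, n_det_classes + 4),  # class logits + box
            "vqa": nn.Linear(d_model, n_vqa_answers),
        })

    def forward(self, task, image_feats=None, text_tokens=None):
        # Encode whichever modalities this task's batch provides.
        memories = []
        if image_feats is not None:
            memories.append(self.visual_encoder(self.visual_proj(image_feats)))
        if text_tokens is not None:
            memories.append(self.text_encoder(self.text_embed(text_tokens)))
        memory = torch.cat(memories, dim=1)  # concatenate modality sequences
        # Shared decoder over the task's query embeddings and the encoded inputs.
        queries = self.task_queries[task].unsqueeze(0).expand(memory.size(0), -1, -1)
        hidden = self.decoder(queries, memory)
        return self.heads[task](hidden)

# Joint end-to-end training would iterate over batches from each task and sum
# the per-task losses; below is only a shape check with toy inputs.
model = UniTSketch()
img = torch.randn(2, 49, 2048)          # 2 images, 49 regions, 2048-d features
txt = torch.randint(0, 1000, (2, 12))   # 2 questions, 12 tokens each
det_out = model("detection", image_feats=img)             # (2, 16, 14)
vqa_out = model("vqa", image_feats=img, text_tokens=txt)  # (2, 1, 100)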



Related research

06/08/2021 · Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
State-of-the-art parameter-efficient fine-tuning methods rely on introdu...

11/17/2022 · Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Despite the remarkable success of foundation models, their task-specific...

12/16/2021 · KAT: A Knowledge Augmented Transformer for Vision-and-Language
The primary focus of recent work with large-scale transformers has been o...

08/10/2023 · Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction
CNNs and Transformers have their own advantages and both have been widel...

06/21/2021 · Spatio-Temporal Multi-Task Learning Transformer for Joint Moving Object Detection and Segmentation
Moving objects have special importance for Autonomous Driving tasks. Det...

09/11/2021 · Empirical Analysis of Training Strategies of Transformer-based Japanese Chit-chat Systems
In recent years, several high-performance conversational systems have be...

10/24/2020 · Multi-task Supervised Learning via Cross-learning
In this paper we consider a problem known as multi-task learning, consis...
