Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction

08/10/2023
by Yangyang Xu, et al.

CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL). Most current studies on MTL rely solely on CNNs or Transformers. In this work, we present a novel MTL model that combines the merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction. This combination may offer a simple and efficient solution owing to its powerful and flexible task-specific learning, with lower cost, less complexity, and fewer parameters than traditional MTL methods. We introduce the deformable mixer Transformer with gating (DeMTG), a simple and effective encoder-decoder architecture that incorporates convolution and attention mechanisms in a unified network for MTL. It is carefully designed to exploit the advantages of each block and to provide deformable and comprehensive features for all tasks from both local and global perspectives. First, the deformable mixer encoder contains two types of operators: a channel-aware mixing operator that allows communication among different channels, and a spatial-aware deformable operator that applies deformable convolution to efficiently sample more informative spatial locations. Second, the task-aware gating transformer decoder performs the task-specific predictions: a task interaction block integrated with self-attention captures task-interaction features, and a task query block integrated with gating attention selects the corresponding task-specific features. Experimental results demonstrate that the proposed DeMTG uses fewer GFLOPs and significantly outperforms current Transformer-based and CNN-based competitive models on a variety of metrics across three dense prediction datasets. Our code and models are available at https://github.com/yangyangxu0/DeMTG.
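To make the two encoder operators concrete, below is a minimal PyTorch sketch, not the authors' implementation: the module name DeformableMixerBlock, the use of a 1x1 convolution for channel-aware mixing, and torchvision's DeformConv2d as the spatial-aware deformable operator are all illustrative assumptions.

```python
# Minimal sketch of a deformable-mixer-style encoder block (illustrative, not DeMTG's code).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableMixerBlock(nn.Module):
    """Channel-aware mixing followed by a spatial-aware deformable operator."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Channel-aware mixing: a 1x1 convolution lets information flow
        # across channels at every spatial location.
        self.channel_mix = nn.Conv2d(dim, dim, kernel_size=1)
        # Predict 2D offsets (x, y) for every kernel sampling position.
        self.offset = nn.Conv2d(dim, 2 * kernel_size * kernel_size,
                                kernel_size=kernel_size, padding=kernel_size // 2)
        # Spatial-aware deformable operator: samples the more informative
        # locations indicated by the learned offsets.
        self.deform = DeformConv2d(dim, dim, kernel_size=kernel_size,
                                   padding=kernel_size // 2)
        self.norm = nn.BatchNorm2d(dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.channel_mix(x))
        offsets = self.offset(x)
        x = self.deform(x, offsets)
        return self.act(self.norm(x))


# Example: process a 64-channel feature map at 32x32 resolution.
feats = torch.randn(1, 64, 32, 32)
out = DeformableMixerBlock(dim=64)(feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Likewise, the task query block of the decoder can be read as learned per-task queries cross-attending to the shared task-interaction features, with a gate selecting task-specific channels. The sketch below shows one plausible, simplified form of such gating attention; the exact gating mechanism in DeMTG may differ.

```python
# Hedged sketch of a task-query block with gating attention (illustrative form).
import torch
import torch.nn as nn


class GatedTaskQuery(nn.Module):
    def __init__(self, dim: int, num_tasks: int):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tasks, dim))  # one learned query per task
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(dim, dim)  # produces per-channel gates

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tokens, dim) shared task-interaction features
        b = feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)       # (batch, tasks, dim)
        attended, _ = self.attn(q, feats, feats)               # cross-attention over shared features
        return torch.sigmoid(self.gate(attended)) * attended   # gated selection of task-specific features


# Example: select features for 3 tasks from 196 shared tokens.
tasks = GatedTaskQuery(dim=64, num_tasks=3)(torch.randn(2, 196, 64))
print(tasks.shape)  # torch.Size([2, 3, 64])
```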

