DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction

01/09/2023
by Yangyang Xu, et al.

Convolutional neural networks (CNNs) and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL). Most current studies on MTL rely solely on CNNs or Transformers. In this work, we present a novel MTL model that combines the merits of deformable CNNs and query-based Transformers for multi-task learning of dense prediction. Our method, named DeMT, is based on a simple and effective encoder-decoder architecture (i.e., a deformable mixer encoder and a task-aware transformer decoder). First, the deformable mixer encoder contains two types of operators: a channel-aware mixing operator that allows communication among different channels (i.e., efficient channel location mixing), and a spatial-aware deformable operator that applies deformable convolution to efficiently sample more informative spatial locations (i.e., deformed features). Second, the task-aware transformer decoder consists of a task interaction block and a task query block. The former captures task interaction features via self-attention; the latter leverages the deformed features and the task-interacted features to generate task-specific features through a query-based Transformer for the corresponding task predictions. Extensive experiments on two dense image prediction datasets, NYUD-v2 and PASCAL-Context, demonstrate that our model uses fewer GFLOPs and significantly outperforms current Transformer- and CNN-based competitive models on a variety of metrics. The code is available at https://github.com/yangyangxu0/DeMT.
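
The following PyTorch sketch illustrates, based only on the abstract above, how a deformable mixer encoder block and a task-aware transformer decoder could be wired together. All module names, shapes, and hyper-parameters (e.g., DeformableMixer, TaskAwareDecoder, a 64-channel feature map, four task queries) are illustrative assumptions rather than the authors' released implementation; see the linked repository for the official code.

```python
# Minimal sketch of the DeMT encoder-decoder structure described in the abstract.
# Names, shapes, and hyper-parameters are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableMixer(nn.Module):
    """Deformable mixer encoder block: channel-aware mixing (1x1 conv)
    followed by a spatial-aware deformable convolution."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.channel_mix = nn.Conv2d(dim, dim, kernel_size=1)  # mix information across channels
        # offsets for the deformable kernel, predicted from the features themselves
        self.offset = nn.Conv2d(dim, 2 * kernel_size * kernel_size,
                                kernel_size, padding=kernel_size // 2)
        self.deform = DeformConv2d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.norm = nn.BatchNorm2d(dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.channel_mix(x))
        x = self.deform(x, self.offset(x))  # sample more informative spatial locations
        return self.act(self.norm(x))       # "deformed features"


class TaskAwareDecoder(nn.Module):
    """Task-aware transformer decoder: self-attention over task tokens
    (task interaction block), then cross-attention from task queries to
    the deformed features (task query block)."""
    def __init__(self, dim: int, num_tasks: int, num_heads: int = 4):
        super().__init__()
        self.task_tokens = nn.Parameter(torch.randn(num_tasks, dim))
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        tokens = self.task_tokens.unsqueeze(0).expand(b, -1, -1)  # (B, T, C)
        tokens, _ = self.self_attn(tokens, tokens, tokens)        # task interaction
        mem = feat.flatten(2).transpose(1, 2)                     # (B, HW, C)
        task_feats, _ = self.cross_attn(tokens, mem, mem)         # queries attend to deformed features
        return task_feats                                          # (B, T, C) task-specific features


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)               # dummy backbone feature map
    enc = DeformableMixer(dim=64)
    dec = TaskAwareDecoder(dim=64, num_tasks=4)  # e.g. segmentation, depth, normals, edges
    print(dec(enc(x)).shape)                     # torch.Size([2, 4, 64])
```

In a full model, each task-specific feature would typically be reshaped and passed through a lightweight per-task head to produce the corresponding dense prediction; those heads are omitted from this sketch.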


Related research

08/10/2023
Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction
CNNs and Transformers have their own advantages and both have been widel...

05/11/2022
An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers
Self-supervised learning (SSL) methods such as masked language modeling ...

07/28/2023
Prompt Guided Transformer for Multi-Task Dense Prediction
Task-conditional architecture offers advantage in parameter efficiency b...

03/15/2022
Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
Multi-task dense scene understanding is a thriving research domain that ...

07/16/2023
TransNuSeg: A Lightweight Multi-Task Transformer for Nuclei Segmentation
Nuclei appear small in size, yet, in real clinical practice, the global ...

04/11/2019
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Attention mechanisms have become a popular component in deep neural netw...

03/24/2023
Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers
Manufacturing wafers is an intricate task involving thousands of steps. ...
