Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence

12/26/2021
by Wenchi Ma, et al.

Object Detection with Transformers (DETR) and related works reach or even surpass the highly optimized Faster R-CNN baseline using self-attention network architectures. Motivated by evidence that pure self-attention carries a strong inductive bias that causes transformers to lose expressive power as network depth grows, we propose a transformer architecture with a mitigatory self-attention mechanism: direct mapping connections are added within the transformer to mitigate rank collapse, counteract the loss of feature expressiveness, and improve model performance. We apply this design to object detection and develop a model named Miti-DETR. Miti-DETR carries the input of each attention layer through to that layer's output, so that "non-attention" information participates in every attention propagation. The resulting residual self-attention network addresses two critical issues: (1) it prevents the self-attention network from degenerating toward rank 1 to the greatest possible extent; and (2) it further diversifies the paths along which parameters are updated, making attention easier to learn. Miti-DETR significantly improves average detection precision and convergence speed over existing DETR-based models on the challenging COCO object detection dataset. Moreover, the proposed transformer with the residual self-attention network can be easily generalized to, or plugged into, other related task models without specific customization.
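To make the idea concrete, the following is a minimal PyTorch sketch of one plausible reading of such a layer: the raw input of each attention layer is kept and added back into that layer's output, so a "non-attention" path survives through the whole stack. The class name, dimensions, and exact placement of the extra skip connection are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MitigatorySelfAttentionLayer(nn.Module):
    """Hypothetical transformer encoder layer that adds the layer input
    back into the layer output, preserving a non-attention path."""

    def __init__(self, d_model=256, nhead=8, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.activation = nn.ReLU()

    def forward(self, x):
        layer_input = x  # keep the "non-attention" input of this layer
        # Self-attention sub-layer with the usual residual connection.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # Position-wise feed-forward sub-layer; the original layer input is
        # added again so it reaches the layer output directly (assumed skip).
        ff_out = self.linear2(self.dropout(self.activation(self.linear1(x))))
        x = self.norm2(x + self.dropout(ff_out) + layer_input)
        return x

# Usage: stack several layers over a sequence of image-feature tokens.
layers = nn.Sequential(*[MitigatorySelfAttentionLayer() for _ in range(6)])
tokens = torch.randn(100, 2, 256)   # (sequence length, batch, d_model)
print(layers(tokens).shape)         # torch.Size([100, 2, 256])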

