Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

11/29/2021
by Byungseok Roh, et al.

DETR is the first end-to-end object detector using a transformer encoder-decoder architecture; it demonstrates competitive performance but low computational efficiency on high-resolution feature maps. The subsequent work, Deformable DETR, enhances the efficiency of DETR by replacing dense attention with deformable attention, which achieves 10x faster convergence and improved performance. Deformable DETR uses multi-scale features to improve performance; however, the number of encoder tokens increases by 20x compared to DETR, and the computation cost of the encoder attention remains a bottleneck. In our preliminary experiment, we observe that the detection performance hardly deteriorates even if only a subset of the encoder tokens is updated. Inspired by this observation, we propose Sparse DETR, which selectively updates only the tokens expected to be referenced by the decoder, thus helping the model detect objects effectively. In addition, we show that applying an auxiliary detection loss on the selected tokens in the encoder improves the performance while minimizing computational overhead. We validate that Sparse DETR achieves better performance than Deformable DETR even with only 10% encoder tokens on the COCO dataset. Although only the encoder tokens are sparsified, the total computation cost decreases by 38% and the frames per second (FPS) increases by 42% compared to Deformable DETR. Code is available at https://github.com/kakaobrain/sparse-detr
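
The selective update described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the saliency scores, the `keep_ratio` parameter, and the `update_fn` stand-in for an encoder layer are all hypothetical, and the tokens are plain floats rather than feature vectors. It only shows the idea of refining the top-scoring fraction of encoder tokens while passing the rest through unchanged.

```python
def select_top_tokens(scores, keep_ratio):
    """Return indices of the highest-scoring tokens, in ascending order.

    `scores` is a hypothetical per-token saliency estimate; how such scores
    would be obtained is outside the scope of this sketch.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token, rounding the requested fraction to an integer.
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_update(tokens, scores, keep_ratio, update_fn):
    """Apply update_fn only to the selected tokens; copy the others as-is."""
    selected = set(select_top_tokens(scores, keep_ratio))
    return [update_fn(t) if i in selected else t for i, t in enumerate(tokens)]


# Toy example: 10 scalar "tokens", of which 20% are refined.
tokens = [float(i) for i in range(10)]
scores = [0.1, 0.9, 0.2, 0.05, 0.8, 0.3, 0.0, 0.4, 0.15, 0.25]
updated = sparse_update(tokens, scores, keep_ratio=0.2,
                        update_fn=lambda t: t + 100.0)
print(updated)
```

In this toy setting the cost of `update_fn` scales with the number of selected tokens rather than with the full token count, which is the intuition behind sparsifying the encoder.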

research
07/24/2023

Less is More: Focus Attention for Efficient DETR

DETR-like models have significantly boosted the performance of detectors...
research
10/08/2020

Deformable DETR: Deformable Transformers for End-to-End Object Detection

DETR has been recently proposed to eliminate the need for many hand-desi...
research
04/07/2023

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Human visual recognition is a sparse process, where only a few salient v...
research
11/22/2022

DETRs with Collaborative Hybrid Assignments Training

In this paper, we provide the observation that too few queries assigned ...
research
04/03/2021

Efficient DETR: Improving End-to-End Object Detector with Dense Prior

The recently proposed end-to-end transformer detectors, such as DETR and...
research
11/18/2020

End-to-End Object Detection with Adaptive Clustering Transformer

End-to-end Object Detection with Transformer (DETR) proposes to perform o...
research
08/16/2023

Agglomerative Transformer for Human-Object Interaction Detection

We propose an agglomerative Transformer (AGER) that enables Transformer-...
