Exploring Plain Vision Transformer Backbones for Object Detection

03/30/2022
by Yanghao Li, et al.

We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training. With minimal adaptations for fine-tuning, our plain-backbone detector can achieve competitive results. Surprisingly, we observe: (i) it is sufficient to build a simple feature pyramid from a single-scale feature map (without the common FPN design) and (ii) it is sufficient to use window attention (without shifting) aided with very few cross-window propagation blocks. With plain ViT backbones pre-trained as Masked Autoencoders (MAE), our detector, named ViTDet, can compete with the previous leading methods that were all based on hierarchical backbones, reaching up to 61.3 box AP on the COCO dataset using only ImageNet-1K pre-training. We hope our study will draw attention to research on plain-backbone detectors. Code will be made available.
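As a rough illustration of the two observations above, the PyTorch-style sketch below shows (i) a simple feature pyramid built from one stride-16 ViT feature map, and (ii) non-shifted window attention with a few global blocks for cross-window propagation. This is an assumption-laden approximation for exposition, not the authors' released ViTDet code: the module names, channel widths, and window size are illustrative choices.

```python
# Illustrative sketch only; not the released ViTDet implementation.
import torch
import torch.nn as nn


class SimpleFeaturePyramid(nn.Module):
    """Build stride {4, 8, 16, 32} maps from a single stride-16 ViT feature map."""

    def __init__(self, dim=768, out_dim=256):
        super().__init__()
        self.scale_ops = nn.ModuleList([
            nn.Sequential(                                    # stride 16 -> 4
                nn.ConvTranspose2d(dim, dim // 2, 2, stride=2),
                nn.GELU(),
                nn.ConvTranspose2d(dim // 2, dim // 4, 2, stride=2),
            ),
            nn.ConvTranspose2d(dim, dim // 2, 2, stride=2),   # stride 16 -> 8
            nn.Identity(),                                    # stride 16 -> 16
            nn.MaxPool2d(kernel_size=2, stride=2),            # stride 16 -> 32
        ])
        dims = [dim // 4, dim // 2, dim, dim]
        self.out_convs = nn.ModuleList(
            [nn.Conv2d(d, out_dim, kernel_size=1) for d in dims]
        )

    def forward(self, x):  # x: (B, dim, H/16, W/16), last ViT feature map
        return [conv(op(x)) for op, conv in zip(self.scale_ops, self.out_convs)]


class WindowedViTBlock(nn.Module):
    """A ViT block that attends within fixed (non-shifted) windows,
    or globally when marked as a cross-window propagation block."""

    def __init__(self, dim=768, num_heads=12, window=14, is_global=False):
        super().__init__()
        self.window, self.is_global = window, is_global
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: (B, H, W, C), H and W divisible by the window size
        B, H, W, C = x.shape
        shortcut = x
        x = self.norm1(x)
        if self.is_global:                       # cross-window propagation
            x = x.reshape(B, H * W, C)
            x, _ = self.attn(x, x, x)
            x = x.reshape(B, H, W, C)
        else:                                    # attention within each window
            w = self.window
            x = x.reshape(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
            x = x.reshape(-1, w * w, C)
            x, _ = self.attn(x, x, x)
            x = x.reshape(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
            x = x.reshape(B, H, W, C)
        x = shortcut + x
        return x + self.mlp(self.norm2(x))
```

In a sketch like this, a plain ViT-B backbone might use global blocks at a handful of evenly spaced positions (e.g., one per quarter of the depth, an assumed placement for illustration) and window blocks elsewhere; the final stride-16 feature map then feeds SimpleFeaturePyramid to produce the multi-scale maps consumed by the detection head.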

