Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

05/19/2022
by Xiaosong Zhang, et al.

Modern object detectors take advantage of pre-trained vision transformers by using them as backbone networks. However, apart from the backbone, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders the consistency between detectors and pre-trained models. In this study, we propose to integrally migrate pre-trained transformer encoder-decoders (imTED) for object detection, constructing a feature extraction-operation path that is not only "fully pre-trained" but also consistent with pre-trained models. The essential improvements of imTED over existing transformer-based detectors are twofold: (1) it embeds the pre-trained transformer decoder in the detector head; and (2) it removes the feature pyramid network from the feature extraction path. These improvements significantly reduce the proportion of randomly initialized parameters and enhance the generalization capability of detectors. Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8 AP. It also improves the state of the art in few-shot object detection by up to 7.6 AP, demonstrating significantly higher generalization capability. Code will be made publicly available.
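
The claim about reducing the proportion of randomly initialized parameters can be illustrated with a small bookkeeping sketch. Everything here is an assumption made for illustration: the `Component` type, the `random_init_fraction` helper, the component names, and especially the parameter counts, which are placeholder values and not figures from the paper. The sketch also assumes that a few task-specific output layers stay randomly initialized after the decoder is migrated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One part of a detector (hypothetical bookkeeping type)."""
    name: str
    num_params: int    # placeholder parameter count, arbitrary units
    pretrained: bool   # True if weights are loaded from the pre-trained model


def random_init_fraction(components):
    """Return the fraction of parameters that start from random initialization."""
    total = sum(c.num_params for c in components)
    if total == 0:
        raise ValueError("detector has no parameters")
    random_params = sum(c.num_params for c in components if not c.pretrained)
    return random_params / total


# Placeholder counts chosen only for illustration; NOT measurements from the paper.
# Baseline: only the backbone is pre-trained.
baseline = [
    Component("encoder backbone", 80, True),
    Component("feature pyramid network", 10, False),
    Component("detector head", 10, False),
]

# Migrated design: the pre-trained decoder serves as the head and the
# feature pyramid network is removed. The remaining random part is an
# assumed set of task-specific output layers.
migrated = [
    Component("encoder backbone", 80, True),
    Component("decoder used as detector head", 15, True),
    Component("task-specific output layers", 5, False),
]

baseline_fraction = random_init_fraction(baseline)
migrated_fraction = random_init_fraction(migrated)
print(f"baseline: {baseline_fraction:.2f}, migrated: {migrated_fraction:.2f}")
```

With these placeholder counts the randomly initialized share drops from 20 of 100 units to 5 of 100. The real proportions depend on the actual model sizes, which the abstract does not state.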


