Benchmarking Detection Transfer Learning with Vision Transformers

11/22/2021
by Yanghao Li, et al.

Object detection is a central downstream task used to test whether pre-trained network parameters confer benefits, such as improved accuracy or training speed. The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. These difficulties (e.g., architectural incompatibility, slow training, high memory consumption, unknown training formulae, etc.) have prevented recent studies from benchmarking detection transfer learning with standard ViT models. In this paper, we present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN. These tools facilitate the primary goal of our study: we compare five ViT initializations, including recent state-of-the-art self-supervised learning methods, supervised initialization, and a strong random initialization baseline. Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO, increasing box AP by up to 4 points over prior self-supervised pre-training methods. Moreover, these masking-based initializations scale better, with the improvement growing as model size increases.
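One of the architectural incompatibilities mentioned above is that a plain ViT emits a single-scale (stride-16) feature map, while Mask R-CNN's FPN expects a multi-scale pyramid. A minimal shape-level sketch of one way to bridge this gap is shown below: the single feature map is up- and downsampled to strides 4, 8, 16, and 32. This is an illustrative simplification, not the authors' code; nearest-neighbor upsampling and max pooling stand in for the learned (de)convolutions a real implementation would use.

```python
import numpy as np

def vit_feature_pyramid(feat):
    """Build a 4-level pyramid (strides 4, 8, 16, 32) from the single
    stride-16 feature map of a plain ViT backbone.

    feat: array of shape (C, H/16, W/16). Upsampling is nearest-neighbor
    and downsampling is 2x2 max pooling; this sketches only the shapes,
    not the learned layers a real detector would use.
    """
    c, h, w = feat.shape
    up4 = feat.repeat(4, axis=1).repeat(4, axis=2)                   # stride 4
    up2 = feat.repeat(2, axis=1).repeat(2, axis=2)                   # stride 8
    down2 = feat.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))   # stride 32
    return [up4, up2, feat, down2]

# Example: a 64x64 input image patchified at stride 16 yields a 4x4 map.
feat = np.arange(768 * 4 * 4, dtype=np.float32).reshape(768, 4, 4)
levels = vit_feature_pyramid(feat)
print([lv.shape for lv in levels])
```

Each pyramid level keeps the channel dimension of the backbone; a full implementation would also project each level to the FPN channel width with 1x1 convolutions.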


