Co-training 2^L Submodels for Visual Recognition

12/09/2022
by Hugo Touvron, et al.

We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, “submodels”, with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights, and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt, and improves their results in comparable training settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy on ImageNet-val.
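As a rough illustration of the idea, here is a minimal PyTorch sketch of one cosub training step. It assumes a model whose stochastic-depth layer drops are resampled on every forward pass, so two passes over the same batch implicitly instantiate two submodels; the helper name cosub_step and the mixing weight lam are our own placeholders, and the paper's exact loss formulation may differ.

```python
# Minimal sketch of a cosub training step, assuming `model` applies
# stochastic depth and resamples which layers to drop on each forward call.
import torch
import torch.nn.functional as F

def cosub_step(model, images, labels, optimizer, lam=0.5):
    """One training step: two stochastic-depth submodels teach each other."""
    # Two forward passes; stochastic depth drops a different random subset
    # of layers each time, implicitly instantiating two submodels.
    logits_a = model(images)
    logits_b = model(images)

    # Regular loss against the one-hot labels for each submodel.
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)

    # Each submodel is a soft teacher for the other. Targets are detached
    # so gradients only flow through the student branch of each term.
    soft_a = logits_a.detach().softmax(dim=-1)
    soft_b = logits_b.detach().softmax(dim=-1)
    distill = F.cross_entropy(logits_a, soft_b) + F.cross_entropy(logits_b, soft_a)

    loss = (1 - lam) * ce + lam * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that detaching each submodel's logits before using them as soft targets keeps the scheme symmetric while requiring only a single set of weights and no external teacher, as the abstract describes.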


