The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models

12/12/2020
by   Tianlong Chen, et al.

The computer vision community has regained enthusiasm for pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as SimCLR and MoCo. Pre-trained weights often boost a wide range of downstream tasks, including classification, detection, and segmentation. Recent studies suggest that pre-training benefits from gigantic model capacity. We are hereby curious and ask: after pre-training, does a pre-trained model really have to stay large for universal downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH). LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch to reach the full model's performance. We extend the scope of LTH and ask whether matching subnetworks still exist in pre-trained models that enjoy the same downstream transfer performance. Our extensive experiments convey an overall positive message: from all pre-trained weights obtained by ImageNet classification, SimCLR, and MoCo, we consistently locate matching subnetworks at 59.04% sparsity that transfer universally to multiple downstream tasks, with no performance degradation compared to using the full pre-trained weights. Further analyses reveal that subnetworks found from different pre-training schemes tend to yield diverse mask structures and perturbation sensitivities. We conclude that the core LTH observations remain generally relevant in the pre-training paradigm of computer vision, though more delicate discussion is needed in some cases. Code and pre-trained models will be made available at: https://github.com/VITA-Group/CV_LTH_Pre-training.
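
The abstract refers to locating "matching subnetworks" in pre-trained weights. As an illustration of the general technique (not the authors' released code), below is a minimal PyTorch sketch of iterative magnitude pruning (IMP) with rewinding to the pre-trained weights, the procedure LTH-style studies typically use to find such subnetworks. The helpers `build_model`, `train`, and `evaluate` are hypothetical placeholders.

```python
# A minimal sketch of iterative magnitude pruning (IMP) with rewinding to the
# pre-trained weights, assuming placeholder helpers `build_model`, `train`,
# and `evaluate` (not from the paper's repository).
import torch

def global_magnitude_mask(model, sparsity, prev_mask=None):
    """Keep the largest-magnitude weights globally; zero out the rest."""
    scores = []
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune only conv/linear weight tensors
            m = prev_mask[name] if prev_mask else torch.ones_like(p)
            scores.append((p * m).abs().flatten())
    all_scores = torch.cat(scores)
    num_keep = int((1 - sparsity) * all_scores.numel())
    threshold = torch.topk(all_scores, num_keep, largest=True).values.min()
    mask = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            m = prev_mask[name] if prev_mask else torch.ones_like(p)
            mask[name] = ((p * m).abs() >= threshold).float() * m
    return mask

def apply_mask(model, mask):
    """Zero out pruned weights in place."""
    if mask is None:
        return
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in mask:
                p.mul_(mask[name])

def imp(build_model, pretrained_state, train, evaluate, rounds=10, prune_rate=0.2):
    """Each round: rewind to the pre-trained weights, train, then prune 20% of
    the remaining weights by global magnitude."""
    mask = None
    for r in range(rounds):
        model = build_model()
        model.load_state_dict(pretrained_state)  # rewind to pre-trained init
        apply_mask(model, mask)
        train(model, mask)  # mask must be re-applied after every update step
        sparsity = 1.0 - (1.0 - prune_rate) ** (r + 1)
        mask = global_magnitude_mask(model, sparsity, mask)
        apply_mask(model, mask)
        print(f"round {r}: sparsity={sparsity:.2%}, score={evaluate(model):.3f}")
    return mask
```

In the setting the paper studies, `train` would roughly correspond to the pre-training task (e.g., ImageNet classification, SimCLR, or MoCo), and the resulting sparse mask is then kept fixed while the subnetwork is transferred to downstream tasks.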

