The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models

by   Tianlong Chen, et al.

The computer vision world has been re-gaining enthusiasm in various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as simCLR and MoCo. Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation. Latest studies suggest that the pre-training benefits from gigantic model capacity. We are hereby curious and ask: after pre-training, does a pre-trained model indeed have to stay large for its universal downstream transferability? In this paper, we examine the supervised and self-supervised pre-trained models through the lens of lottery ticket hypothesis (LTH). LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch, to reach the full models' performance. We extend the scope of LTH to questioning whether matching subnetworks still exist in the pre-training models, that enjoy the same downstream transfer performance. Our extensive experiments convey an overall positive message: from all pre-trained weights obtained by ImageNet classification, simCLR and MoCo, we are consistently able to locate such matching subnetworks at 59.04 universally to multiple downstream tasks, whose performance see no degradation compared to using full pre-trained weights. Further analyses reveal that subnetworks found from different pre-training tend to yield diverse mask structures and perturbation sensitivities. We conclude that the core LTH observations remain generally relevant in the pre-training paradigm of computer vision, but more delicate discussions are needed in some cases. Codes and pre-trained models will be made available at:



page 5

page 7


A Closer Look at Self-supervised Lightweight Vision Transformers

Self-supervised learning on large-scale Vision Transformers (ViTs) as pr...

The Lottery Ticket Hypothesis for Pre-trained BERT Networks

In natural language processing (NLP), enormous pre-trained models like B...

Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

Pre-training models on Imagenet or other massive datasets of real images...

UER: An Open-Source Toolkit for Pre-training Models

Existing works, including ELMO and BERT, have revealed the importance of...

On Data Scaling in Masked Image Modeling

An important goal of self-supervised learning is to enable model pre-tra...

Revealing the Dark Secrets of Masked Image Modeling

Masked image modeling (MIM) as pre-training is shown to be effective for...

Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning

Pretrained models from self-supervision are prevalently used in fine-tun...

Code Repositories


[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.