The Lottery Ticket Hypothesis for Vision Transformers

11/02/2022
by Xuan Shen, et al.

The conventional lottery ticket hypothesis (LTH) claims that a dense neural network contains a sparse subnetwork which, together with a proper initialization, called the winning ticket, can be trained from scratch to nearly match the accuracy of its dense counterpart. However, the LTH has scarcely been evaluated for vision transformers (ViTs). In this paper, we first show that conventional winning tickets are hard to find at the weight level of ViTs with existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input images, which consist of image patches: there exists a subset of input image patches such that a ViT can be trained from scratch using only this subset and achieve accuracy similar to a ViT trained on all image patches. We call this subset of input patches the winning tickets, as they carry a significant amount of the information in the input. Furthermore, we present a simple yet effective method to find the winning tickets among input patches for various types of ViTs, including DeiT, LV-ViT, and Swin Transformers. More specifically, we use a ticket selector to generate the winning tickets based on the informativeness of patches. For comparison, we also build a randomly selected subset of patches, and our experiments show a clear difference between the performance of models trained with winning tickets and those trained with randomly selected subsets.
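As a rough illustration of the patch-selection idea, the sketch below implements a hypothetical ticket selector in PyTorch: a lightweight scoring head ranks patch embeddings by informativeness and keeps the top-scoring fraction, alongside a random subset used as the control. The class name, the linear scoring head, and the keep ratio are assumptions made for illustration; the abstract does not specify the paper's actual selector design.

    import torch
    import torch.nn as nn

    class PatchTicketSelector(nn.Module):
        """Hypothetical ticket selector: scores patch embeddings and keeps
        the top-k most informative patches (the 'winning tickets')."""

        def __init__(self, embed_dim: int, keep_ratio: float = 0.5):
            super().__init__()
            self.keep_ratio = keep_ratio
            # Lightweight head mapping each patch embedding to a scalar score.
            self.score_head = nn.Linear(embed_dim, 1)

        def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
            # patch_tokens: (batch, num_patches, embed_dim)
            b, n, d = patch_tokens.shape
            k = max(1, int(n * self.keep_ratio))
            scores = self.score_head(patch_tokens).squeeze(-1)  # (batch, num_patches)
            # Indices of the k highest-scoring patches per image.
            topk = scores.topk(k, dim=1).indices                # (batch, k)
            idx = topk.unsqueeze(-1).expand(-1, -1, d)          # (batch, k, embed_dim)
            return patch_tokens.gather(1, idx)                  # (batch, k, embed_dim)

    # Usage: keep half of the 196 patches of a DeiT-small-style embedding.
    tokens = torch.randn(8, 196, 384)              # batch of patch embeddings
    selector = PatchTicketSelector(embed_dim=384, keep_ratio=0.5)
    winning = selector(tokens)                     # (8, 98, 384)

    # Random baseline mirroring the paper's control experiment.
    rand_idx = torch.randperm(196)[:98]
    random_subset = tokens[:, rand_idx, :]         # (8, 98, 384)

In practice such a selector would be trained jointly with the ViT, so the scoring head learns which patches are informative; the random baseline above isolates the contribution of informed selection.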

