Extreme Masking for Learning Instance and Distributed Visual Representations

06/09/2022
by Zhirong Wu, et al.

The paper presents a scalable approach for simultaneously learning distributed representations over individual tokens and a holistic instance representation. We use self-attention blocks to represent distributed tokens, followed by cross-attention blocks to aggregate the holistic instance. The core of the approach is the use of extremely large token masking (75%-90%) as the data augmentation for supervision. Our model, named ExtreMA, follows the plain BYOL approach, where the instance representation from the unmasked subset is trained to predict that from the intact input. Learning requires the model to capture informative variations in an instance, rather than encouraging invariances. The paper makes three contributions: 1) Random masking is a strong and computationally efficient data augmentation for learning generalizable attention representations. 2) With multiple samplings per instance, extreme masking greatly speeds up learning and benefits from more data. 3) Distributed representations can be learned from instance supervision alone, unlike the per-token supervision used in masked modeling.
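As a concrete illustration of the training objective described above, the following is a minimal PyTorch sketch of the masked-view-predicts-intact-view setup, not the authors' implementation. The module sizes, the learnable instance query used for cross-attention pooling, and the momentum-encoder callable (`target_encode`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtreMASketch(nn.Module):
    """Sketch of the online branch: self-attention over visible tokens,
    then cross-attention aggregation into one instance vector."""

    def __init__(self, dim=768, depth=12, heads=12, mask_ratio=0.9):
        super().__init__()
        self.mask_ratio = mask_ratio  # extreme masking: 75%-90% of tokens dropped
        # Self-attention blocks -> distributed token representations.
        layer = nn.TransformerEncoderLayer(dim, nhead=heads, batch_first=True)
        self.token_encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # A learnable query cross-attends to the tokens to aggregate the
        # holistic instance representation (a stand-in for the paper's
        # cross-attention blocks).
        self.instance_query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.aggregator = nn.MultiheadAttention(dim, num_heads=heads, batch_first=True)
        self.predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def encode(self, tokens):
        x = self.token_encoder(tokens)                     # (B, N, dim)
        q = self.instance_query.expand(x.size(0), -1, -1)  # (B, 1, dim)
        instance, _ = self.aggregator(q, x, x)             # cross-attention pooling
        return instance.squeeze(1)                         # (B, dim)

    def forward(self, tokens, target_encode):
        # Keep only a small random subset of tokens (the extreme mask).
        B, N, D = tokens.shape
        keep = max(1, int(N * (1 - self.mask_ratio)))
        idx = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :keep]
        visible = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        # BYOL-style objective: the masked view predicts the target
        # (momentum) encoding of the intact input.
        online = self.predictor(self.encode(visible))
        with torch.no_grad():
            target = target_encode(tokens)                 # momentum encoder, full input
        return (2 - 2 * F.cosine_similarity(online, target, dim=-1)).mean()
```

In practice, `target_encode` would be an exponential-moving-average copy of the online encoder, as in BYOL, and several masked views can be sampled per image so that one full-input target pass supervises multiple cheap masked passes.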
