TransMix: Attend to Mix for Vision Transformers

11/18/2021
by   Jie-Neng Chen, et al.

Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs), since they can easily overfit. However, previous mixup-based methods rest on an underlying prior: the targets are linearly interpolated with the same ratio that was used to interpolate the inputs. This can lead to a strange phenomenon in which, because of the random process in augmentation, the mixed image contains no valid object from one of the source images, yet the label still assigns that image's class a nonzero weight. To bridge this gap between the input and label spaces, we propose TransMix, which mixes labels based on the attention maps of Vision Transformers. The confidence of a label is larger if the corresponding input image is weighted higher by the attention map. TransMix is embarrassingly simple and can be implemented in just a few lines of code without introducing any extra parameters or FLOPs to ViT-based models. Experimental results show that our method consistently improves various ViT-based models at scale on ImageNet classification. After being pre-trained with TransMix on ImageNet, the ViT-based models also demonstrate better transferability to semantic segmentation, object detection and instance segmentation. TransMix also proves more robust when evaluated on 4 different benchmarks. Code will be made publicly available at https://github.com/Beckschen/TransMix.
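The idea of weighting labels by attention rather than by pasted area can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a CutMix-style mix of two images A and B, a class-token attention vector over the patches (for instance, averaged over heads), and a per-patch mask giving the fraction of each patch that comes from image A. The function names `attention_lambda` and `mix_labels` are hypothetical.

```python
from typing import List, Sequence


def attention_lambda(cls_attention: Sequence[float],
                     patch_from_a: Sequence[float]) -> float:
    """Return the share of attention that falls on patches taken from image A.

    cls_attention: non-negative attention of the class token over N patches
        (assumed input; how it is extracted from the model is not shown here).
    patch_from_a: for each patch, the fraction of its pixels that come from
        image A, in [0, 1] (1.0 = entirely A, 0.0 = entirely B).
    """
    if len(cls_attention) != len(patch_from_a):
        raise ValueError("attention and mask must cover the same patches")
    total = sum(cls_attention)
    if total <= 0:
        raise ValueError("attention must have positive mass")
    # Normalise so the attention weights sum to 1, then take the weighted
    # sum of the mask: attention mass on A's region.
    return sum((a / total) * m for a, m in zip(cls_attention, patch_from_a))


def mix_labels(lam: float,
               label_a: Sequence[float],
               label_b: Sequence[float]) -> List[float]:
    """Linearly interpolate two label vectors with weight lam on label_a."""
    if len(label_a) != len(label_b):
        raise ValueError("labels must have the same number of classes")
    return [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]


# Toy example: 4 patches, the first two from image A, the last two from B.
attention = [0.1, 0.2, 0.3, 0.4]   # the model attends mostly to B's region
mask_a = [1.0, 1.0, 0.0, 0.0]      # an area-based ratio would give 0.5
lam = attention_lambda(attention, mask_a)
mixed = mix_labels(lam, [1.0, 0.0], [0.0, 1.0])
```

In this toy case half of the patches come from image A, so area-based mixing would assign its class a weight of 0.5. The attention-based weight is about 0.3, because most of the attention lands on the region from image B. The computation only reuses attention values the model already produces, which is consistent with the claim that no extra parameters are needed.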


research
11/29/2022

LUMix: Improving Mixup by Better Modelling Label Uncertainty

Modern deep networks can be better generalized when trained with noisy s...
research
12/23/2022

A Close Look at Spatial Modeling: From Attention to Convolution

Vision Transformers have shown great promise recently for many vision ta...
research
07/28/2022

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Recent progress in vision Transformers exhibits great success in various...
research
06/27/2023

CellViT: Vision Transformers for Precise Cell Segmentation and Classification

Nuclei detection and segmentation in hematoxylin and eosin-stained (H ...
research
10/12/2022

Token-Label Alignment for Vision Transformers

Data mixing strategies (e.g., CutMix) have shown the ability to greatly ...
research
12/26/2022

SMMix: Self-Motivated Image Mixing for Vision Transformers

CutMix is a vital augmentation strategy that determines the performance ...
research
03/08/2023

M-EBM: Towards Understanding the Manifolds of Energy-Based Models

Energy-based models (EBMs) exhibit a variety of desirable properties in ...
