Learning Diverse Features in Vision Transformers for Improved Generalization

08/30/2023
by Armand Mihai Nicolicioiu, et al.

Deep learning models often rely only on a small set of features even when there is a rich set of predictive signals in the training data. This makes models brittle and sensitive to distribution shifts. In this work, we first examine vision transformers (ViTs) and find that they tend to extract robust and spurious features with distinct attention heads. As a result of this modularity, their performance under distribution shifts can be significantly improved at test time by pruning heads corresponding to spurious features, which we demonstrate using an "oracle selection" on validation data. Second, we propose a method to further enhance the diversity and complementarity of the learned features by encouraging orthogonality of the attention heads' input gradients. We observe improved out-of-distribution performance on diagnostic benchmarks (MNIST-CIFAR, Waterbirds) as a consequence of the enhanced diversity of features and the pruning of undesirable heads.
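
The gradient-orthogonality idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration (not the authors' code): each attention head is reduced to a per-example scalar summary, the gradient of each summary with respect to the shared input is computed, and the pairwise cosine similarity between those input gradients is penalized so that heads are pushed to rely on different input features. All names here (head_gradient_orthogonality_penalty, head_outputs, the toy two-head setup) are illustrative assumptions.

import torch
import torch.nn.functional as F

def head_gradient_orthogonality_penalty(x, head_outputs):
    # x: shared input with requires_grad=True, shape (batch, ...).
    # head_outputs: list of per-head scalar summaries, each of shape (batch,),
    # e.g. the norm of that head's output.
    grads = []
    for h in head_outputs:
        # Gradient of each head's summary w.r.t. the shared input;
        # create_graph=True keeps the penalty differentiable for training.
        g, = torch.autograd.grad(h.sum(), x, create_graph=True, retain_graph=True)
        grads.append(g.flatten(1))                         # (batch, d)
    G = F.normalize(torch.stack(grads, dim=1), dim=-1)     # (batch, heads, d), unit norm
    sim = G @ G.transpose(1, 2)                            # pairwise cosine similarities
    eye = torch.eye(sim.size(1), device=sim.device)
    return ((sim - eye) ** 2).mean()                       # penalize off-diagonal similarity

# Toy usage: two "heads" computed from the same input.
x = torch.randn(8, 16, requires_grad=True)
W1, W2 = torch.randn(16, 4), torch.randn(16, 4)
heads = [(x @ W1).norm(dim=-1), (x @ W2).norm(dim=-1)]
penalty = head_gradient_orthogonality_penalty(x, heads)

In training, such a penalty would be added to the task loss with a small weight, e.g. loss = task_loss + lam * penalty, where lam is a hypothetical hyperparameter trading off accuracy against feature diversity.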

Related research

07/16/2023 · Domain Generalisation with Bidirectional Encoder Representations from Vision Transformers
Domain generalisation involves pooling knowledge from source domain(s) i...

02/09/2022 · Agree to Disagree: Diversity through Disagreement for Better Transferability
Gradient-based learning algorithms have an implicit simplicity bias whic...

05/26/2023 · A Closer Look at In-Context Learning under Distribution Shifts
In-context learning, a capability that enables a model to learn from inp...

05/09/2023 · Even Small Correlation and Diversity Shifts Pose Dataset-Bias Issues
Distribution shifts are common in real-world datasets and can affect the...

04/13/2020 · Pretrained Transformers Improve Out-of-Distribution Robustness
Although pretrained Transformers such as BERT achieve high accuracy on i...

06/10/2022 · Learning to Estimate Shapley Values with Vision Transformers
Transformers have become a default architecture in computer vision, but ...

05/17/2021 · Vision Transformers are Robust Learners
Transformers, composed of multiple self-attention layers, hold strong pr...
