Self-Distilled Vision Transformer for Domain Generalization

07/25/2022
by   Maryam Sultana, et al.

In the recent past, several domain generalization (DG) methods have been proposed and have shown encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There has been little to no progress in studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks that are often built on the i.i.d. assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we explore ViTs towards addressing the DG problem. Like CNNs, ViTs also struggle in out-of-distribution scenarios, and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined self-distillation for ViTs. It reduces overfitting to source domains by easing the learning of the input-output mapping through curating non-zero-entropy supervisory signals for intermediate transformer blocks. Furthermore, it introduces no new parameters and can be seamlessly plugged into the modular composition of different ViTs. We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones on five challenging datasets. Moreover, we report favorable performance against recent state-of-the-art DG methods. Our code and pre-trained models are publicly available at: https://github.com/maryam089/SDViT
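To make the idea concrete, the sketch below shows one plausible way to realize this kind of self-distillation in PyTorch: the class token from a randomly sampled intermediate block is routed through the same final classifier head (so no new parameters are added), and its soft prediction is trained to match the final block's non-zero-entropy output via a KL-divergence term. This is a minimal sketch assuming a timm-style ViT backbone; the block-sampling scheme and the hyperparameters `lambda_kd` and `tau` are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of self-distillation for a ViT, assuming a timm-style
# backbone that exposes `patch_embed`, `cls_token`, `pos_embed`, `blocks`,
# `norm`, and `head`. `lambda_kd` and `tau` are illustrative values.
import random
import torch
import torch.nn.functional as F

def sdvit_loss(vit, images, labels, lambda_kd=0.1, tau=3.0):
    """Cross-entropy on the final prediction plus a KL term that distills
    the final block's soft (non-zero entropy) output into the logits of a
    randomly sampled intermediate block. The classifier head is shared
    across blocks, so no new parameters are introduced."""
    x = vit.patch_embed(images)                      # patch tokens (B, N, C)
    cls = vit.cls_token.expand(x.shape[0], -1, -1)   # prepend class token
    x = torch.cat((cls, x), dim=1) + vit.pos_embed

    feats = []                                       # class token after each block
    for blk in vit.blocks:
        x = blk(x)
        feats.append(vit.norm(x)[:, 0])

    final_logits = vit.head(feats[-1])               # final prediction
    ce = F.cross_entropy(final_logits, labels)

    # Randomly pick one intermediate block and reuse the *same* head.
    i = random.randrange(len(feats) - 1)
    inter_logits = vit.head(feats[i])

    # Soft targets from the final block carry non-zero entropy.
    kd = F.kl_div(F.log_softmax(inter_logits / tau, dim=1),
                  F.softmax(final_logits.detach() / tau, dim=1),
                  reduction="batchmean") * tau ** 2
    return ce + lambda_kd * kd
```

Because the head is shared across blocks, the only training-time change is the extra forward pass through `vit.head` and the KL term; at inference the model runs exactly as a standard ViT, which is consistent with the abstract's claim that the method plugs into existing DG baselines.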

Related research

02/14/2023
Robust Representation Learning with Self-Distillation for Domain Generalization
Domain generalization is a challenging problem in machine learning, wher...

03/29/2021
CvT: Introducing Convolutions to Vision Transformers
We present in this paper a new architecture, named Convolutional vision ...

02/03/2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Knowledge distillation (KD) is a highly promising method for mitigating ...

10/22/2018
Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?
This paper seeks to answer the question: as the (near-) orthogonality of...

08/10/2023
Surface Masked AutoEncoder: Self-Supervision for Cortical Imaging Data
Self-supervision has been widely explored as a means of addressing the l...

03/14/2020
Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations
In this paper, we study the generalization properties of neural networks...

09/04/2022
Generalization in Neural Networks: A Broad Survey
This paper reviews concepts, modeling approaches, and recent findings al...
