Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness

07/04/2022
by Saeid Asgari Taghanaki, et al.

Batch normalization (BN) is a ubiquitous technique for training deep neural networks that accelerates convergence and enables higher accuracy. However, we demonstrate that BN comes with a fundamental drawback: it incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data, hurting generalization performance on out-of-domain examples. In this work, we investigate this phenomenon by first showing that removing BN layers across a wide range of architectures leads to lower out-of-domain and corruption errors, at the cost of higher in-domain errors. We then propose Counterbalancing Teacher (CT), a method that leverages a frozen copy of the same model without BN as a teacher, enforcing robust representation learning in the student network by substantially adapting its weights through a consistency loss function. This regularization signal helps CT perform well under unforeseen data shifts, even without access to information from the target domain, as prior works require. We theoretically show, in an overparameterized linear regression setting, why normalization leads to a model's reliance on such in-domain features, and empirically demonstrate the efficacy of CT, which outperforms several baselines on robustness benchmarks such as CIFAR-10-C, CIFAR-100-C, and VLCS.
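To make the described objective concrete, below is a minimal PyTorch sketch of a CT-style training step: a batch-normalized student is trained with cross-entropy plus a consistency term against a frozen, BN-free copy of the same architecture. The toy network, the MSE-on-logits consistency loss, the weight `lam`, and the way the teacher's weights are initialized are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a Counterbalancing-Teacher-style training step.
# The architecture, consistency loss (MSE on logits), weight `lam`, and
# teacher initialization are illustrative assumptions, not the paper's
# exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net(use_bn: bool) -> nn.Sequential:
    """Small CNN; the teacher is the same architecture with BN removed."""
    norm = (lambda c: nn.BatchNorm2d(c)) if use_bn else (lambda c: nn.Identity())
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), norm(32), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), norm(64), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
    )

student = make_net(use_bn=True)    # regular batch-normalized model
teacher = make_net(use_bn=False)   # same model without BN, kept frozen

# Assumed here: copy the shared (non-BN) weights into the teacher; in
# practice the BN-free teacher would presumably be trained and then frozen.
teacher.load_state_dict(student.state_dict(), strict=False)
for p in teacher.parameters():
    p.requires_grad_(False)
teacher.eval()

optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
lam = 1.0  # consistency weight (assumed)

def train_step(x, y):
    """One step: supervised loss plus consistency with the frozen teacher."""
    student.train()
    logits_s = student(x)
    with torch.no_grad():
        logits_t = teacher(x)
    loss = F.cross_entropy(logits_s, y) + lam * F.mse_loss(logits_s, logits_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on random data:
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(train_step(x, y))
```

The key structural point the sketch illustrates is that only the student receives gradient updates; the BN-free teacher acts purely as a fixed regularization target.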

