Learning Less Generalizable Patterns with an Asymmetrically Trained Double Classifier for Better Test-Time Adaptation

by Thomas Duboudin, et al.

Deep neural networks often fail to generalize outside of their training distribution, particularly when only a single data domain is available during training. While test-time adaptation has yielded encouraging results in this setting, we argue that, to reach further improvements, these approaches should be combined with training procedure modifications aiming to learn a more diverse set of patterns. Indeed, test-time adaptation methods usually have to rely on a limited representation because of the shortcut learning phenomenon: only a subset of the available predictive patterns is learned with standard training. In this paper, we first show that combining existing training-time strategies with test-time batch normalization, a simple adaptation method, does not always improve upon test-time adaptation alone on the PACS benchmark. Furthermore, experiments on Office-Home show that very few training-time methods improve upon standard training, with or without test-time batch normalization. We therefore propose a novel approach that uses a pair of classifiers and an additional shortcut pattern avoidance loss. This loss mitigates the shortcut learning behavior by reducing the generalization ability of the secondary classifier, which encourages the learning of sample-specific patterns. The primary classifier is trained normally, resulting in the learning of both the natural and the more complex, less generalizable features. Our experiments show that our method improves upon state-of-the-art results on both benchmarks and benefits the most from test-time batch normalization.
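
The abstract does not spell out how test-time batch normalization works. As a minimal sketch, assuming the commonly described form of the technique (normalization statistics are re-estimated on the test batch instead of reusing the stored training statistics), it could look as follows. The function names, the toy data, and the use of plain Python lists are illustrative assumptions, not the authors' implementation.

```python
import math


def batch_statistics(batch):
    """Per-feature mean and biased variance over a batch of feature vectors."""
    n = len(batch)
    columns = list(zip(*batch))
    mean = [sum(col) / n for col in columns]
    var = [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, mean)]
    return mean, var


def batch_norm(batch, mean, var, gamma, beta, eps=1e-5):
    """Normalize each feature with the given statistics, then scale and shift."""
    return [
        [g * (x - m) / math.sqrt(v + eps) + b
         for x, m, v, g, b in zip(row, mean, var, gamma, beta)]
        for row in batch
    ]


def batch_norm_at_test_time(test_batch, gamma, beta, eps=1e-5):
    """Assumed test-time variant: ignore stored training statistics and
    re-estimate mean and variance on the test batch itself."""
    mean, var = batch_statistics(test_batch)
    return batch_norm(test_batch, mean, var, gamma, beta, eps)


# Hypothetical toy data: statistics stored during training versus a
# test batch drawn from a shifted domain.
train_mean, train_var = [0.0, 0.0], [1.0, 1.0]
gamma, beta = [1.0, 1.0], [0.0, 0.0]
shifted_batch = [[4.0, 10.0], [6.0, 14.0], [5.0, 12.0]]

# Standard inference reuses the training statistics, so the shift remains.
standard = batch_norm(shifted_batch, train_mean, train_var, gamma, beta)
# The test-time variant re-centers and re-scales the shifted features.
adapted = batch_norm_at_test_time(shifted_batch, gamma, beta)

standard_mean, standard_var = batch_statistics(standard)
adapted_mean, adapted_var = batch_statistics(adapted)
```

In this toy case the adapted features are approximately zero-mean with unit variance, while the features normalized with the training statistics keep the offset of the shifted domain. The learned scale and shift parameters (`gamma`, `beta`) are left untouched, which is one reason the technique is considered a simple adaptation method.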



