Domain Generalization Needs Stochastic Weight Averaging for Robustness on Domain Shifts

by Junbum Cha, et al.

Domain generalization aims to learn a model that generalizes to unseen target domains from multiple source domains. Various approaches have been proposed to address this problem. However, recent benchmarks show that most of them do not provide significant improvements over simple empirical risk minimization (ERM) in practical cases. In this paper, we analyze how ERM works from the perspectives of domain-invariant feature learning and domain-specific gradient normalization. In addition, we observe that ERM converges to a loss valley shared across multiple training domains, and we gain the insight that the center of the valley generalizes better. To estimate the center, we employ stochastic weight averaging (SWA) and provide a theoretical analysis describing how SWA supports the generalization bound for an unseen domain. As a result, we achieve state-of-the-art performance, by large margins, on all of the widely used domain generalization benchmarks, namely PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet. Further analysis reveals how SWA operates on domain generalization tasks.
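
The weight-averaging idea behind SWA can be illustrated with a minimal sketch. This is not the authors' implementation: the snapshot format (a dict of parameter lists), the helper names `update_running_average` and `average_snapshots`, and the toy values are all assumptions made for illustration. The sketch only shows the equal-weight running mean of parameter snapshots, which is the quantity SWA uses as its estimate of the valley center.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def update_running_average(avg: Weights, new: Weights, n_averaged: int) -> Weights:
    """Fold one more snapshot into the mean of `n_averaged` earlier snapshots."""
    return {
        name: [a + (w - a) / (n_averaged + 1) for a, w in zip(avg[name], new[name])]
        for name in avg
    }


def average_snapshots(snapshots: List[Weights]) -> Weights:
    """Equal-weight average of all snapshots, computed incrementally."""
    # Copy the first snapshot so the inputs are never mutated.
    avg = {name: list(values) for name, values in snapshots[0].items()}
    for n, snap in enumerate(snapshots[1:], start=1):
        avg = update_running_average(avg, snap, n)
    return avg


# Toy snapshots standing in for weights sampled along an ERM training run.
snapshots = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
swa_weights = average_snapshots(snapshots)
```

In practice the snapshots would be model parameters collected at intervals during training, and how often and from which iteration they are collected is a design choice that this sketch leaves open.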




Adaptive Domain Generalization via Online Disagreement Minimization

Deep neural networks suffer from significant performance deterioration w...

Exploiting Domain-Specific Features to Enhance Domain Generalization

Domain Generalization (DG) aims to train a model, from multiple observed...

Learning to Balance Specificity and Invariance for In and Out of Domain Generalization

We introduce Domain-specific Masks for Generalization, a model for impro...

Implicit Semantic Augmentation for Distance Metric Learning in Domain Generalization

Domain generalization (DG) aims to learn a model on one or more differen...

Domain Generalization via Multidomain Discriminant Analysis

Domain generalization (DG) aims to incorporate knowledge from multiple s...

Learning Domain Invariant Representations by Joint Wasserstein Distance Minimization

Domain shifts in the training data are common in practical applications ...

Finding lost DG: Explaining domain generalization via model complexity

The domain generalization (DG) problem setting challenges a model traine...