Diverse Weight Averaging for Out-of-Distribution Generalization

05/19/2022
by Alexandre Ramé, et al.

Standard neural networks struggle to generalize under distribution shifts. For out-of-distribution generalization in computer vision, the best current approach averages the weights along a training run. In this paper, we propose Diverse Weight Averaging (DiWA), which makes a simple change to this strategy: DiWA averages the weights obtained from several independent training runs rather than from a single run. Perhaps surprisingly, averaging these weights performs well despite the network's nonlinearities, under mild constraints (in practice, all runs are fine-tuned from the same pre-trained initialization). The main motivation behind DiWA is to increase the functional diversity across averaged models. Indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between DiWA and standard functional ensembling. Moreover, this decomposition highlights that DiWA succeeds when the variance term dominates, which we show happens when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on the competitive DomainBed benchmark without inference overhead.
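The decomposition mentioned in the abstract builds on the classical bias-variance-covariance analysis of ensembles (Ueda and Nakano). As a rough sketch of that classical ingredient, not the paper's exact statement, the expected squared error of a functional ensemble of M predictors f_1, ..., f_M decomposes as

\[
\mathbb{E}\Big[\Big(\tfrac{1}{M}\sum_{m=1}^{M} f_m(x) - y\Big)^{2}\Big]
= \overline{\text{bias}}^{2}
+ \tfrac{1}{M}\,\overline{\text{var}}
+ \Big(1 - \tfrac{1}{M}\Big)\,\overline{\text{cov}},
\]

where the bars denote averages over the M members. DiWA's decomposition additionally carries a locality term that accounts for the gap between averaging weights and averaging predictions; this gap stays small when the weights remain close, e.g., when all runs are fine-tuned from a shared initialization.

The averaging step itself is simple to picture. Below is a minimal, hypothetical PyTorch-style sketch of uniformly averaging checkpoints from independent runs; it is not the authors' released implementation, and the helper name and checkpoint paths are illustrative assumptions.

```python
import torch

def average_state_dicts(paths):
    """Uniformly average the floating-point parameters of several checkpoints.

    Assumes each file was saved with torch.save(model.state_dict(), path).
    Integer buffers (e.g., BatchNorm's num_batches_tracked) are copied from
    the first checkpoint instead of being averaged.
    """
    states = [torch.load(p, map_location="cpu") for p in paths]
    averaged = {}
    for key, ref in states[0].items():
        if ref.is_floating_point():
            averaged[key] = torch.stack([s[key] for s in states]).mean(dim=0)
        else:
            averaged[key] = ref.clone()
    return averaged

# Illustrative usage: the runs differ in hyperparameters but start from the
# same pre-trained weights; the averaged state dict is loaded into a single
# model, so inference cost stays that of one network.
# model.load_state_dict(average_state_dicts(
#     [f"run_{i}/checkpoint.pt" for i in range(20)]))
```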


Related research

- PopulAtion Parameter Averaging (PAPA) (04/06/2023): Ensemble methods combine the predictions of multiple models to improve p...
- Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well (01/07/2020): We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm ...
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks (02/28/2023): Generalization of neural networks is crucial for deploying them safely i...
- Test-Time Training with Masked Autoencoders (09/15/2022): Test-time training adapts to a new test distribution on the fly by optim...
- Stochastic Weight Averaging Revisited (01/03/2022): Stochastic weight averaging (SWA) is recognized as a simple while one ef...
- Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging (09/29/2022): Training vision or language models on large datasets can take days, if n...
- Learning Neural Network Subspaces (02/20/2021): Recent observations have advanced our understanding of the neural networ...
