Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

01/07/2020
by Vipul Gupta, et al.

We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel. The resulting models generalize as well as those trained with small mini-batches but are produced in a substantially shorter time. We demonstrate the reduction in training time and the good generalization performance of the resulting models on the computer vision datasets CIFAR10, CIFAR100, and ImageNet.
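
The two-phase procedure described in the abstract lends itself to a compact sketch: a large-batch warm start, followed by several independent small-batch refinements whose weights are averaged. Below is a minimal PyTorch illustration of that recipe; the framework choice, the function names (train_epochs, average_weights, swap), and all hyperparameters are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch of the SWAP recipe (assumed details, not the authors' code).
import copy
import torch

def train_epochs(model, loader, epochs, lr):
    """Plain SGD training loop shared by both phases."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def average_weights(models):
    """Average the floating-point parameters/buffers of several models."""
    state_dicts = [m.state_dict() for m in models]
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if avg[key].is_floating_point():
            avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg

def swap(model, large_batch_loader, small_batch_loaders,
         warm_epochs=10, refine_epochs=5, lr=0.1):
    # Phase 1: reach an approximate solution quickly with large mini-batches.
    model = train_epochs(model, large_batch_loader, warm_epochs, lr)
    # Phase 2: refine independent copies with small mini-batches
    # (in practice each copy runs on its own worker, in parallel).
    refined = [train_epochs(copy.deepcopy(model), loader, refine_epochs, lr)
               for loader in small_batch_loaders]
    # Average the refined weights to obtain the final model.
    model.load_state_dict(average_weights(refined))
    return model
```

In this sketch the refinement copies are trained sequentially for simplicity; the parallel speedup in the paper comes from running each refinement on a separate worker before averaging.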

