Learning Neural Network Subspaces

02/20/2021
by Mitchell Wortsman, et al.

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method and in a single training run. At a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
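
To make the idea concrete, below is a minimal sketch of learning a "line" subspace, the simplest case described in the abstract: the weights of each layer are parameterized as w(alpha) = (1 - alpha) * w1 + alpha * w2, and each training step samples a random alpha so that the entire segment is driven toward low loss. This is an illustrative reconstruction, not the authors' released code; the `LineLinear` layer, the toy data, and the hyperparameters are assumptions, and details of the full method (e.g., any regularization encouraging endpoint diversity) are omitted.

```python
import torch
import torch.nn as nn

class LineLinear(nn.Module):
    """A linear layer whose weights live on a line between two endpoints."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w1 = nn.Parameter(torch.empty(out_features, in_features))
        self.w2 = nn.Parameter(torch.empty(out_features, in_features))
        self.b1 = nn.Parameter(torch.zeros(out_features))
        self.b2 = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.w1)
        nn.init.kaiming_uniform_(self.w2)

    def forward(self, x, alpha):
        # Interpolate the endpoint weights; gradients flow to both endpoints.
        w = (1 - alpha) * self.w1 + alpha * self.w2
        b = (1 - alpha) * self.b1 + alpha * self.b2
        return nn.functional.linear(x, w, b)

# Toy training loop (hypothetical random data, for illustration only).
layer = LineLinear(10, 2)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for _ in range(100):
    alpha = torch.rand(()).item()  # sample a point on the line each step
    loss = nn.functional.cross_entropy(layer(x, alpha), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At test time, alpha = 0.5 gives the subspace midpoint discussed in the abstract, while evaluating several alphas yields diverse members whose predictions can be ensembled, all from a single training run.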

Related research

11/21/2019
Regularizing Neural Networks by Stochastically Training Layer Ensembles
Dropout and similar stochastic neural network regularization methods are...

10/13/2020
Training independent subnetworks for robust prediction
Recent approaches to efficiently ensemble neural networks have shown tha...

03/02/2018
Essentially No Barriers in Neural Network Energy Landscape
Training neural networks involves finding minima of a high-dimensional n...

01/08/2019
Learning with Collaborative Neural Network Group by Reflection
For the present engineering of neural systems, the preparing of extensiv...

05/18/2021
Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks
Accurate numerical solutions for the Schrödinger equation are of utmost ...

05/07/2019
A Generative Model for Sampling High-Performance and Diverse Weights for Neural Networks
Recent work on mode connectivity in the loss landscape of deep neural ne...

05/19/2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Standard neural networks struggle to generalize under distribution shift...
