Non-Proportional Parametrizations for Stable Hypernetwork Learning

Hypernetworks are neural networks that generate the parameters of another neural network. In many scenarios, current hypernetwork training strategies are unstable, and convergence is often far slower than for non-hypernetwork models. We show that this problem is linked to a numerical issue that arises under common choices of hypernetwork architecture and initialization. We demonstrate analytically and experimentally how this issue can cause an instability during training that slows, and sometimes even prevents, convergence. We also show that popular deep learning normalization strategies fail to address it. We then propose a solution based on a revised hypernetwork formulation that uses non-proportional additive parametrizations. We test the proposed reparametrization on several tasks and demonstrate that it consistently leads to more stable training and faster convergence.
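To make the setup concrete, below is a minimal illustrative sketch (not the authors' code) of a hypernetwork that generates the weights of a small linear layer. The "additive" variant is one plausible reading of a non-proportional additive parametrization, in which the hypernetwork output is added to an independently initialized base weight rather than used directly; the paper's exact formulation may differ, and the class and argument names here are hypothetical.

```python
# Illustrative sketch only: a hypernetwork producing the weight of a target
# linear layer, in a standard "purely generated" form and in an assumed
# additive form (base weight + hypernetwork-generated correction).
import torch
import torch.nn as nn


class HyperLinear(nn.Module):
    """Linear layer whose weight matrix is produced by a hypernetwork."""

    def __init__(self, in_features, out_features, embed_dim=8, additive=False):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.additive = additive

        # Layer embedding fed to the hypernetwork.
        self.z = nn.Parameter(torch.randn(embed_dim))
        # Hypernetwork: maps the embedding to a flattened weight matrix.
        self.hyper = nn.Linear(embed_dim, in_features * out_features)

        if additive:
            # Assumed additive parametrization: keep an independently
            # initialized base weight and let the hypernetwork generate a
            # correction on top of it, instead of generating the whole weight.
            base = torch.empty(out_features, in_features)
            nn.init.kaiming_uniform_(base)
            self.base_weight = nn.Parameter(base)

    def forward(self, x):
        w = self.hyper(self.z).view(self.out_features, self.in_features)
        if self.additive:
            w = self.base_weight + w
        return x @ w.t()


if __name__ == "__main__":
    x = torch.randn(4, 16)
    for additive in (False, True):
        layer = HyperLinear(16, 32, additive=additive)
        print(additive, layer(x).shape)  # torch.Size([4, 32]) in both cases
```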


