1 Introduction
The inductive bias encoded in a neural network is generally determined by two major aspects: how the neural network is structured (i.e., network architecture) and how the neural network is optimized (i.e., training algorithm). For the same network architecture, using different training algorithms could lead to a dramatic difference in generalization performance (Keskar & Socher, 2017; Reddi et al., 2019) even if the training loss is already close to zero, implying that different training procedures lead to different inductive biases. Therefore, how to effectively train a neural network that generalizes well remains an open challenge.
Recent theories (Gunasekar et al., 2017, 2018; Kawaguchi, 2016; Li et al., 2018) suggest the importance of overparameterization in linear neural networks. For example, (Gunasekar et al., 2017) shows that optimizing an underdetermined quadratic objective over a matrix with gradient descent on a factorization of that matrix leads to an implicit regularization that may improve generalization. There is also empirical evidence (Ding et al., 2019; Liu et al., 2019) showing that overparameterizing the convolutional filters under some regularity is beneficial to generalization. Our paper aims to leverage the power of overparameterization and explore more intrinsic structural priors for training a well-performing neural network.
Motivated by this goal, we propose a generic orthogonal overparameterized training (OPT) framework for effectively training neural networks. Different from existing neural network training, OPT overparameterizes a neuron $w \in \mathbb{R}^d$ with the multiplication of a learnable layer-shared orthogonal matrix $R \in \mathbb{R}^{d\times d}$ and a fixed, randomly initialized weight vector $v \in \mathbb{R}^d$. Therefore, the equivalent weight for the neuron is $w = Rv$. Once each element of the neuron weight $v$ has been randomly initialized following a zero-mean Gaussian distribution (He et al., 2015; Glorot & Bengio, 2010), we fix it throughout the entire training process. Then OPT learns a layer-shared orthogonal transformation $R$ that is applied to all the neurons in the same layer. An illustration of OPT is given in Fig. 1. In contrast to standard neural network training, OPT decomposes the neuron into two components: an orthogonal transformation $R$ that learns a proper coordinate system and a weight vector $v$ that controls the position of the neuron. Essentially, the weight vectors of different neurons in the same layer determine the relative positions of these neurons, while the layer-shared orthogonal matrix specifies the coordinate system for these neurons.

Another motivation for OPT comes from the observation (Liu et al., 2018) that neural networks with lower hyperspherical energy generalize better. Hyperspherical energy quantifies the diversity of neurons on a hypersphere and essentially characterizes the relative positions of neurons via this form of diversity. (Liu et al., 2018) introduces hyperspherical energy as a regularization term in the network and therefore cannot guarantee that the hyperspherical energy is effectively minimized (due to the competing data-fitting loss). To address this issue, we leverage the property that hyperspherical energy is independent of the coordinate system in which the neurons live and depends only on their relative positions. Specifically, we prove that if we randomly initialize the neuron weights with certain distributions, these distributions attain the minimum hyperspherical energy in a probabilistic sense. It follows that OPT maintains the minimum energy throughout training by learning only a coordinate system (i.e., the layer-shared orthogonal matrix) for the neurons. Therefore, OPT can guarantee that the hyperspherical energy is well minimized.
We consider several ways to learn the orthogonal transformation. The first is to unroll different orthogonalization algorithms such as the Gram-Schmidt process, Householder reflection, and Löwdin's symmetric orthogonalization. Different unrolled algorithms yield different implicit regularizations when constructing the neuron weights; for example, symmetric orthogonalization guarantees that the new orthogonal basis has the least distance in the Hilbert space from the original non-orthogonal basis. Second, we consider using a special parameterization (such as the Cayley parameterization) to construct the orthogonal matrix, which is more efficient in training. Third, we try an orthogonality-preserving gradient descent to ensure that the matrix $R$ remains orthogonal after each gradient update. Last, we propose a relaxation of the optimization problem that turns orthogonality into a regularization on the coordinate system $R$. These different ways of learning the orthogonal transformation for neurons encode different inductive biases into the neural network.

Moreover, we propose a refinement strategy to further reduce the hyperspherical energy of the randomly initialized neuron weights: we directly minimize the hyperspherical energy of these weights as a preprocessing step before training on actual data. Finally, we provide theoretical justifications for why OPT yields better generalization than standard training.
We summarize the advantages of OPT as follows:


OPT is a universal neural network training framework with strong flexibility. There are several ways of learning the coordinate system (i.e., orthogonal transformation) and each one may impose a different inductive bias.

OPT is the first training method that can provably achieve minimum hyperspherical energy, leading to better generalization. More interestingly, OPT reveals that learning a proper coordinate system is crucial for generalization, while the relative positions of neurons can be well characterized by hyperspherical energy.

There is no extra computational cost for OPT-trained neural networks in inference. They have the same inference speed and model size as their standard counterparts.

OPT is shown to be useful for multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), point cloud networks (PointNet) (Qi et al., 2017), and graph convolutional networks (GCN) (Kipf & Welling, 2016).
2 Related Work
Optimization for Deep Learning. A number of first-order optimization algorithms (Nesterov, 1983; Duchi et al., 2011; Kingma & Ba, 2014; Tieleman & Hinton, 2012; Zeiler, 2012; Reddi et al., 2019) have been proposed to improve the empirical convergence and generalization of deep neural networks. Our work is complementary to these optimization algorithms, since they can be easily applied within our framework.

Parameterization of Neurons. There are various ways to parameterize a neuron for different applications. (Ding et al., 2019) overparameterizes a 2D convolution kernel by combining a 2D kernel of the same size with two additional 1D asymmetric kernels; the resulting convolution kernel has the same number of effective parameters at inference time but more parameters during training due to the additional asymmetric kernels. (Liu et al., 2019) constructs a neuron with a bilinear parameterization which regularizes the bilinear similarity matrix. (Yang et al., 2015) reparameterizes the neuron matrix with an adaptive Fastfood transform to compress model parameters. (Jaderberg et al., 2014; Liu et al., 2015; Wang et al., 2017) employ sparse and low-rank structures to construct convolution kernels for efficient neural networks.
Hyperspherical learning. (Liu et al., 2017) proposes a neural network architecture that learns representations on a hypersphere and shows that the angular information in neural networks, in contrast to magnitude information, preserves the most semantic information. (Liu et al., 2018) defines the hyperspherical energy that quantifies the diversity of neurons on a hypersphere and empirically shows that minimizing hyperspherical energy improves generalization.
3 Orthogonal OverParameterized Training
3.1 General Framework
OPT parameterizes a neuron as the multiplication of an orthogonal matrix $R$ and a neuron weight vector $v$, so the equivalent neuron weight becomes $w = Rv$. The output of this neuron can be represented by $(Rv)^\top x$, where $x$ is the input vector. In the OPT framework, we fix the randomly initialized neuron weight $v$ and learn only the orthogonal matrix $R$. In contrast, a standard neuron is directly formulated as $w^\top x$, where the weight vector $w$ is learned during training.
Without loss of generality, we consider a two-layer linear MLP with a loss function $\ell(\cdot,\cdot)$ (e.g., we use the least square loss $\ell(\hat{y}, y) = (\hat{y} - y)^2$). Specifically, the learning objectives of standard training and OPT are

Standard: $\min_{\{w_i\},\,u}\ \sum_{(x,y)\in\mathcal{D}} \ell\Big(\sum_{i=1}^{n} u_i\, w_i^\top x,\ y\Big)$  (1)

OPT: $\min_{R,\,u}\ \sum_{(x,y)\in\mathcal{D}} \ell\Big(\sum_{i=1}^{n} u_i\, (R v_i)^\top x,\ y\Big)\quad \text{s.t. } R^\top R = R R^\top = I$

where $w_i \in \mathbb{R}^d$ is the $i$-th neuron in the first layer and $u = (u_1,\dots,u_n)$ is the output neuron in the second layer. In OPT, each element of $v_i$ is usually sampled from a zero-mean Gaussian distribution and is fixed throughout the entire training process. In general, OPT learns an orthogonal matrix $R$ that is applied to all the neurons instead of learning the individual neuron weights. Note that we usually do not apply OPT to neurons in the output layer (e.g., $u$ in this MLP example, and the final linear classifiers in CNNs), since it makes little sense to fix a set of random linear classifiers. Therefore, the central problem is how to learn these layer-shared orthogonal matrices.
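As a concrete illustration of this framework, here is a minimal numpy sketch of one OPT layer's forward pass. The dimensions, the random $R$ (produced by a QR factorization as a stand-in for a learned orthogonal matrix), and the toy input are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16                      # input dim, number of neurons (toy values)

# Fixed, randomly initialized neuron weights v_i (one column per neuron).
V = rng.normal(0.0, 1.0, size=(d, n))

# Layer-shared orthogonal matrix; a random one here stands in for a learned R.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = rng.normal(size=d)            # a toy input vector

# OPT: the equivalent weight of neuron i is R @ V[:, i]; output is (R V)^T x.
out_opt = (R @ V).T @ x
# Standard training would instead learn V directly:
out_std = V.T @ x

# R preserves the norm of every neuron weight.
assert np.allclose(np.linalg.norm(R @ V, axis=0), np.linalg.norm(V, axis=0))
```

At inference time $RV$ can be multiplied out once, which is why OPT adds no inference cost.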
3.2 Hyperspherical Energy Perspective
We take a close look at OPT from the hyperspherical energy perspective. Following (Liu et al., 2018), the hyperspherical energy of $n$ neurons $\{w_1,\dots,w_n\}$ is defined as

$E_s(\hat{w}_1,\dots,\hat{w}_n) = \sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n} \|\hat{w}_i - \hat{w}_j\|^{-s}, \quad s > 0$  (2)

in which $\hat{w}_i = w_i / \|w_i\|$ is the $i$-th neuron weight projected onto the unit hypersphere $\mathbb{S}^{d-1} = \{w \in \mathbb{R}^d : \|w\| = 1\}$. Hyperspherical energy is used to characterize the diversity of neurons on the unit hypersphere. Assume that we have $n$ neurons $\{v_1,\dots,v_n\}$ in one layer and we have learned an orthogonal matrix $R$ for these neurons. The hyperspherical energy of these OPT-trained neurons is given by

$E_s(\{\widehat{Rv_i}\}) = \sum_{i\neq j} \|R\hat{v}_i - R\hat{v}_j\|^{-s} = \sum_{i\neq j} \|\hat{v}_i - \hat{v}_j\|^{-s} = E_s(\{\hat{v}_i\})$  (3)

where we use $\widehat{Rv_i} = R\hat{v}_i$ and $\|R\hat{v}_i - R\hat{v}_j\| = \|\hat{v}_i - \hat{v}_j\|$, since orthogonal matrices preserve norms. This concludes that OPT will not change the hyperspherical energy of each layer during training.
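This invariance is easy to check numerically. The sketch below (with an illustrative energy exponent $s=1$ and toy dimensions) verifies that applying one orthogonal matrix to all neurons leaves the hyperspherical energy unchanged:

```python
import numpy as np

def hyperspherical_energy(W, s=1.0):
    """E_s = sum_{i != j} ||w_i_hat - w_j_hat||^(-s), neurons as columns of W."""
    Wh = W / np.linalg.norm(W, axis=0, keepdims=True)   # project onto the sphere
    n = Wh.shape[1]
    e = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                e += np.linalg.norm(Wh[:, i] - Wh[:, j]) ** (-s)
    return e

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 16))                   # toy neuron weights
R, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # any orthogonal matrix

# Orthogonal transforms preserve pairwise distances on the sphere,
# so the energy of {R v_i} equals the energy of {v_i}.
assert np.isclose(hyperspherical_energy(R @ V), hyperspherical_energy(V))
```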
Moreover, (Liu et al., 2018) proves that the minimum hyperspherical energy corresponds to the uniform distribution over the hypersphere. As a result, if the initialization of the neurons in the same layer follows the uniform distribution over the hypersphere, then the hyperspherical energy is guaranteed to be minimal in a probabilistic sense.
Theorem 1.
For a neuron $w = Rv$ where the entries of $v = (v^{(1)},\dots,v^{(d)})$ are initialized i.i.d. following a zero-mean Gaussian distribution (i.e., $v^{(i)} \sim \mathcal{N}(0,\sigma^2)$), its projection $\hat{v} = v/\|v\|$ onto the unit hypersphere is guaranteed to follow the uniform distribution on $\mathbb{S}^{d-1}$.
Theorem 1 implies that if we initialize the neurons in the same layer with a zero-mean Gaussian distribution, then the corresponding hyperspherical energy is guaranteed to be small: the neurons will be uniformly distributed on the unit hypersphere, and hyperspherical energy quantifies exactly this uniformity. More importantly, current prevailing neuron initializations such as Xavier (Glorot & Bengio, 2010) and Kaiming (He et al., 2015) are essentially zero-mean Gaussian distributions. Therefore, in practice our neurons naturally have very low hyperspherical energy from the very beginning.
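Theorem 1 can also be checked empirically. The sketch below projects i.i.d. Gaussian samples onto the unit sphere and tests two consequences of uniformity (zero mean and equal second moments across coordinates); the sample size and tolerances are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000
V = rng.normal(0.0, 1.0, size=(d, n))      # i.i.d. zero-mean Gaussian entries
Vh = V / np.linalg.norm(V, axis=0)         # project onto the unit hypersphere

# All projected points lie exactly on the sphere.
assert np.allclose(np.linalg.norm(Vh, axis=0), 1.0)
# By rotational symmetry of the Gaussian, each coordinate of the projection
# has mean ~0 and second moment ~1/d, as the uniform distribution requires.
assert np.all(np.abs(Vh.mean(axis=1)) < 0.01)
assert np.allclose((Vh ** 2).mean(axis=1), 1.0 / d, atol=0.01)
```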
3.3 Unrolling Orthogonalization Algorithms
In order to learn the orthogonal transformation, we propose to unroll classic orthogonalization algorithms from numerical linear algebra and embed them into the neural network so that training remains end-to-end. We need to make sure every step of the orthogonalization algorithm is differentiable; the training flow is shown in Fig. 2.
Gram-Schmidt Process. This method takes a linearly independent set and produces an orthogonal set from it. The Gram-Schmidt process (GS) orthogonalizes a set of vectors $\{v_1,\dots,v_n\}$ with the following steps:

$u_1 = v_1,\qquad u_k = v_k - \sum_{j=1}^{k-1}\mathrm{proj}_{u_j}(v_k),\qquad e_k = \frac{u_k}{\|u_k\|},\quad k = 1,\dots,n$  (4)

where the projection operator is given by $\mathrm{proj}_{u}(v) = \frac{\langle u, v\rangle}{\langle u, u\rangle}u$ and $\{e_1,\dots,e_n\}$ is the obtained orthonormal set. In practice, we can use modified GS for numerical stability. To achieve better orthogonality, we can also unroll an iterative GS (Hoffmann, 1989) with multiple iterative steps.
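A minimal, non-vectorized sketch of the step in Eq. (4), applied to the columns of a toy matrix (the paper's unrolled version would operate on network weights inside autodiff):

```python
import numpy as np

def gram_schmidt(V):
    """Classic Gram-Schmidt over the columns of V; returns an orthonormal E."""
    d, n = V.shape
    E = np.zeros((d, n))
    for k in range(n):
        u = V[:, k].copy()
        for j in range(k):
            # Subtract the projection onto each already-orthonormal direction.
            u -= (E[:, j] @ V[:, k]) * E[:, j]
        E[:, k] = u / np.linalg.norm(u)
    return E

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 6))        # linearly independent with probability 1
E = gram_schmidt(V)
assert np.allclose(E.T @ E, np.eye(6), atol=1e-8)
```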
Householder Reflection. As one of the classic transformations used in QR factorization, Householder reflection (HR) can also compute an orthogonal set from a group of vectors. A Householder reflector is defined as $H = I - 2\frac{uu^\top}{u^\top u}$, where $u$ is perpendicular to the reflection hyperplane. In QR factorization, HR is used to transform a (nonsingular) square matrix into the product of an orthogonal matrix and an upper triangular matrix. Given a matrix $M = [a_1,\dots,a_n]$, we consider the first column vector $a_1$. We use a Householder reflector to transform $a_1$ to $\|a_1\|e_1$. Specifically, we construct $H_1$ as

$H_1 = I - 2\frac{uu^\top}{u^\top u},\qquad u = a_1 - \|a_1\|e_1$  (5)

which is an orthogonal matrix, so that the first column of $H_1 M$ becomes $\|a_1\|e_1$. At the $k$-th step, we view the bottom-right submatrix of $H_{k-1}\cdots H_1 M$ as a new $M$ and use the same procedure to construct the Householder transformation $H_k$ (extended with an identity block so that the first $k-1$ rows and columns are untouched). We construct the final Householder transformation as $H = H_n\cdots H_2 H_1$, which gradually transforms $M$ into an upper triangular matrix with a sequence of Householder reflections. Therefore, we have that

$M = H^\top T = (H_n\cdots H_2 H_1)^\top T$  (6)

where $T$ is the upper triangular matrix (which is different from the matrix $R$ in Fig. 2) and the final obtained orthogonal set is $H^\top$ (i.e., $R = (H_n\cdots H_1)^\top$).
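The construction above can be sketched as a plain QR-via-reflections routine on a toy matrix (not the paper's unrolled layer):

```python
import numpy as np

def householder_qr(M):
    """QR via Householder reflections: M = Q @ T with Q orthogonal, T upper
    triangular (Q plays the role of the orthogonal set H^T)."""
    d = M.shape[0]
    H_total = np.eye(d)
    T = M.astype(float).copy()
    for k in range(d - 1):
        a = T[k:, k]
        u = a.copy()
        u[0] -= np.linalg.norm(a)          # u = a - ||a|| e_1
        if np.linalg.norm(u) < 1e-12:      # column already in the right form
            continue
        Hk = np.eye(d)
        Hk[k:, k:] -= 2.0 * np.outer(u, u) / (u @ u)
        T = Hk @ T                          # zero out column k below diagonal
        H_total = Hk @ H_total
    return H_total.T, T                     # H M = T  =>  M = H^T T

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
Q, T = householder_qr(M)
assert np.allclose(Q @ T, M, atol=1e-8)
assert np.allclose(Q.T @ Q, np.eye(5), atol=1e-8)
assert np.allclose(T, np.triu(T), atol=1e-8)
```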
Löwdin’s Symmetric Orthogonalization. Let the matrix $M = [m_1,\dots,m_n]$ collect a given set of linearly independent vectors in an $n$-dimensional space. A nonsingular linear transformation $A$ can transform the basis $M$ to an orthogonal basis $B$: $B = MA$. The matrix $B$ will be orthogonal if the following equation holds:

$B^\top B = A^\top M^\top M A = A^\top G A = I$  (7)

where $G = M^\top M$ is the Gram matrix of the given set $M$. We obtain a general solution to the orthogonalization problem via the substitution $A = G^{-1/2}P$, where $P$ is an arbitrary unitary matrix. The specific choice $P = I$ gives Löwdin’s symmetric orthogonalization (LS): $B = MG^{-1/2}$. We can analytically obtain the symmetric orthogonalization from the singular value decomposition $M = U\Sigma V^\top$: LS then gives $B = UV^\top$ as the orthogonal set for $M$.

LS possesses a remarkable property that the other orthogonalizations do not have: the orthogonal set resembles the original set in a nearest-neighbour sense. Specifically, LS guarantees that $\sum_i \|b_i - m_i\|^2$ (where $b_i$ and $m_i$ are the $i$-th columns of $B$ and $M$, respectively) is minimized. Intuitively, LS performs the gentlest pushing of the directions of the vectors in order to make them orthogonal.
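A short sketch of LS via the SVD, including a check of its nearest-orthogonal-matrix property against a QR-based orthogonalization of the same matrix:

```python
import numpy as np

def lowdin(M):
    """Symmetric orthogonalization: B = U V^T from the SVD M = U S V^T."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
B = lowdin(M)
assert np.allclose(B.T @ B, np.eye(5), atol=1e-8)

# Nearest-neighbour property: B = U V^T is the Frobenius-closest orthogonal
# matrix to M, so it is at least as close as the QR-based orthogonalization.
Q, _ = np.linalg.qr(M)
assert np.linalg.norm(B - M) <= np.linalg.norm(Q - M) + 1e-8
```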
Discussion. These orthogonalization algorithms are fully differentiable and end-to-end trainable. For better orthogonality, the algorithms can be applied iteratively, and we can unroll them with multiple iterations; empirically, one-step unrolling usually works well already. We have also considered Givens rotations for constructing the orthogonal matrix, but they need to traverse all lower triangular elements of the original set, which requires on the order of $n^2$ rotations and is therefore too computationally expensive. More interestingly, each orthogonalization method encodes a unique inductive bias for the resulting neurons by imposing an implicit regularization (e.g., least distance in Hilbert space for LS). More details about the orthogonalizations are provided in Appendix A.
3.4 Orthogonal Parameterization
A more convenient way to ensure orthogonality while learning the matrix $R$ is to use a special parameterization that inherently guarantees orthogonality. The exponential parameterization uses $R = \exp(Q)$ (where $\exp$ denotes the matrix exponential) to represent an orthogonal matrix via a skew-symmetric matrix $Q$. The Cayley parameterization (CP) is a Padé approximation of the exponential parameterization and is a more natural choice due to its simplicity. CP uses the following transform to construct an orthogonal matrix $R$ from a skew-symmetric matrix $Q$:

$R = (I - Q)(I + Q)^{-1}$  (8)

where $Q^\top = -Q$. We note that the Cayley parameterization only produces orthogonal matrices with determinant $+1$, which belong to the special orthogonal group $SO(d)$. Specifically, it suffices to learn the upper (or lower) triangular part of $Q$ with unconstrained optimization to obtain a desired orthogonal matrix $R$. The Cayley parameterization does not cover the entire orthogonal group and is less flexible in terms of representation power, which serves as an explicit regularization for the neurons.
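The Cayley parameterization is a one-liner to sketch; the skew-symmetric $Q$ is built from an arbitrary matrix, and the assertions check orthogonality and determinant $+1$:

```python
import numpy as np

def cayley(Q):
    """R = (I - Q)(I + Q)^{-1} for a skew-symmetric Q; R is orthogonal with
    det(R) = +1 (the special orthogonal group)."""
    I = np.eye(Q.shape[0])
    return (I - Q) @ np.linalg.inv(I + Q)

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
Q = A - A.T                      # skew-symmetric: only a triangle is free
R = cayley(Q)
assert np.allclose(R.T @ R, np.eye(6), atol=1e-8)
assert np.isclose(np.linalg.det(R), 1.0)
```

Since $I + Q$ is always invertible for skew-symmetric $Q$, the triangular entries of $Q$ can be optimized without any constraint.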
3.5 Orthogonality-Preserving Gradient Descent
An alternative way to guarantee orthogonality is to modify the gradient update for the transformation matrix $R$. The general idea is to initialize $R$ with an arbitrary orthogonal matrix and then ensure that every gradient update applies an orthogonal transformation to $R$; this essentially performs gradient descent on the Stiefel manifold. There is a large body of work (Li et al., 2020; Wen & Yin, 2013; Wisdom et al., 2016; Lezcano-Casado & Martínez-Rubio, 2019; Arjovsky et al., 2016; Henaff et al., 2016; Jing et al., 2017) focusing on optimization on the Stiefel manifold.
Given a matrix $R^{(t)}$ that is initialized as an orthogonal matrix, we aim to construct an orthogonal transformation as the gradient update. We use the Cayley transform to compute a parametric curve on the Stiefel manifold (with a specific metric) via a skew-symmetric matrix, and use it as the update rule:

$R^{(t+1)} = \Big(I + \frac{\eta}{2}W^{(t)}\Big)^{-1}\Big(I - \frac{\eta}{2}W^{(t)}\Big)R^{(t)}$  (9)

where $W^{(t)} = G^{(t)}(R^{(t)})^\top - R^{(t)}(G^{(t)})^\top$ is skew-symmetric and $\eta$ is the learning rate. $R^{(t)}$ denotes the orthogonal matrix in the $t$-th iteration, and $G^{(t)}$ denotes the original gradient of the loss function w.r.t. $R^{(t)}$. We term such a gradient update orthogonality-preserving gradient descent (OGD). To reduce the computational cost of the matrix inverse in Eq. (9), we use an iterative method (Li et al., 2020) to approximate the Cayley transform without any matrix inverse. By moving terms in Eq. (9), we arrive at the following fixed-point iteration:

$Y_{k+1} = R^{(t)} - \frac{\eta}{2}W^{(t)}\big(R^{(t)} + Y_k\big),\qquad Y_0 = R^{(t)}$  (10)

which converges to the closed-form Cayley transform as the number of iterations $k$ grows. In practice, we empirically find that two iterations usually suffice for a reasonable approximation accuracy.
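The update in Eq. (9) and its inverse-free approximation in Eq. (10) can be sketched as follows; the gradient G is a random placeholder and the learning rate is an arbitrary small value:

```python
import numpy as np

rng = np.random.default_rng(0)
d, eta = 6, 0.01
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # current orthogonal iterate
G = rng.normal(size=(d, d))                    # placeholder gradient dL/dR
I = np.eye(d)

W = G @ R.T - R @ G.T                          # skew-symmetric (W^T = -W)

# Closed-form Cayley update.
R_next = np.linalg.inv(I + 0.5 * eta * W) @ (I - 0.5 * eta * W) @ R

# Inverse-free fixed-point iteration: Y <- R - (eta/2) W (R + Y).
Y = R.copy()
for _ in range(2):                             # two iterations usually suffice
    Y = R - 0.5 * eta * W @ (R + Y)

assert np.allclose(R_next.T @ R_next, I, atol=1e-8)   # update stays orthogonal
assert np.linalg.norm(Y - R_next) < 1e-2              # close approximation
```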
3.6 Relaxation to Orthogonal Regularization
We relax the optimization with a hard orthogonality constraint to an unconstrained optimization with an orthogonality regularization (OR). Specifically, we remove the orthogonality constraint in Eq. (1) and add an orthogonality regularizer for $R$, i.e., $\|R^\top R - I\|_F^2$, to the objective function. Taking Eq. (1) as an example, the training objective becomes

$\min_{R,\,u}\ \sum_{(x,y)} \ell\Big(\sum_{i=1}^{n} u_i\,(R v_i)^\top x,\ y\Big) + \lambda\,\|R^\top R - I\|_F^2$  (11)

where $\lambda$ is a hyperparameter and $v_i$ are the fixed neuron weights. This serves as an approximation to the OPT objective; the relaxation cannot guarantee that the hyperspherical energy stays unchanged.
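A sketch of the soft-orthogonality penalty used in this relaxation ($\lambda$ is an arbitrary illustrative value):

```python
import numpy as np

def orth_penalty(R, lam=1e-4):
    """lam * ||R^T R - I||_F^2: a soft version of the orthogonality constraint,
    added to the data loss instead of being enforced exactly."""
    I = np.eye(R.shape[1])
    return lam * np.linalg.norm(R.T @ R - I, 'fro') ** 2

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
assert np.isclose(orth_penalty(Q), 0.0)           # orthogonal -> zero penalty
assert orth_penalty(rng.normal(size=(5, 5))) > 0  # generic matrix -> penalized
```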
4 Refining the Random Initialization
Minimizing hyperspherical energy. Because the neuron weight vectors $v_i$ are randomly initialized, there is variance that makes the hyperspherical energy deviate from the minimum, even though the energy is minimized in expectation. To reduce the hyperspherical energy more effectively, we propose to refine the random initialization by minimizing its hyperspherical energy as a preprocessing step. Specifically, before feeding these neuron weights to OPT, we first minimize the hyperspherical energy in Eq. (2) with gradient descent (without the data-fitting loss). Moreover, since randomly initialized neurons cannot minimize the half-space hyperspherical energy (Liu et al., 2018), in which the collinearity redundancy is removed, we can also perform half-space hyperspherical energy minimization as a preprocessing step.

Normalizing the randomly initialized neurons. Since the norms of the randomly initialized neuron weights serve a role similar to weighting the importance of different neurons, we further consider normalizing the neuron weights so that each weight vector has unit norm.
We evaluate both refinements in Section 7.4, and we also show that OPT still performs well without these refinements.
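The energy-minimization preprocessing can be sketched as follows. For brevity this uses a numerical gradient and a crude backtracking step rather than the analytic gradient descent used in practice, and all sizes are toy values:

```python
import numpy as np

def energy(W, s=1.0):
    """Hyperspherical energy of the columns of W (Eq. (2) with s = 1)."""
    Wh = W / np.linalg.norm(W, axis=0, keepdims=True)
    n = W.shape[1]
    D = np.linalg.norm(Wh[:, :, None] - Wh[:, None, :], axis=0)
    mask = ~np.eye(n, dtype=bool)
    return np.sum(D[mask] ** (-s))

def num_grad(W, h=1e-5):
    """Central-difference gradient of the energy (autodiff stand-in)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        g[idx] = (energy(Wp) - energy(Wm)) / (2 * h)
    return g

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 8))            # randomly initialized toy neurons
e0 = energy(V)
for _ in range(100):
    g = num_grad(V)
    step = 1e-2
    # Backtracking: shrink the step until the energy decreases.
    while step > 1e-10 and energy(V - step * g) >= energy(V):
        step *= 0.5
    V = V - step * g

assert energy(V) < e0   # the preprocessing lowers the hyperspherical energy
```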
5 Theoretical Insights on Generalization
The key question we aim to answer in this section is why OPT may lead to better generalization. We have already shown that OPT can guarantee the minimum hyperspherical energy (MHE) in a probabilistic sense. Although empirical evidence (Liu et al., 2018) has shown significant and consistent performance gains from minimizing hyperspherical energy, it remains unclear why lower hyperspherical energy leads to better generalization. We argue that OPT leads to better generalization from two aspects: how OPT may affect training and generalization in theory, and why minimum hyperspherical energy serves as a good inductive bias.
Our goal here is to leverage and apply existing theoretical results (Kawaguchi, 2016; Xie et al., 2016; Soudry & Carmon, 2016; Lee et al., 2016; Du et al., 2017; Allen-Zhu et al., 2018) to explain the role that MHE plays, rather than proving sharp and novel generalization bounds. We simply consider one-hidden-layer networks $f(x) = \sum_{i=1}^{n} u_i\,\sigma(w_i^\top x)$ as the hypothesis class, where $\sigma(z) = \max(z, 0)$ is ReLU. Since the magnitude of $u_i$ can be scaled into $w_i$, we can restrict $u_i$ to be $\pm 1$. Given a set of $m$ i.i.d. training samples $\{(x_j, y_j)\}_{j=1}^{m}$ where $x_j$ is drawn uniformly from the unit hypersphere, we minimize the least square loss $L(w) = \frac{1}{2m}\sum_{j=1}^{m}(f(x_j) - y_j)^2$. The gradient w.r.t. $w_i$ is

$\frac{\partial L}{\partial w_i} = \frac{1}{m}\sum_{j=1}^{m}\big(f(x_j) - y_j\big)\,u_i\,\sigma'(w_i^\top x_j)\,x_j$  (12)
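The gradient formula in Eq. (12) can be verified against a numerical gradient on a toy instance (all sizes and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 4, 3, 10
W = rng.normal(size=(d, n))              # neuron weights w_i as columns
u = rng.choice([-1.0, 1.0], size=n)      # output weights restricted to +-1
X = rng.normal(size=(d, m))
X /= np.linalg.norm(X, axis=0)           # inputs on the unit hypersphere
y = rng.normal(size=m)

def f(W):
    return (u[None, :] @ np.maximum(W.T @ X, 0.0)).ravel()

def loss(W):
    return 0.5 * np.mean((f(W) - y) ** 2)

# Analytic gradient: dL/dw_i = (1/m) sum_j r_j u_i 1[w_i^T x_j > 0] x_j.
r = f(W) - y                             # residuals, shape (m,)
act = (W.T @ X > 0).astype(float)        # sigma', shape (n, m)
grad = X @ (act * u[:, None] * r[None, :]).T / m    # shape (d, n)

# Check against a central-difference numerical gradient.
h = 1e-6
num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += h
    Wm[idx] -= h
    num[idx] = (loss(Wp) - loss(Wm)) / (2 * h)
assert np.allclose(grad, num, atol=1e-5)
```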
Let $w = [w_1^\top,\dots,w_n^\top]^\top \in \mathbb{R}^{nd}$ be the column concatenation of the neuron weights. We aim to identify the conditions under which there are no spurious local minima. We rewrite the gradient as

$\frac{\partial L}{\partial w} = \frac{1}{m}\,D\,r$  (13)

where $r = \big(f(x_1) - y_1,\dots,f(x_m) - y_m\big)^\top \in \mathbb{R}^{m}$, $D = [d_1,\dots,d_m] \in \mathbb{R}^{nd\times m}$, and $d_j = \big[u_1\sigma'(w_1^\top x_j)x_j^\top,\dots,u_n\sigma'(w_n^\top x_j)x_j^\top\big]^\top$. Therefore, we can obtain that

$\epsilon = \frac{1}{m}\|r\|^2 \;\le\; \frac{m\,\big\|\frac{\partial L}{\partial w}\big\|^2}{\sigma_{\min}(D)^2}$  (14)

where $\epsilon$ is the training error and $\sigma_{\min}(D)$ is the minimum singular value of $D$. For the training error to be small, we have to lower bound $\sigma_{\min}(D)$ away from zero. We have the following result (Lemma 1) from (Xie et al., 2016), which lower-bounds $\sigma_{\min}(D)$ in terms of the diversity of the neuron directions.
Once MHE is achieved, the neurons are uniformly distributed on the unit hypersphere. From Lemma 1, when the neurons are uniformly distributed on the unit hypersphere, the quantity in the bound that measures how clustered the neurons are will be very small and close to zero, leading to a large lower bound for $\sigma_{\min}(D)$. Therefore, MHE results in a small training error whenever the gradient norm is small. This result implies that there are no spurious local minima if we use OPT for training.
We further argue that the MHE induced by OPT serves as an important inductive bias for neural networks. As the standard regularizer for neural networks, weight decay controls the norm of the neuron weights, essentially regularizing one dimension of each weight. In contrast, MHE completes the missing pieces by regularizing the remaining dimensions of the weight. MHE encourages minimum hyperspherical redundancy between neurons; in the linear-classifier case, MHE imposes a prior of maximal inter-class separability.
6 Discussions
Semi-randomness. OPT fixes the randomly initialized neuron weight vectors and only learns a layer-shared orthogonal matrix for each layer, so OPT naturally imposes a strong randomness on the neurons and combines the generalization benefits of randomness with the strong approximation power of neural networks. Such randomness suggests that the specific configuration of relative positions among neurons does not matter that much, and that the coordinate system is more crucial for generalization. (Kawaguchi et al., 2018; Rahimi & Recht, 2008; Srivastava et al., 2014) also show that randomness can be beneficial to generalization.
Coordinate system vs. relative position. OPT shows that learning only the coordinate system yields much better generalization than learning the neuron weights directly, implying that the coordinate system is of great importance to generalization. However, the relative positions become unimportant only when the hyperspherical energy is sufficiently low; in other words, hyperspherical energy well characterizes the relative positions among neurons, and lower hyperspherical energy leads to better generalization.
Flexible training. First, OPT can be used in multi-task training (Mallya et al., 2018), where each set of orthogonal matrices represents one task: OPT can learn a different set of orthogonal matrices for each task while the neuron weights remain the same. Second, we can perform progressive training with OPT. For example, after learning a set of orthogonal matrices on a large coarse-grained dataset (i.e., pretraining), we can multiply the orthogonal matrices back into the neuron weights to construct a new set of neuron weights. We can then use the new neuron weights as a starting point and apply OPT to train on a small fine-grained dataset (i.e., finetuning).
Limitations and open problems. The limitations of OPT include higher GPU memory consumption and computation during training, more numerical issues when ensuring orthogonality, and weak scalability to ultra-wide neural networks. Therefore, there are plenty of open problems in OPT, such as scalable and efficient training. Most significantly, OPT opens up a new possibility for studying the theoretical generalization of deep networks: with the decomposition into hyperspherical energy and a coordinate system, OPT provides a new perspective for future research.
7 Experiments and Results
7.1 Experimental settings
We evaluate OPT on various types of neural networks such as multi-layer perceptrons (for image classification), convolutional neural networks (for image classification), graph neural networks (for graph node classification), and point cloud neural networks (for point cloud classification). Our goal is to show the performance gain of OPT on training different neural networks rather than achieving state-of-the-art performance on different tasks. More experimental details and network architectures are given in Appendix D.
7.2 Ablation Study and Exploratory Experiment
Table 1: Testing error (%) on CIFAR-100.

  Method   | FN | LR | CNN-6 | CNN-9
  ---------|----|----|-------|------
  Baseline | -  | -  | 37.59 | 33.55
  UPT      | N  | U  | 48.47 | 46.72
  UPT      | Y  | U  | 42.61 | 39.38
  OPT      | N  | GS | 37.24 | 32.95
  OPT      | Y  | GS | 33.02 | 31.03
Necessity of orthogonality. We first examine whether orthogonality is necessary for OPT. We use both a 6-layer CNN and a 9-layer CNN (specified in Appendix D) on CIFAR-100. We compare OPT with a baseline that uses the same network architecture but learns an unconstrained matrix with only weight decay regularization; we term this baseline unconstrained overparameterized training (UPT). "FN" in Table 1 denotes whether the randomly initialized neuron weights are fixed throughout training ("Y" for yes, "N" for no). "LR" denotes whether the learnable transformation is unconstrained ("U") or orthogonal ("GS" for Gram-Schmidt process). The results in Table 1 show that without ensuring orthogonality, UPT performs much worse than OPT, which unrolls the Gram-Schmidt process for orthogonality (no matter whether the neuron weights are fixed or not). Thus, orthogonality is indeed necessary.
Fixed weight vs. learnable weight. From Table 1, we can see that using fixed neuron weights is consistently better than learnable neuron weights in both UPT and OPT. It shows that fixing the neuron weights while learning the transformation matrix is beneficial to generalization.
High vs. low hyperspherical energy. We empirically verify that high hyperspherical energy corresponds to inferior generalization performance. To initialize neurons with high hyperspherical energy, we use random initializations whose mean is shifted to 0, 1e-3, 1e-2, 2e-2, 3e-2, and 5e-2.
Table 2: Hyperspherical energy and testing error of CNN-6 on CIFAR-100 under different initialization means.

  Mean | Energy | Error (%)
  -----|--------|----------
  0    | 3.5109 | 32.49
  1e-3 | 3.5117 | 33.11
  1e-2 | 3.5160 | 39.51
  2e-2 | 3.5531 | 53.89
  3e-2 | 3.6761 | N/C
  5e-2 | 4.2776 | N/C
We use CNN-6 to conduct experiments on CIFAR-100. The results in Table 2 ("N/C" denotes not converged) show that networks with higher hyperspherical energy are more difficult to converge. Moreover, we find that once the hyperspherical energy exceeds a certain value, the network cannot converge at all. Note that when the hyperspherical energy is small (near the minimum), even a slight change in hyperspherical energy (e.g., from 3.5109 to 3.5117) can lead to a dramatic generalization gap (e.g., from a 32.49% error rate to 33.11%). One can observe that higher hyperspherical energy leads to worse generalization.
7.3 MultiLayer Perceptrons
Table 3: Testing error (%) on MNIST.

  Method    | Normal | Xavier
  ----------|--------|-------
  Baseline  | 6.05   | 2.14
  OPT (GS)  | 5.11   | 1.45
  OPT (HR)  | 5.31   | 1.60
  OPT (LS)  | 5.32   | 1.54
  OPT (CP)  | 5.14   | 1.49
  OPT (OGD) | 5.38   | 1.56
  OPT (OR)  | 5.41   | 1.78
We evaluate different variants of OPT for MLPs on MNIST, using a 3-layer MLP for all training methods. Specific training hyperparameters are given in Appendix D. Table 3 shows the testing error on MNIST when the neuron weights use normal initialization or Xavier initialization (Glorot & Bengio, 2010). OPT (GS/HR/LS) denote OPT with the unrolled orthogonalization algorithms, OPT (CP) denotes OPT with the Cayley parameterization, OPT (OGD) is OPT with orthogonality-preserving gradient descent, and OPT (OR) denotes OPT with the relaxed orthogonality regularization. We can see that OPT (GS) performs the best and that all OPT variants outperform the baseline by a considerable margin.
7.4 Convolutional Neural Networks
Table 4: Testing error (%) on CIFAR-100.

  Method    | CNN-6 | CNN-9
  ----------|-------|------
  Baseline  | 37.59 | 33.55
  HSMHE     | 34.97 | 32.87
  OPT (GS)  | 33.02 | 31.03
  OPT (HR)  | 35.67 | 32.75
  OPT (LS)  | 34.48 | 31.22
  OPT (CP)  | 33.53 | 31.28
  OPT (OGD) | 33.33 | 31.47
  OPT (OR)  | 34.70 | 32.63
OPT variants. We evaluate all the OPT variants with a plain 6-layer CNN and a plain 9-layer CNN on CIFAR-100; detailed network architectures are given in Appendix D. All neurons are initialized following (He et al., 2015), and batch normalization (Ioffe & Szegedy, 2015) is used by default. Results in Table 4 show that nearly all OPT variants consistently outperform both the baseline and the HSMHE regularization (Liu et al., 2018) by a significant margin. The HSMHE regularization puts the half-space hyperspherical energy into the loss function and minimizes it with stochastic gradients, which is a naive way to minimize the hyperspherical energy. From the results, we observe that OPT (HR) performs the worst among all OPT variants. In contrast, OPT (GS) achieves the best testing error, implying that the Gram-Schmidt process imposes a suitable inductive bias for CNNs on CIFAR-100.

Table 5: Testing error (%) of CNN-6 on CIFAR-100 without batch normalization.

  Method    | Error (%)
  ----------|----------
  Baseline  | 38.95
  HSMHE     | 36.90
  OPT (GS)  | 35.61
  OPT (HR)  | 37.51
  OPT (LS)  | 35.83
  OPT (CP)  | 34.88
  OPT (OGD) | 35.38
  OPT (OR)  | N/C
Training without batch normalization. We further evaluate how OPT performs without batch normalization. Specifically, we use CNN-6 as the backbone network and test on CIFAR-100. From Table 5, one can see that the OPT variants again outperform both the baseline and HSMHE (Liu et al., 2018), validating that OPT works reasonably well without batch normalization. Among all the OPT variants, the Cayley parameterization achieves very competitive testing error, 4.07% lower than standard training.
Training dynamics. We also look into how the hyperspherical energy and testing error change during training with OPT. For hyperspherical energy, we can see from Fig. 3 that the energy of the baseline increases dramatically at the beginning and then gradually goes down, but it still stays at a relatively high value at the end; MHE can effectively reduce the hyperspherical energy by the end of training. In contrast, all OPT variants maintain a very low hyperspherical energy from the beginning. OPT (GS) and OPT (CP) keep exactly the same hyperspherical energy as the randomly initialized neurons, while OPT (OR) may increase the hyperspherical energy slightly since it is a relaxation. For testing error, all OPT variants converge stably and their final accuracies outperform the others.
Table 6: Testing error (%) of CNN-6 on CIFAR-100 with energy-minimization preprocessing.

  Method    | Standard | MHE    | HSMHE
  ----------|----------|--------|------
  OPT (GS)  | 33.02    | 32.99  | 32.78
  OPT (LS)  | 34.48    | 34.43  | 34.37
  OPT (CP)  | 33.53    | 33.50  | 33.42
  Energy    | 3.5109   | 3.5003 | 3.4976
Refining neuron initialization. We also evaluate the two refinement tricks for the neuron initialization. First, we consider hyperspherical energy minimization as a preprocessing step for the neuron weights. We conduct the experiment using CNN-6 on CIFAR-100. Specifically, we run gradient descent for 5k iterations to minimize the hyperspherical energy of the neuron weights before training starts. We also report the hyperspherical energy (before training starts and after the energy-minimization preprocessing) in Table 6. All methods share the same random initialization with the same random seed, so the hyperspherical energy always starts at 3.5109. After the neuron preprocessing, the energy is 3.5003 for the MHE objective and 3.4976 for the half-space MHE objective. More importantly, Table 6 shows that such a refinement can effectively improve the generalization of OPT and further reduce the testing error on CIFAR-100.
Table 7: Testing error (%) of CNN-6 on CIFAR-100 with and without neuron weight normalization.

  Method    | w/o Norm | w/ Norm
  ----------|----------|--------
  Baseline  | 37.59    | -
  OPT (GS)  | 33.02    | 32.54
  OPT (HR)  | 35.67    | 35.30
  OPT (LS)  | 34.48    | 32.11
  OPT (CP)  | 33.53    | 32.49
  OPT (OGD) | 33.37    | 32.70
  OPT (OR)  | 34.70    | 33.27
We then examine neuron weight normalization in OPT. Normalized neurons make sense in OPT because the scale of the randomly initialized weights does not carry any useful property. After randomly initializing the neurons, we directly normalize each weight vector to unit norm; these randomly initialized neurons still possess the important property of achieving minimum hyperspherical energy. Specifically, we use CNN-6 to perform classification on CIFAR-100. The results in Table 7 show that normalizing the neurons can largely boost the performance of OPT.
Method  ResNet-20  ResNet-32 
Baseline  31.11  30.16 
OPT (GS)  30.73  29.56 
OPT (CP)  30.47  29.31 
OPT for ResNet. To show that OPT is agnostic to the CNN architecture, we perform classification experiments on CIFAR-100 with both ResNet-20 and ResNet-32 (He et al., 2016), trained with OPT (GS) and OPT (CP). The results in Table 8 show that OPT achieves consistent improvements over standard training on ResNet.
Method  Top-1 Err.  Top-5 Err. 
Baseline  44.32  21.13 
OPT (CP)  43.67  20.26 
ImageNet. We test OPT on ImageNet-2012. Since OPT consumes more GPU memory in large-scale settings, we use the GPU-memory-efficient OPT (CP) to train a plain 10-layer CNN (the detailed structure is specified in Appendix D) on ImageNet. Note that our purpose here is to validate the superiority of OPT over the corresponding baseline. From Table 9, we can see that OPT (CP) improves both top-1 and top-5 error over the baseline.
Method  5-shot Acc. (%) 
MAML (Finn et al., 2017)  62.71 ± 0.71 
ProtoNet (Snell et al., 2017)  64.24 ± 0.72 
Baseline (Chen et al., 2019)  62.53 ± 0.69 
Baseline w/ OPT  63.27 ± 0.68 
Baseline++ (Chen et al., 2019)  66.43 ± 0.63 
Baseline++ w/ OPT  66.68 ± 0.66 
Few-shot learning. To evaluate the cross-task generalization of OPT, we conduct few-shot learning on Mini-ImageNet, following the same experimental setting as (Chen et al., 2019). More detailed experimental settings are provided in Appendix D. Specifically, we apply OPT (CP) to train both the Baseline and Baseline++ models described in (Chen et al., 2019), and obtain clear improvements in both cases. Therefore, OPT-trained networks generalize well in challenging few-shot scenarios.
7.5 Graph Neural Networks
We also test OPT with graph convolutional networks (GCN) (Kipf & Welling, 2016) for graph node classification. For a fair comparison, we use exactly the same implementation, hyperparameters and experimental setup as (Kipf & Welling, 2016). Training a GCN with OPT is not entirely straightforward, because we first need to identify the neurons to which OPT applies. Specifically, a two-layer GCN uses the following forward model:
$Z=\operatorname{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A}XW^{(0)})\,W^{(1)}\big)$  (16) 
where $\hat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$. We note that $A$ is the adjacency matrix of the graph, $\tilde{A}=A+I_N$ ($I_N$ is an identity matrix), and $\tilde{D}_{ii}=\sum_j\tilde{A}_{ij}$. $X\in\mathbb{R}^{N\times C}$ is the feature matrix of the $N$ nodes in the graph (feature dimension is $C$). $W^{(1)}\in\mathbb{R}^{H\times F}$ is the weight matrix of the classifiers. $W^{(0)}\in\mathbb{R}^{C\times H}$ is the weight matrix of the hidden layer, where $H$ is the dimension of the hidden space. We treat each column vector of $W^{(0)}$ as a neuron, so there are $H$ neurons in total. Then we naturally apply OPT to train these neurons of dimension $C$ in GCN.
Method  Cora  Pubmed 
GCN Baseline  81.3  79.0 
OPT (CP)  82.0  79.4 
OPT (OGD)  82.3  79.5 
We conduct experiments on the Cora and Pubmed datasets (Sen et al., 2008). The goal here is to verify the effectiveness of OPT on GCN rather than to achieve state-of-the-art performance on graph node classification. The results in Table 11 show a reasonable improvement achieved by OPT, validating that OPT can universally train different types of neural networks on different data modalities.
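The forward model in Eq. (16) with the OPT parameterization of the hidden-layer neurons can be sketched as follows. This is an illustrative NumPy version, not the implementation of (Kipf & Welling, 2016); all function and variable names are our own.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def normalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, as in Kipf & Welling (2016)."""
    A_tilde = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_opt_forward(A, X, R, W0_fixed, W1):
    """Two-layer GCN forward pass with OPT-parameterized hidden weights:
    the equivalent hidden weight matrix is R @ W0_fixed, where W0_fixed
    (C x H, one neuron per column) is random and frozen, and R (C x C) is
    the learnable layer-shared orthogonal matrix."""
    A_hat = normalized_adjacency(A)
    H = relu(A_hat @ X @ (R @ W0_fixed))
    return softmax_rows(A_hat @ H @ W1)

# Tiny synthetic usage example.
rng = np.random.default_rng(0)
N, C, H_dim, F = 5, 4, 6, 3
A = np.triu((rng.random((N, N)) < 0.4).astype(float), 1)
A = A + A.T                                   # symmetric adjacency, no self-loops
X = rng.normal(size=(N, C))
R, _ = np.linalg.qr(rng.normal(size=(C, C)))  # a random orthogonal R
W0 = rng.normal(size=(C, H_dim))              # fixed neurons
W1 = rng.normal(size=(H_dim, F))
Z = gcn_opt_forward(A, X, R, W0, W1)          # row-stochastic class scores
```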
7.6 Point Cloud Neural Networks
Method  Acc. (%) 
PointNet Baseline  87.1 
OPT (GS)  87.23 
OPT (CP)  87.86 
We further test OPT on PointNet (Qi et al., 2017), a type of neural network that takes raw point clouds as input and classifies them based on semantics. To simplify the comparison and remove all bells and whistles, we use a vanilla PointNet (without T-Net) as our backbone network and apply OPT to train its MLPs. We follow the same experimental settings as (Qi et al., 2017) and evaluate on the ModelNet40 dataset (Wu et al., 2015). The results are given in Table 12. Both OPT variants achieve better accuracy than the PointNet baseline, with OPT (CP) achieving a notable improvement. This is in fact significant because we do not add any parameters to the network. Although our accuracy is not state-of-the-art, it still validates the effectiveness of OPT. Most importantly, the improvement on PointNet further confirms that OPT is a generic and effective training framework for different types of neural networks.
8 Concluding Remarks
This paper proposes OPT, a novel training framework applicable to many types of neural networks. OPT overparameterizes neurons with neuron weights (randomly initialized and fixed) and a layer-shared orthogonal matrix (learnable). OPT provably achieves minimum hyperspherical energy and maintains this energy during training. We give theoretical insights and extensive empirical evidence to validate OPT's superiority.
References
 Allen-Zhu et al. (2018) Allen-Zhu, Z., Li, Y., and Song, Z. A convergence theory for deep learning via over-parameterization. arXiv preprint arXiv:1811.03962, 2018.
 Annavarapu (2013) Annavarapu, R. N. Singular value decomposition and the centrality of Löwdin orthogonalizations. American Journal of Computational and Applied Mathematics, 3(1):33–35, 2013.
 Arjovsky et al. (2016) Arjovsky, M., Shah, A., and Bengio, Y. Unitary evolution recurrent neural networks. In ICML, 2016.
 Bansal et al. (2018) Bansal, N., Chen, X., and Wang, Z. Can we gain more from orthogonality regularizations in training deep CNNs? In NeurIPS, 2018.
 Bilyk & Lacey (2015) Bilyk, D. and Lacey, M. T. One bit sensing, discrepancy, and Stolarsky principle. arXiv preprint arXiv:1511.08452, 2015.
 Chen et al. (2019) Chen, W.-Y., Liu, Y.-C., Kira, Z., Wang, Y.-C. F., and Huang, J.-B. A closer look at few-shot classification. arXiv preprint arXiv:1904.04232, 2019.
 Ding et al. (2019) Ding, X., Guo, Y., Ding, G., and Han, J. ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks. In ICCV, 2019.
 Du et al. (2017) Du, S. S., Lee, J. D., Tian, Y., Poczos, B., and Singh, A. Gradient descent learns one-hidden-layer CNN: Don't be afraid of spurious local minima. arXiv preprint arXiv:1712.00779, 2017.
 Duchi et al. (2011) Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 2011.
 Finn et al. (2017) Finn, C., Abbeel, P., and Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
 Glorot & Bengio (2010) Glorot, X. and Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
 Gunasekar et al. (2017) Gunasekar, S., Woodworth, B. E., Bhojanapalli, S., Neyshabur, B., and Srebro, N. Implicit regularization in matrix factorization. In NeurIPS, 2017.
 Gunasekar et al. (2018) Gunasekar, S., Lee, J. D., Soudry, D., and Srebro, N. Implicit bias of gradient descent on linear convolutional networks. In NeurIPS, 2018.
 Ha et al. (2016) Ha, D., Dai, A., and Le, Q. V. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
 He et al. (2015) He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV, 2015.
 He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, 2016.
 Henaff et al. (2016) Henaff, M., Szlam, A., and LeCun, Y. Recurrent orthogonal networks and long-memory tasks. arXiv preprint arXiv:1602.06662, 2016.
 Hoffmann (1989) Hoffmann, W. Iterative algorithms for Gram-Schmidt orthogonalization. Computing, 41(4):335–348, 1989.
 Ioffe & Szegedy (2015) Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
 Jaderberg et al. (2014) Jaderberg, M., Vedaldi, A., and Zisserman, A. Speeding up convolutional neural networks with low rank expansions. In BMVC, 2014.
 Jia et al. (2016) Jia, X., De Brabandere, B., Tuytelaars, T., and Gool, L. V. Dynamic filter networks. In NeurIPS, 2016.
 Jing et al. (2017) Jing, L., Shen, Y., Dubcek, T., Peurifoy, J., Skirlo, S., LeCun, Y., Tegmark, M., and Soljačić, M. Tunable efficient unitary neural networks (EUNN) and their application to RNNs. In ICML, 2017.
 Kawaguchi (2016) Kawaguchi, K. Deep learning without poor local minima. In NeurIPS, 2016.
 Kawaguchi et al. (2018) Kawaguchi, K., Xie, B., and Song, L. Deep semi-random features for nonlinear function approximation. In AAAI, 2018.
 Keskar & Socher (2017) Keskar, N. S. and Socher, R. Improving generalization performance by switching from Adam to SGD. arXiv preprint arXiv:1712.07628, 2017.
 Kingma & Ba (2014) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kipf & Welling (2016) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
 Lee et al. (2016) Lee, J. D., Simchowitz, M., Jordan, M. I., and Recht, B. Gradient descent only converges to minimizers. In COLT, 2016.
 Lezcano-Casado & Martínez-Rubio (2019) Lezcano-Casado, M. and Martínez-Rubio, D. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. arXiv preprint arXiv:1901.08428, 2019.
 Li et al. (2020) Li, J., Li, F., and Todorovic, S. Efficient riemannian optimization on the stiefel manifold via the cayley transform. In ICLR, 2020.
 Li et al. (2018) Li, Y., Ma, T., and Zhang, H. Algorithmic regularization in overparameterized matrix sensing and neural networks with quadratic activations. In COLT, 2018.
 Lin et al. (2013) Lin, M., Chen, Q., and Yan, S. Network in network. arXiv preprint arXiv:1312.4400, 2013.
 Liu et al. (2015) Liu, B., Wang, M., Foroosh, H., Tappen, M., and Pensky, M. Sparse convolutional neural networks. In CVPR, 2015.
 Liu et al. (2017) Liu, W., Zhang, Y.M., Li, X., Yu, Z., Dai, B., Zhao, T., and Song, L. Deep hyperspherical learning. In NeurIPS, 2017.
 Liu et al. (2018) Liu, W., Lin, R., Liu, Z., Liu, L., Yu, Z., Dai, B., and Song, L. Learning towards minimum hyperspherical energy. In NeurIPS, 2018.
 Liu et al. (2019) Liu, W., Liu, Z., Rehg, J. M., and Song, L. Neural similarity learning. In NeurIPS, 2019.
 Mallya et al. (2018) Mallya, A., Davis, D., and Lazebnik, S. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In ECCV, 2018.
 Nesterov (1983) Nesterov, Y. E. A method for solving the convex programming problem with convergence rate O(1/k^2). In Dokl. Akad. Nauk SSSR, 1983.
 Peel et al. (2010) Peel, T., Anthoine, S., and Ralaivola, L. Empirical bernstein inequalities for ustatistics. In NeurIPS, 2010.
 Qi et al. (2017) Qi, C. R., Su, H., Mo, K., and Guibas, L. J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In CVPR, 2017.
 Rahimi & Recht (2008) Rahimi, A. and Recht, B. Random features for large-scale kernel machines. In NeurIPS, 2008.
 Reddi et al. (2019) Reddi, S. J., Kale, S., and Kumar, S. On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237, 2019.
 Sen et al. (2008) Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 2008.
 Snell et al. (2017) Snell, J., Swersky, K., and Zemel, R. Prototypical networks for fewshot learning. In NeurIPS, 2017.
 Soudry & Carmon (2016) Soudry, D. and Carmon, Y. No bad local minima: Data independent training error guarantees for multilayer neural networks. arXiv preprint arXiv:1605.08361, 2016.
 Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 2014.
 Srivastava (2000) Srivastava, V. A unified view of the orthogonalization methods. Journal of Physics A: Mathematical and General, 33(35):6219, 2000.

 Tieleman & Hinton (2012) Tieleman, T. and Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31, 2012.
 Wang et al. (2017) Wang, M., Liu, B., and Foroosh, H. Factorized convolutional neural networks. In ICCV Workshops, 2017.
 Wen & Yin (2013) Wen, Z. and Yin, W. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 2013.
 Wisdom et al. (2016) Wisdom, S., Powers, T., Hershey, J., Le Roux, J., and Atlas, L. Full-capacity unitary recurrent neural networks. In NeurIPS, 2016.
 Wu et al. (2015) Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In CVPR, 2015.
 Xie et al. (2016) Xie, B., Liang, Y., and Song, L. Diverse neural network learns true target functions. arXiv preprint arXiv:1611.03131, 2016.
 Yang et al. (2015) Yang, Z., Moczulski, M., Denil, M., de Freitas, N., Smola, A., Song, L., and Wang, Z. Deep fried convnets. In ICCV, 2015.
 Zeiler (2012) Zeiler, M. D. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
Appendix A Details of Unrolled Orthogonalization Algorithms
A.1 Gram-Schmidt Process
Gram-Schmidt Process. The GS process is a method for orthonormalizing a set of vectors in an inner product space, i.e., the Euclidean space equipped with the standard inner product. Specifically, the GS process performs the following operations to orthogonalize a set of vectors $\{v_1,\dots,v_n\}$:
Step 1: $u_1=v_1$, $e_1=\frac{u_1}{\|u_1\|}$  (17)  
Step 2: $u_2=v_2-\operatorname{proj}_{u_1}(v_2)$, $e_2=\frac{u_2}{\|u_2\|}$  
Step 3: $u_3=v_3-\operatorname{proj}_{u_1}(v_3)-\operatorname{proj}_{u_2}(v_3)$, $e_3=\frac{u_3}{\|u_3\|}$  
Step 4: $u_4=v_4-\sum_{j=1}^{3}\operatorname{proj}_{u_j}(v_4)$, $e_4=\frac{u_4}{\|u_4\|}$  
Step n: $u_n=v_n-\sum_{j=1}^{n-1}\operatorname{proj}_{u_j}(v_n)$, $e_n=\frac{u_n}{\|u_n\|}$ 
where $\operatorname{proj}_{u}(v)=\frac{\langle v,u\rangle}{\langle u,u\rangle}u$ denotes the projection of the vector $v$ onto the vector $u$. The set $\{e_1,\dots,e_n\}$ denotes the output orthonormal set. The algorithm flowchart can be described as follows:
The vectors $u_k$ in the algorithm above are used to compute the QR factorization; they are not needed for orthogonalization and therefore do not need to be stored. When the GS process is implemented on a finite-precision computer, the vectors are often not quite orthogonal because of rounding errors. Besides the standard GS process, there is a modified Gram-Schmidt (MGS) algorithm which enjoys better numerical stability. This approach gives the same result as the original formula in exact arithmetic but introduces smaller errors in finite-precision arithmetic. Specifically, GS computes the following formula:
$u_k=v_k-\sum_{j=1}^{k-1}\operatorname{proj}_{u_j}(v_k)$  (18)  
Instead of computing the vector $u_k$ as in Eq. (18), MGS computes the orthogonal basis differently. MGS does not subtract all projections of the original vector $v_k$ at once; instead, it successively removes the projection onto each previously constructed orthogonal basis vector. Specifically, MGS computes the following series of formulas:
$u_k^{(1)}=v_k-\operatorname{proj}_{u_1}(v_k),\quad u_k^{(2)}=u_k^{(1)}-\operatorname{proj}_{u_2}(u_k^{(1)}),\quad\dots,\quad u_k=u_k^{(k-1)}=u_k^{(k-2)}-\operatorname{proj}_{u_{k-1}}(u_k^{(k-2)})$  (19)  
where each step finds a vector $u_k^{(i)}$ that is orthogonal to $u_i$. Therefore, $u_k^{(i)}$ is also orthogonalized against any errors introduced in the computation of $u_k^{(i-1)}$. In practice, although MGS enjoys better numerical stability, we find the empirical performance of GS and MGS to be almost the same in OPT. However, MGS takes longer to compute since each orthogonal basis vector is constructed iteratively. Therefore, we usually stick to classic GS for OPT.
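The classical and modified GS processes above can be sketched as follows. This is an illustrative NumPy version rather than the unrolled in-network implementation used by OPT (GS); both functions use only matrix products and norms, so they are differentiable and could be unrolled inside a network.

```python
import numpy as np

def gram_schmidt(V):
    """Classical GS (Eq. 18): orthonormalize the columns of V by subtracting
    the projections of the ORIGINAL column v_k onto all previous e_j."""
    E = np.zeros_like(V, dtype=float)
    for k in range(V.shape[1]):
        u = V[:, k].astype(float)
        for j in range(k):
            u = u - (E[:, j] @ V[:, k]) * E[:, j]
        E[:, k] = u / np.linalg.norm(u)
    return E

def modified_gram_schmidt(V):
    """MGS (Eq. 19): subtract the projection of the RUNNING residual instead,
    which damps accumulated rounding errors."""
    E = np.zeros_like(V, dtype=float)
    for k in range(V.shape[1]):
        u = V[:, k].astype(float)
        for j in range(k):
            u = u - (E[:, j] @ u) * E[:, j]
        E[:, k] = u / np.linalg.norm(u)
    return E

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 8))
E_gs = gram_schmidt(V)
E_mgs = modified_gram_schmidt(V)
```

In exact arithmetic the two outputs coincide; for well-conditioned inputs like this one they agree to machine precision.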
Iterative Gram-Schmidt Process. The iterative Gram-Schmidt (IGS) process is an iterative version of the GS process. It is shown in (Hoffmann, 1989) that the GS process can be carried out iteratively to obtain a basis matrix that is orthogonal to almost full working precision. The IGS algorithm is given as follows:
The auxiliary vectors in the algorithm above are used to compute the QR factorization; they are not needed for orthogonalization and therefore do not need to be computed explicitly. The while loop in IGS is an iterative procedure; in practice, we unroll a fixed number of its steps in order to improve orthogonality. The iterate obtained in each step corresponds to the solution of an associated linear system, and the IGS process corresponds to the Gauss-Jacobi iteration for solving this system.
Both GS and IGS are easy to embed in neural networks, since they are differentiable. In our experiments, we find that the performance gain from unrolling multiple IGS steps over plain GS is not very obvious (partially because GS already achieves nearly perfect orthogonality), while IGS costs more training time. Therefore, we unroll the classic GS process by default.
A.2 Householder Reflection
Let $v$ be a nonzero vector. A matrix of the form
$H=I-\frac{2}{v^{\top}v}vv^{\top}$  (20) 
is a Householder reflection. The vector $v$ is the Householder vector. If a vector $x$ is multiplied by the matrix $H$, then it is reflected in the hyperplane $\operatorname{span}\{v\}^{\perp}$. Householder matrices are symmetric and orthogonal.
For a vector $x$, we let $v=x-\alpha e_1$ where $\alpha=\pm\|x\|$ and $e_1$ is the first standard basis vector (the first element is $1$ and the remaining elements are $0$). Then we construct the Householder reflection matrix $H$ with $v$ and multiply it to $x$:
$Hx=\alpha e_1$  (21) 
which indicates that we can make any nonzero vector $x$ become $\alpha e_1$, where $\alpha$ is some constant, by using a Householder reflection. By left-multiplying a reflection we can turn a dense vector into a vector of the same length with only a single nonzero entry. Repeating this $n$ times gives us the Householder QR factorization, which also orthogonalizes the original input matrix. Householder reflection orthogonalizes a matrix $A$ by triangularizing it:
$H_n\cdots H_2H_1A=R$  (22) 
where $R$ is an upper-triangular matrix in the QR factorization. The orthogonal factor is constructed by $Q=H_1H_2\cdots H_n$, where $H_i$ is the Householder reflection performed in the $i$-th step. The algorithm flowchart is given as follows:
The algorithm follows the Matlab notation, where $A(i{:}j,k{:}l)$ denotes the submatrix of $A$ from the $k$-th to the $l$-th column and from the $i$-th to the $j$-th row. Note that there are a number of variants of Householder orthogonalization, such as the implicit variant where we do not store each reflection explicitly. Here $Q$ is the final orthogonal matrix we need.
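The Householder QR procedure above can be sketched as follows. This is an illustrative NumPy version that stores each reflection explicitly (the implicit variant is omitted); the sign choice for $\alpha$ avoids cancellation.

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections: H_n ... H_1 A = R, Q = H_1 ... H_n."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(m, n)):
        x = R[k:, k]
        # alpha = -sign(x_1) * ||x|| avoids catastrophic cancellation in v
        alpha = -np.linalg.norm(x) if x[0] >= 0 else np.linalg.norm(x)
        v = x.copy()
        v[0] -= alpha                      # v = x - alpha * e_1
        vnorm = np.linalg.norm(v)
        if vnorm < 1e-12:                  # column already triangularized
            continue
        v /= vnorm
        Hk = np.eye(m)
        Hk[k:, k:] -= 2.0 * np.outer(v, v)  # reflection acting on rows k..m
        R = Hk @ R                          # zero out below-diagonal entries
        Q = Q @ Hk                          # accumulate Q = H_1 H_2 ... H_n
    return Q, R

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Q, R = householder_qr(A)
```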
A.3 Löwdin’s Symmetric Orthogonalization
Let $\{v_1,\dots,v_n\}$ be a set of linearly independent vectors in an $n$-dimensional space, collected as the columns of a matrix $M$. We define a general nonsingular linear transformation $A$ that transforms the basis $M$ to a new basis $\bar{M}$:
$\bar{M}=MA$  (23) 
where the basis $\bar{M}$ will be orthonormal if (the transpose becomes the conjugate transpose in complex space)
$A^{\top}SA=I$  (24) 
where $S=M^{\top}M$ is the Gram matrix of the given basis $M$.
A general solution to this orthogonalization problem can be obtained via the substitution:
$A=S^{-\frac{1}{2}}B$  (25) 
in which $B$ is an arbitrary orthogonal (or unitary) matrix. When $B=I$, we have the symmetric orthogonalization, namely
$A=S^{-\frac{1}{2}}$  (26) 
When $B=U$, in which $U$ diagonalizes $S$, we have the canonical orthogonalization, namely
$A=S^{-\frac{1}{2}}U$  (27) 
Because $U$ diagonalizes $S$, we have that $S=U\Lambda U^{\top}$ and hence $S^{-\frac{1}{2}}=U\Lambda^{-\frac{1}{2}}U^{\top}$. Therefore, the transformation becomes $A=S^{-\frac{1}{2}}U=U\Lambda^{-\frac{1}{2}}$. This is essentially an eigenvalue decomposition of the symmetric matrix $S$. In order to compute Löwdin’s symmetric orthogonalized basis set, we can use the singular value decomposition (SVD). Specifically, the SVD of the original basis set is given by
$M=U\Sigma V^{\top}$  (28) 
where both $U$ and $V$ are orthogonal matrices and $\Sigma$ is the diagonal matrix of singular values. Therefore, we have that
$MS^{-\frac{1}{2}}=U\Sigma V^{\top}\big(V\Sigma^{-1}V^{\top}\big)=UV^{\top}$  (29)  
where we use $S=M^{\top}M=V\Sigma^{2}V^{\top}$ (and hence $S^{-\frac{1}{2}}=V\Sigma^{-1}V^{\top}$) due to the connection between the eigenvalue decomposition and the SVD. Therefore, we end up with
$\bar{M}=MS^{-\frac{1}{2}}=UV^{\top}$  (30) 
which is the output orthogonal matrix of Löwdin’s symmetric orthogonalization.
An interesting feature of the symmetric orthogonalization is that it ensures
$\sum_i\|\bar{v}_i-v_i\|^2=\min_{\{w_i\}\in\mathcal{O}}\sum_i\|w_i-v_i\|^2$  (31) 
where $v_i$ and $\bar{v}_i$ are the $i$-th column vectors of $M$ and $\bar{M}$, respectively, and $\mathcal{O}$ denotes the set of all possible orthonormal sets in the range of $M$. This means that the symmetrically orthogonalized vectors $\bar{v}_i$ are the least distant in the Hilbert space from the original vectors $v_i$. Therefore, symmetric orthogonalization performs the gentlest possible pushing of the directions of the vectors in order to make them orthogonal.
Appendix B Proof of Theorem 1
To be more specific, neurons whose elements are each initialized from a zero-mean Gaussian distribution are, after normalization, uniformly distributed on a hypersphere. We formalize this argument in the following theorem.
Theorem 2.
The normalized vector of Gaussian variables is uniformly distributed on the sphere. Formally, let $x_1,\dots,x_n\sim\mathcal{N}(0,1)$ be independent. Then the vector
$g=\frac{1}{Z}(x_1,\dots,x_n)^{\top}$  (32) 
follows the uniform distribution on the unit hypersphere $\mathbb{S}^{n-1}$, where $Z=\sqrt{x_1^2+\cdots+x_n^2}$ is a normalization factor.
Proof.
A random variable $X$ has distribution $\mathcal{N}(0,1)$ if it has the density function
$f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$  (33) 
An $n$-dimensional random vector $x$ has distribution $\mathcal{N}(0,I_n)$ if its components are independent and each has distribution $\mathcal{N}(0,1)$. Then the density of $x$ is given by
$f(x)=\frac{1}{(2\pi)^{n/2}}e^{-\frac{\|x\|^2}{2}}$  (34) 
Then we introduce the following lemma (Lemma 2) about the orthogonal invariance of the normal distribution.
Lemma 2.
Let $x$ be an $n$-dimensional random vector with distribution $\mathcal{N}(0,I_n)$ and $U$ be an orthogonal matrix ($UU^{\top}=I$). Then $Ux$ also has distribution $\mathcal{N}(0,I_n)$.
Proof.
For any measurable set $A$, we have that
$\mathbb{P}(Ux\in A)=\int_{U^{-1}A}f(x)\,dx=\int_{A}f(U^{-1}y)\,|\det(U^{-1})|\,dy=\int_{A}f(y)\,dy=\mathbb{P}(x\in A)$  (35)  
since $f$ depends on $x$ only through $\|x\|$, $\|U^{-1}y\|=\|y\|$ for orthogonal $U$, and $|\det(U^{-1})|=1$.
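Theorem 2 and Lemma 2 can also be checked numerically. The sketch below (not part of the proof; sample sizes and tolerances are arbitrary choices) verifies the orthogonal invariance of normalized Gaussian samples and their second moments, which equal $1/n$ per coordinate for the uniform distribution on $\mathbb{S}^{n-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=(100_000, n))
g = x / np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussian samples

# Rotate the samples by a random orthogonal matrix (Lemma 2 says the
# distribution of the rotated samples is unchanged).
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
g_rot = g @ Q.T

# Uniformity on the sphere implies E[g_i] = 0 and E[g_i^2] = 1/n for each
# coordinate, before and after rotation.
assert abs(g[:, 0].mean()) < 0.01
assert abs((g[:, 0] ** 2).mean() - 1.0 / n) < 0.01
assert abs((g_rot[:, 0] ** 2).mean() - 1.0 / n) < 0.01
```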