Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

06/11/2020
by Yonatan Dukler, et al.

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers such as Batch Normalization, Layer Normalization, and Weight Normalization are ubiquitous in practice, as they improve generalization performance and significantly speed up training. Nonetheless, the vast majority of current deep learning theory and non-convex optimization literature focuses on the un-normalized setting, where the functions under consideration do not exhibit the properties of commonly normalized neural networks. In this paper, we bridge this gap by giving the first global convergence result for two-layer neural networks with ReLU activations trained with a normalization layer, namely Weight Normalization. Our analysis shows how the introduction of normalization layers changes the optimization landscape and can enable faster convergence compared with un-normalized neural networks.
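
To make the setting concrete, the following is a minimal sketch (my own illustration, not the authors' code) of a two-layer ReLU network in which each hidden weight vector is reparameterized with Weight Normalization, w_k = g_k * v_k / ||v_k||, so the magnitude g_k and the direction v_k are separate trainable parameters. The dimensions, initialization, and fixed second layer below are illustrative assumptions only.

```python
import numpy as np

# Sketch of a weight-normalized two-layer ReLU network
#     f(x) = sum_k c_k * relu(<w_k, x>),  with  w_k = g_k * v_k / ||v_k||.
# All sizes and initial values here are arbitrary illustrative choices.

rng = np.random.default_rng(0)
d, m = 5, 64                          # input dimension, number of hidden units

v = rng.normal(size=(m, d))           # direction parameters v_k
g = np.ones(m)                        # magnitude parameters g_k
c = rng.choice([-1.0, 1.0], size=m)   # fixed second-layer weights (illustrative)

def forward(x):
    """Evaluate the weight-normalized two-layer ReLU network at input x."""
    w = g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)  # w_k = g_k v_k / ||v_k||
    return c @ np.maximum(w @ x, 0.0)

x = rng.normal(size=d)
print(forward(x))
```

In this reparameterization, gradient descent updates g and v rather than w directly; analyzing training dynamics under this parameterization is the regime the paper's convergence result addresses.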
