Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization

03/02/2021
by   Tolga Ergen, et al.

Batch Normalization (BN) is a commonly used technique to accelerate and stabilize the training of deep neural networks. Despite its empirical success, a complete theoretical understanding of BN has yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality that yields exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial time. Our analysis also shows that the optimal layer weights admit simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that gradient descent induces an algorithmic bias when training the standard non-convex BN network, and we design an approach that explicitly encodes this implicit regularization into the convex objective. Experiments on CIFAR image classification highlight the effectiveness of this explicit regularization in mimicking and substantially improving the performance of standard BN networks.
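
As a concrete illustration (not code from the paper), the following PyTorch sketch sets up the kind of non-convex baseline the abstract refers to: a two-layer ReLU network with batch normalization trained under weight decay, which plays the role of the ell-2 regularization on the layer weights. The layer sizes, synthetic data, and optimizer settings are illustrative assumptions.

# Minimal sketch, assuming a two-layer ReLU + BN architecture; all sizes
# and hyperparameters below are illustrative, not the paper's settings.
import torch
import torch.nn as nn

class TwoLayerBNReLU(nn.Module):
    def __init__(self, d_in: int, width: int, d_out: int = 1):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width, bias=False)
        # BatchNorm1d normalizes each hidden pre-activation over the batch
        self.bn = nn.BatchNorm1d(width)
        self.fc2 = nn.Linear(width, d_out, bias=False)

    def forward(self, x):
        return self.fc2(torch.relu(self.bn(self.fc1(x))))

model = TwoLayerBNReLU(d_in=10, width=64)
# weight_decay adds the squared-norm penalty on the layer weights
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x, y = torch.randn(128, 10), torch.randn(128, 1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

The paper's contribution is an equivalent convex program for this objective; the sketch above only shows the standard non-convex formulation that gradient descent is applied to.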

Related research

10/11/2021 · Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs
02/02/2022 · Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
06/26/2020 · Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time
10/18/2021 · Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
11/18/2019 · Implicit Regularization of Normalization Methods
10/13/2021 · Parallel Deep Neural Networks Have Zero Duality Gap
10/23/2017 · Progressive Learning for Systematic Design of Large Neural Networks
