The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

06/20/2023
by Yuan Cao, et al.

We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an exp(-Ω(log^2 t)) convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
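
As a rough illustration of the first claim, the sketch below trains a batch-normalized linear classifier with plain gradient descent and prints the normalized margins of the training points. The abstract does not spell out the model, so everything here is an assumption made for illustration: the simplified batch normalization f(x) = gamma * <w, x> / sqrt(mean_i <w, x_i>^2) (no mean subtraction), the logistic loss, the small over-parameterized Gaussian data set (4 points in 20 dimensions), and the hypothetical helper names train_bn_linear and normalized_margins. It is not the authors' code.

```python
import numpy as np

def train_bn_linear(X, y, steps=20000, lr=0.1, seed=0):
    """Gradient descent on the logistic loss of an (assumed) batch-normalized
    linear model f_i = gamma * <w, x_i> / sqrt(mean_j <w, x_j>^2)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d) / np.sqrt(d)   # small random initialization
    gamma = 1.0                           # batch-norm scale parameter
    for _ in range(steps):
        z = X @ w                         # raw linear outputs
        s = np.sqrt(np.mean(z ** 2))      # batch statistic (no mean subtraction)
        u = z / s                         # normalized outputs
        f = gamma * u                     # model predictions
        # Derivative of mean(log(1 + exp(-y * f))) with respect to f.
        dL_df = -y / (1.0 + np.exp(y * f)) / n
        grad_gamma = dL_df @ u
        # Backpropagate through the normalization u = z / s.
        g_u = gamma * dL_df
        grad_z = (g_u - (g_u @ u) * u / n) / s
        grad_w = X.T @ grad_z
        gamma -= lr * grad_gamma
        w -= lr * grad_w
    return w, gamma

def normalized_margins(X, y, w, gamma):
    """Margins y_i * f_i / |gamma|; equal entries mean a uniform margin."""
    z = X @ w
    return np.sign(gamma) * y * z / np.sqrt(np.mean(z ** 2))

# Toy data (an assumption): fewer points than dimensions, so a
# uniform-margin solution exists.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(4, 20))
y = np.array([1.0, -1.0, 1.0, -1.0])

w, gamma = train_bn_linear(X, y)
margins = normalized_margins(X, y, w, gamma)
print("normalized margins:", margins)
print("spread (max - min):", margins.max() - margins.min())
```

If the behavior described in the abstract carries over to this toy setting, the printed margins should be nearly identical across the four training points, in contrast to a maximum margin solution, where only the support vectors attain the smallest margin.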
