Batch Normalization Is Blind to the First and Second Derivatives of the Loss

05/30/2022
by   Zhanpeng Zhou, et al.

In this paper, we prove how the BN operation affects the back-propagation of the first and second derivatives of the loss. Taking the Taylor series expansion of the loss function, we prove that the BN operation blocks the influence of the first-order term and most of the influence of the second-order term. We further show that this effect is caused by the standardization phase of the BN operation. Experimental results verify our theoretical conclusions, and we find that the BN operation significantly affects feature representations in tasks where the losses of different samples share similar analytic formulas.
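A minimal NumPy sketch (an illustration of the standardization Jacobian, not the paper's proof) of why the standardization phase blocks the first-order term: the gradient backpropagated through standardization is exactly orthogonal to the constant direction and to the standardized output itself, so those components of the first-order term never reach the preceding layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=n)

# Standardization phase of BN over a batch of n scalar activations.
mu, sigma = x.mean(), x.std()
y = (x - mu) / sigma

# Arbitrary upstream gradient dL/dy.
g_y = rng.normal(size=n)

# Analytic backward pass through standardization:
#   dL/dx = (g_y - mean(g_y) - y * mean(g_y * y)) / sigma
g_x = (g_y - g_y.mean() - y * (g_y * y).mean()) / sigma

# The backpropagated gradient loses its components along the
# all-ones direction and along y: standardization filters them out.
print(np.isclose(g_x.sum(), 0.0))  # component along constant direction
print(np.isclose(g_x @ y, 0.0))    # component along y
```

Both printed checks hold identically (up to floating-point error) for any batch and any upstream gradient, since `y` sums to zero and `y @ y = n` by construction of the standardization.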

