Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment

08/03/2020
by Satrajit Chatterjee, et al.

We propose a new metric (m-coherence) to experimentally study the alignment of per-example gradients during training. Intuitively, given a sample of size m, m-coherence is the number of examples in the sample that benefit, on average, from a small step along the gradient of any one example. We show that compared to other commonly used metrics, m-coherence is more interpretable, cheaper to compute (O(m) instead of O(m^2)), and mathematically cleaner. (We note that m-coherence is closely connected to gradient diversity, a quantity previously used in some theoretical bounds.) Using m-coherence, we study the evolution of the alignment of per-example gradients in ResNet and Inception models on ImageNet and several variants with label noise, particularly from the perspective of the recently proposed Coherent Gradients (CG) theory, which provides a simple, unified explanation for memorization and generalization [Chatterjee, ICLR 20]. Although we have several interesting takeaways, our most surprising result concerns memorization. Naively, one might expect that when training with completely random labels, each example is fitted independently, and so m-coherence should be close to 1. However, this is not the case: m-coherence reaches much higher values during training (in the 100s), indicating that over-parameterized neural networks find common patterns even in scenarios where generalization is not possible. A detailed analysis of this phenomenon provides a deeper confirmation of CG, but at the same time puts into sharp relief what is missing from the theory to provide a complete explanation of generalization in neural networks.
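The stated connection to gradient diversity suggests one natural way to estimate such a quantity: with per-example gradients g_1, …, g_m, the ratio ||Σᵢ gᵢ||² / Σᵢ ||gᵢ||² ranges from about 1 (mutually orthogonal gradients, each example fitted independently) up to m (fully aligned gradients), and needs only one pass over the sample, i.e. O(m) rather than the O(m²) of all pairwise dot products. The sketch below is an illustration of this estimator under that assumption, not the paper's exact definition, which may differ in normalization or averaging:

```python
def m_coherence(grads):
    """Estimate m-coherence from a list of m per-example gradient vectors.

    Computed as ||sum_i g_i||^2 / sum_i ||g_i||^2, a quantity inversely
    related to the gradient diversity of Yin et al. It is ~1 when the
    gradients are mutually orthogonal and m when they all align, and it
    requires only a single pass over the sample (O(m)).
    """
    d = len(grads[0])
    # Aggregate gradient, summed coordinate-wise over the sample.
    g_sum = [sum(g[k] for g in grads) for k in range(d)]
    numerator = sum(x * x for x in g_sum)            # ||sum_i g_i||^2
    denominator = sum(x * x for g in grads for x in g)  # sum_i ||g_i||^2
    return numerator / denominator

# Fully aligned gradients: coherence equals the sample size m.
print(m_coherence([[1.0, 2.0]] * 8))                      # 8.0

# Mutually orthogonal gradients: coherence is 1.
print(m_coherence([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]]))  # 1.0
```

On this reading, the paper's surprising result is that even with random labels the per-example gradients are far from orthogonal, pushing the ratio into the hundreds rather than staying near 1.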

