On information captured by neural networks: connections with memorization and generalization

06/28/2023
by Hrayr Harutyunyan, et al.

Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study the information captured by neural networks during training. Specifically, we start by viewing learning in the presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label-noise information in the weights. We then define a notion of the unique information that an individual sample provides to the training of a deep network, shedding light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization-gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.
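The idea of a sample's unique information can be made concrete with a simple leave-one-out proxy: retrain the model without the example and measure how much the predictive distribution on held-out data shifts. The sketch below is a minimal illustration of this intuition, not the estimator developed in the paper; the function names (`unique_information`, `predictive_kl`) and the choice of logistic regression on synthetic data are assumptions made purely for the example.

```python
# Minimal leave-one-out sketch of per-example "unique information".
# Illustrative proxy only, not the paper's estimator: retrain without
# example i and measure the KL shift in held-out predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def predictive_kl(p, q, eps=1e-12):
    """Mean KL(p || q) between two arrays of class-probability rows."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))

def unique_information(X_tr, y_tr, i, X_val):
    """Proxy for what example i uniquely contributes: the shift in the
    predictive distribution on validation points when i is removed."""
    full = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    keep = np.arange(len(y_tr)) != i
    loo = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    return predictive_kl(full.predict_proba(X_val), loo.predict_proba(X_val))

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
scores = [unique_information(X_tr, y_tr, i, X_val) for i in range(len(y_tr))]
print(np.argsort(scores)[-5:])  # highest-scoring training examples
```

For deep networks the paper works with information-theoretic definitions rather than this brute-force retraining, but the intuition carries over: examples whose removal noticeably changes the learned predictor are the ones carrying unique information, and these tend to be the atypical, ambiguous, or underrepresented ones.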

Related research

02/10/2022 · Information Flow in Deep Neural Networks
Although deep neural networks have been immensely successful, there is n...

01/28/2022 · With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization
Generalization of deep neural networks remains one of the main open prob...

05/30/2018 · Collaborative Learning for Deep Neural Networks
We introduce collaborative learning in which multiple classifier heads o...

07/27/2021 · Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization
The successes of deep learning critically rely on the ability of neural ...

10/17/2022 · Measures of Information Reflect Memorization Patterns
Neural networks are known to exploit spurious artifacts (or shortcuts) t...

02/05/2021 · Learning While Dissipating Information: Understanding the Generalization Capability of SGLD
Understanding the generalization capability of learning algorithms is at...

03/28/2022 · Knowledge Distillation: Bad Models Can Be Good Role Models
Large neural networks trained in the overparameterized regime are able t...
