Emergence of Invariance and Disentangling in Deep Representations

06/05/2017
by Alessandro Achille, et al.

Using established principles from Information Theory and Statistics, we show that invariance to nuisance factors in a deep neural network is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel use of the Information Bottleneck Lagrangian on the weights as a learning criterion. This also admits an alternative interpretation as minimizing a PAC-Bayesian bound on the test error. Finally, we exploit a duality between weights and activations induced by the architecture to show that the information in the weights bounds the minimality and Total Correlation of the layers; hence regularizing the weights, explicitly or implicitly via SGD, not only helps avoid overfitting but also fosters invariance and disentangling of the learned representation. The theory also enables predicting sharp phase transitions between underfitting and overfitting of random labels at precise information values, and sheds light on the relation between the geometry of the loss function, in particular so-called "flat minima," and generalization.
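To make the learning criterion concrete (the notation below is our own sketch based on the abstract, not quoted from the paper): writing q(w|D) for the distribution over weights produced by training on a dataset D, the Information Bottleneck Lagrangian on the weights trades off the usual fitting loss against the information the weights retain about the training data,

\[
  \mathcal{L}(q) \;=\; \mathbb{E}_{w \sim q(w \mid \mathcal{D})}\!\big[\, H_{p,q}(y \mid x, w) \,\big] \;+\; \beta \, I(w; \mathcal{D}),
\]

where the first term is the cross-entropy (fitting) loss, I(w; D) measures how much information about the training set is stored in the weights, and the coefficient β controls the trade-off. In practice I(w; D) is commonly upper-bounded by a KL divergence to a prior, KL(q(w|D) || p(w)), which is what yields the PAC-Bayesian reading mentioned above; small β recovers ordinary empirical-risk training, while larger β limits memorization at the cost of some fitting accuracy.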

Related research

- Where is the Information in a Deep Neural Network? (05/29/2019)
- PAC-Bayes Information Bottleneck (09/29/2021)
- Invariant Representations through Adversarial Forgetting (11/11/2019)
- Towards understanding neural collapse in supervised contrastive learning with the information bottleneck method (05/19/2023)
- Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective (05/15/2022)
- Adaptive Low-Rank Regularization with Damping Sequences to Restrict Lazy Weights in Deep Networks (06/17/2021)
- Exchangeability and Kernel Invariance in Trained MLPs (10/19/2018)
