Information Flow in Deep Neural Networks

02/10/2022
by Ravid Shwartz-Ziv, et al.

Although deep neural networks have been immensely successful, we lack a comprehensive theoretical understanding of how they work or how they are structured. As a result, deep networks are often viewed as black boxes with unclear interpretations and reliability. Understanding the performance of deep neural networks is therefore one of the greatest scientific challenges. This work applies principles and techniques from information theory to deep learning models in order to improve our theoretical understanding and to design better algorithms. We first describe our information-theoretic approach to deep learning. We then propose using Information Bottleneck (IB) theory to explain deep learning systems. This novel paradigm for analyzing networks sheds light on their layered structure, generalization abilities, and learning dynamics. We then discuss one of the most challenging problems in applying the IB to deep neural networks: estimating mutual information. Recent theoretical developments, such as the neural tangent kernel (NTK) framework, are used to investigate generalization signals. In our study, we obtain tractable computations of many information-theoretic quantities and their bounds for infinite ensembles of infinitely wide neural networks. These derivations let us determine how compression, generalization, and sample size relate to the network and to one another. Finally, we present the dual Information Bottleneck (dualIB), a new information-theoretic framework that resolves some of the IB's shortcomings by merely switching terms in the distortion function. The dualIB can account for known data features and use them to make better predictions on unseen examples. We uncover its underlying structure and optimal representations through an analytical framework, and we optimize it with a variational framework using deep neural networks.
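For context, the IB objective referenced above can be written (in standard notation from the IB literature, not quoted from this abstract) as a trade-off for a stochastic encoder p(t|x) of input X with target Y:

    \min_{p(t|x)} \; I(X;T) - \beta \, I(T;Y)

This is equivalent to a rate-distortion problem with distortion d(x,t) = D_{KL}[\, p(y|x) \,\|\, p(y|t) \,]; the "switching terms in the distortion function" that defines the dualIB refers, roughly, to exchanging the arguments of this KL divergence.

Likewise, the tractable information-theoretic quantities mentioned for infinite ensembles of infinitely wide networks rest on the fact that, in that regime, the ensemble's outputs over a dataset are jointly Gaussian, so entropies and mutual informations reduce to Gaussian formulas. A minimal sketch of that kind of computation, assuming a hypothetical precomputed kernel matrix K and isotropic output noise (both illustrative, not taken from the paper):

    import numpy as np

    def gaussian_entropy(cov):
        # Differential entropy of N(0, cov): 0.5 * log det(2*pi*e*cov),
        # computed with a numerically stable slogdet.
        sign, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * cov)
        assert sign > 0, "covariance must be positive definite"
        return 0.5 * logdet

    # Illustrative 3x3 kernel over three inputs, plus observation noise.
    K = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
    noise = 0.1 * np.eye(3)

    # For noisy outputs T = f + eps with f ~ N(0, K) and eps ~ N(0, noise),
    # the Gaussian mutual information is I(f; T) = H(T) - H(T | f).
    i_f_t = gaussian_entropy(K + noise) - gaussian_entropy(noise)
    print("I(f; T) =", i_f_t)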


Related research

11/20/2019 | Information in Infinite Ensembles of Infinitely-Wide Neural Networks
In this preliminary work, we study the generalization properties of infi...

05/24/2018 | Entropy and mutual information in models of deep neural networks
We examine a class of deep learning models with a tractable method to co...

06/28/2023 | On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited un...

09/30/2022 | Information Removal at the bottleneck in Deep Neural Networks
Deep learning models are nowadays broadly deployed to solve an incredibl...

02/20/2018 | Do Deep Learning Models Have Too Many Parameters? An Information Theory Viewpoint
Deep learning models often have more parameters than observations, and s...

06/18/2021 | The Principles of Deep Learning Theory
This book develops an effective theory approach to understanding deep ne...

03/27/2020 | Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning
The information bottleneck (IB) principle offers both a mechanism to exp...
