Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

06/26/2019
by Amir R. Asadi, et al.

We derive generalization and excess risk bounds for neural nets using a family of complexity measures based on a multilevel relative entropy. The bounds are obtained by introducing the notion of generated hierarchical coverings of neural nets and by using the technique of chaining mutual information introduced in Asadi et al. (NeurIPS 2018). The resulting bounds are algorithm-dependent and exploit the multilevel structure of neural nets; this, in turn, leads to an empirical risk minimization problem with a multilevel entropic regularization. We solve this minimization problem by introducing a multi-scale generalization of the celebrated Gibbs posterior distribution and proving that the derived distribution achieves the unique minimum. The result is a new training procedure for neural nets with performance guarantees, one that exploits the chain rule of relative entropy rather than the chain rule of derivatives (as in backpropagation). To implement this training procedure efficiently, we further develop a multilevel Metropolis algorithm that simulates the multi-scale Gibbs distribution, and we report an experiment with a two-layer neural net on the MNIST data set.
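To make the starting point concrete, the sketch below illustrates the classical single-scale case that the paper generalizes: a plain random-walk Metropolis sampler targeting a Gibbs posterior proportional to exp(-beta * empirical risk) over the weights of a tiny two-layer net on synthetic data. This is not the paper's multilevel Metropolis algorithm or its multi-scale Gibbs distribution; the data, network sizes, and parameters (beta, step_size, n_steps) are illustrative assumptions.

```python
import numpy as np

# Minimal single-scale sketch (not the paper's multilevel algorithm):
# random-walk Metropolis targeting p(w) ∝ exp(-beta * empirical_risk(w))
# over the weights of a small two-layer net. All sizes and hyperparameters
# below are illustrative choices, not values from the paper.

rng = np.random.default_rng(0)

# Synthetic binary classification data.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

D_IN, D_HID = 5, 8  # input and hidden widths of the toy two-layer net

def unpack(w):
    """Split a flat parameter vector into the two weight matrices."""
    W1 = w[:D_IN * D_HID].reshape(D_IN, D_HID)
    W2 = w[D_IN * D_HID:].reshape(D_HID, 1)
    return W1, W2

def empirical_risk(w):
    """Mean logistic loss of the two-layer tanh net with weights w."""
    W1, W2 = unpack(w)
    h = np.tanh(X @ W1)
    logits = (h @ W2).ravel()
    signs = 2 * y - 1  # labels in {-1, +1}
    return np.mean(np.log1p(np.exp(-signs * logits)))

def metropolis_gibbs(beta=50.0, step_size=0.05, n_steps=5000):
    """Metropolis chain whose stationary law is the Gibbs posterior."""
    w = rng.normal(scale=0.1, size=D_IN * D_HID + D_HID)
    risk = empirical_risk(w)
    for _ in range(n_steps):
        proposal = w + step_size * rng.normal(size=w.shape)
        risk_prop = empirical_risk(proposal)
        # Accept with probability min(1, exp(-beta * (risk_prop - risk))).
        if np.log(rng.uniform()) < -beta * (risk_prop - risk):
            w, risk = proposal, risk_prop
    return w, risk

w_sample, final_risk = metropolis_gibbs()
print(f"empirical risk of sampled weights: {final_risk:.3f}")
```

The paper's contribution, by contrast, replaces the single inverse temperature and single relative-entropy penalty with a hierarchy of scales, and samples the resulting multi-scale Gibbs distribution with a multilevel Metropolis scheme.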


