Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks

11/30/2018
by Shuai Li, et al.

We present a formal measure-theoretic theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows.

* Built on the formalism of probability coupling theory, we derive an algorithmic framework, named Hierarchical Measure Group and Approximate System (HMGAS) and nicknamed S-System, that is designed to learn the complex hierarchical statistical dependencies in the physical world.

* We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions, and activation functions are derived formally (see the worked equation and the numerical sketch after this list). We further endow NNs with geometry through information geometry, show that the intermediate feature spaces of NNs are stochastic manifolds, and prove that the "distance" between samples contracts as layers stack up.

* S-System shows that NNs are inherently stochastic. Under a set of realistic boundedness and diversity conditions, it enables us to prove that for large nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss, and the regions around the minima are flat basins where the eigenvalues of the Hessian are concentrated around zero. The proofs use tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations.

* Combined, S-System, the information-geometric structure, and the optimization behavior complete the analogy between the Renormalization Group (RG) and NNs. They show that an NN is a complex adaptive system that estimates the statistical dependencies of microscopic objects, e.g., pixels, at multiple scales. Unlike the clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds that emerge through learning/optimization and that divide the sample space into highly semantically meaningful groups dictated by the supervised labels (in supervised NNs).
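
As a minimal illustration of how an activation function can fall out of an exponential family kernel, consider a Bernoulli kernel whose natural parameter is an affine function of the input; the specific kernel choice and the parameterization θ(x) = wᵀx + b below are assumptions of this sketch, not the paper's general S-System construction:

    \begin{aligned}
    p(y \mid x) &= \exp\bigl\{\, y\,\theta(x) - \log\bigl(1 + e^{\theta(x)}\bigr) \bigr\},
        \qquad \theta(x) = w^{\top} x + b, \\
    \mathbb{E}[y \mid x] &= \frac{\partial}{\partial\theta}\,\log\bigl(1 + e^{\theta}\bigr)\Big|_{\theta=\theta(x)}
        = \frac{1}{1 + e^{-\theta(x)}} = \sigma\bigl(w^{\top} x + b\bigr).
    \end{aligned}

The logistic sigmoid thus appears as the mean map of the kernel; analogous kernel choices yield other familiar nonlinearities (e.g., softmax from a categorical kernel).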

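The "distance contracts as layers stack up" claim can also be checked numerically on a toy model. The sketch below is an assumption-laden stand-in, not the paper's information-geometric argument: it uses deterministic tanh layers, i.i.d. Gaussian weights with a deliberately small scale (gain), and Euclidean rather than Fisher distance.

    import numpy as np

    rng = np.random.default_rng(0)

    n_points, dim, n_layers = 64, 256, 8
    gain = 0.5  # weight scale chosen in the contractive regime; an assumption of this toy

    def mean_pairwise_distance(x):
        """Average Euclidean distance over all pairs of rows in x."""
        diffs = x[:, None, :] - x[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(-1))
        iu = np.triu_indices(len(x), k=1)
        return dists[iu].mean()

    x = rng.standard_normal((n_points, dim))
    print(f"layer  0: mean pairwise distance = {mean_pairwise_distance(x):.3f}")

    for layer in range(1, n_layers + 1):
        W = rng.standard_normal((dim, dim)) * gain / np.sqrt(dim)  # i.i.d. Gaussian weights
        x = np.tanh(x @ W)                                         # one toy layer
        print(f"layer {layer:2d}: mean pairwise distance = {mean_pairwise_distance(x):.3f}")

With this weight scale each layer acts, with high probability, as a contraction, so the printed distances shrink roughly geometrically; larger weight scales behave differently, and the paper's actual statement concerns the stochastic-manifold geometry of intermediate features rather than this Euclidean toy.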