Data-driven effective model shows a liquid-like deep learning

07/16/2020
by Wenxuan Zou, et al.

The geometric structure of the optimization landscape is argued to be fundamentally important to the success of deep learning. However, recent research efforts have focused either on toy random models with unrealistic assumptions or on numerical evidence about different shapes of the optimization landscape, and thus lack a unified view of the nature of the landscape. Here, we propose a statistical mechanics framework that directly builds a least structured model of the high-dimensional weight space, taking into account realistic structured data, stochastic gradient descent training, and the computational depth of the network. We also consider whether the number of network parameters exceeds the number of supplied training examples, i.e., over- versus under-parametrization. Our least structured model predicts that the weight spaces of the under-parametrized and over-parametrized cases belong to the same class: these weight spaces are well connected, without any heterogeneous geometric properties. In contrast, the shallow network has a shattered weight space, characterized by discontinuous phase transitions in physics, thereby clarifying the role of depth in deep learning. Our effective model also predicts that inside a deep network there exists a liquid-like central part of the architecture, in the sense that the weights in this part behave as randomly as possible. Our work may thus explain why deep learning is unreasonably effective in terms of the high-dimensional weight space, and how deep networks differ from shallow ones.
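The abstract does not spell out the construction, but the kind of analysis it describes can be illustrated with a minimal sketch: train a small network with stochastic gradient descent in an over-parametrized setting, sample the weights along the training trajectory, and fit a least-structured (maximum-entropy) effective model constrained only by low-order weight statistics. The network sizes, the Gaussian form of the effective model, and the entropy proxy below are illustrative assumptions, not the authors' actual procedure.

```python
# A minimal sketch (not the authors' code): train a small network with SGD,
# collect weight snapshots along the trajectory, and fit a least-structured
# (maximum-entropy, here Gaussian) effective model matching only the first-
# and second-order weight statistics.
import numpy as np

rng = np.random.default_rng(0)

# Toy structured data: correlated input features, binary labels (sizes are illustrative).
n_samples, n_in, n_hidden = 200, 20, 50   # ~1000 hidden weights > 200 examples: over-parametrized
X = rng.standard_normal((n_samples, n_in)) @ rng.standard_normal((n_in, n_in)) * 0.1
y = np.sign(X[:, 0] + 0.5 * X[:, 1])

# One-hidden-layer tanh network trained by plain SGD on the squared loss.
W1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
w2 = rng.standard_normal(n_hidden) / np.sqrt(n_hidden)
lr = 0.05
snapshots = []
for step in range(2000):
    i = rng.integers(n_samples)
    h = np.tanh(X[i] @ W1)
    err = h @ w2 - y[i]
    grad_w2 = err * h
    grad_W1 = err * np.outer(X[i], (1.0 - h**2) * w2)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
    if step % 50 == 0:
        snapshots.append(W1.ravel().copy())

S = np.asarray(snapshots)                 # shape: (n_snapshots, n_weights)

# Least-structured model constrained by the sampled means and covariances:
# for continuous variables this maximum-entropy model is a Gaussian.
mu = S.mean(axis=0)                       # mean parameters (entropy depends only on cov)
cov = np.cov(S, rowvar=False) + 1e-6 * np.eye(S.shape[1])

# Differential entropy of the fitted Gaussian, used here as a crude proxy for how
# "liquid-like" (close to maximally random, given the constraints) the weights are.
_, logdet = np.linalg.slogdet(cov)
entropy = 0.5 * (S.shape[1] * (1.0 + np.log(2.0 * np.pi)) + logdet)
print(f"effective-model entropy (proxy): {entropy:.1f} nats")
```

Repeating this under under-parametrization (fewer weights than examples) or with a deeper network, and comparing the resulting effective models layer by layer, is the spirit of the comparison the abstract reports; the entropy proxy above is only one possible probe.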
