Analysis of feature learning in weight-tied autoencoders via the mean field lens

02/16/2021
by Phan-Minh Nguyen, et al.

Autoencoders are among the earliest introduced nonlinear models for unsupervised learning. Although they are widely adopted beyond research, it has been a longstanding open problem to understand mathematically the feature extraction mechanism that trained nonlinear autoencoders provide. In this work, we make progress on this problem by analyzing a class of two-layer weight-tied nonlinear autoencoders in the mean field framework. Upon a suitable scaling, in the regime of a large number of neurons, the models trained with stochastic gradient descent are shown to admit a mean field limiting dynamics. This limiting description reveals an asymptotically precise picture of feature learning by these models: their training dynamics exhibit different phases that correspond to the learning of different principal subspaces of the data, with varying degrees of nonlinear shrinkage dependent on the ℓ_2-regularization and stopping time. While we prove these results under an idealized assumption of (correlated) Gaussian data, experiments on real-life data demonstrate an interesting match with the theory. The autoencoder setup of interest poses a nontrivial mathematical challenge to proving these results. In this setup, the "Lipschitz" constants of the models grow with the data dimension d. Consequently, an adaptation of previous analyses requires a number of neurons N that is at least exponential in d. Our main technical contribution is a new argument which proves that the required N is only polynomial in d. We conjecture that N≫ d is sufficient and that N is necessarily larger than a data-dependent intrinsic dimension, a behavior that is fundamentally different from previously studied setups.
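To fix ideas, below is a minimal numerical sketch of the setup described above; it is not the paper's exact parameterization or hyperparameters. It trains a two-layer weight-tied autoencoder x_hat = (1/N) Wᵀσ(Wx) with the mean-field 1/N output scaling by one-pass SGD on correlated Gaussian data with ℓ_2 regularization. The tanh nonlinearity, the dimensions, and all constants are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's exact parameterization):
# two-layer weight-tied autoencoder x_hat = (1/N) W^T sigma(W x), trained by
# one-pass SGD on correlated Gaussian data with l2 regularization.
import numpy as np

rng = np.random.default_rng(0)
d, N = 50, 2000                       # data dimension, number of neurons
lam, lr, n_steps = 1e-3, 0.05, 20000  # l2 strength, step size, SGD steps (illustrative)

# Correlated Gaussian data: covariance with three dominant principal directions.
eigvals = np.concatenate([np.array([5.0, 3.0, 2.0]), np.ones(d - 3)])
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
cov_sqrt = Q @ np.diag(np.sqrt(eigvals)) @ Q.T

W = rng.standard_normal((N, d)) / np.sqrt(d)   # tied weights: encoder W, decoder W^T

for step in range(n_steps):
    x = cov_sqrt @ rng.standard_normal(d)      # fresh sample each step (one-pass SGD)
    pre = W @ x
    h = np.tanh(pre)
    x_hat = (W.T @ h) / N                      # mean-field 1/N output scaling
    err = x_hat - x

    # Gradient of (1/2)||x_hat - x||^2 w.r.t. W with tied weights (W enters both
    # encoder and decoder); the 1/N factor is absorbed into the learning rate so
    # each neuron receives an O(1) update.
    grad = np.outer(h, err) + np.outer((1.0 - h**2) * (W @ err), x)
    W -= lr * (grad + lam * W)                 # SGD step with l2 regularization

# How much of the learned weight mass lies in the data's top principal subspace?
top_dirs = Q[:, :3]                            # leading eigenvectors of the covariance
frac = np.linalg.norm(W @ top_dirs) ** 2 / np.linalg.norm(W) ** 2
print(f"fraction of weight mass in the top-3 principal subspace: {frac:.3f}")
```

The final line measures how much of the weight matrix concentrates on the data's leading principal subspace, the kind of phase-dependent feature learning the abstract describes; in this sketch the regularization strength lam and the stopping time n_steps control the degree of shrinkage.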

Related research

03/12/2023 · Global Optimality of Elman-type RNN in the Mean-Field Regime
We analyze Elman-type Recurrent Neural Networks (RNNs) and their trainin...

02/12/2023 · From high-dimensional mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
This manuscript investigates the one-pass stochastic gradient descent (S...

01/06/2022 · The dynamics of representation learning in shallow, non-linear autoencoders
Autoencoders are the simplest neural network for unsupervised learning, ...

02/07/2019 · Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
Can multilayer neural networks -- typically constructed as highly comple...

06/25/2020 · The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models
A numerical and phenomenological study of the gradient descent (GD) algo...

07/13/2020 · Quantitative Propagation of Chaos for SGD in Wide Neural Networks
In this paper, we investigate the limiting behavior of a continuous-time...

02/05/2020 · A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics
We consider the problem of universal approximation of functions by two-l...
