Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

11/27/2019
by   Thanh V. Nguyen, et al.

A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the last several months, the community has shown growing interest in analyzing the optimization and generalization properties of over-parameterized networks, and several breakthrough works have led to important theoretical progress. However, the majority of existing work applies only to supervised learning scenarios and is hence limited to settings such as classification and regression. In contrast, the role of over-parameterization in the unsupervised setting has received far less attention. In this paper, we study the gradient dynamics of two-layer over-parameterized autoencoders with ReLU activation. We make very few assumptions about the given training dataset (other than mild non-degeneracy conditions). Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two learning regimes, namely: (i) the weakly-trained regime where only the encoder is trained, and (ii) the jointly-trained regime where both the encoder and the decoder are trained. Our results indicate the considerable benefits of joint training over weak training for finding global optima, achieving a dramatic decrease in the required level of over-parameterization. We also analyze the case of weight-tied autoencoders (which is a commonly used architectural choice in practical settings) and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.
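
The two regimes named in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the squared reconstruction loss, the 1/sqrt(m) output scaling, the standard Gaussian initialization, the step size, and the hypothetical function name `train_autoencoder`. The `joint` flag switches between updating only the encoder (weakly-trained) and updating both encoder and decoder (jointly-trained).

```python
import numpy as np

def train_autoencoder(X, m=512, steps=200, lr=0.05, joint=True, seed=0):
    """Gradient descent on a two-layer ReLU autoencoder (illustrative sketch).

    Model (assumed form): x -> A @ relu(W.T @ x) / sqrt(m)
    X      : (d, n) data matrix, one sample per column.
    m      : hidden width (the over-parameterization level).
    joint  : False -> only the encoder W is updated (weakly-trained regime);
             True  -> encoder W and decoder A are both updated (jointly-trained).
    Returns the squared reconstruction loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.standard_normal((d, m))  # encoder weights (assumed Gaussian init)
    A = rng.standard_normal((d, m))  # decoder weights (assumed Gaussian init)
    scale = 1.0 / np.sqrt(m)
    losses = []
    for _ in range(steps):
        pre = W.T @ X                # hidden pre-activations, shape (m, n)
        H = np.maximum(pre, 0.0)     # ReLU activations
        R = scale * (A @ H) - X      # reconstruction residual, shape (d, n)
        losses.append(0.5 * float(np.sum(R ** 2)))

        # Backpropagate the squared loss through the decoder and the ReLU.
        grad_H = scale * (A.T @ R)   # (m, n)
        grad_pre = grad_H * (pre > 0)
        grad_W = X @ grad_pre.T      # (d, m)
        grad_A = scale * (R @ H.T)   # (d, m)

        W -= lr * grad_W
        if joint:
            A -= lr * grad_A
    return losses
```

Calling the function twice with the same seed, once with `joint=False` and once with `joint=True`, starts both runs from the same random network, so the two loss curves can be compared directly on, for example, a small set of unit-norm random vectors.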

research
11/27/2019

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

A recent line of research on deep learning focuses on the extremely over...
research
07/05/2021

Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks

Despite the empirical success of deep learning, it still lacks theoretic...
research
10/06/2018

Over-parameterization Improves Generalization in the XOR Detection Problem

Empirical evidence suggests that neural networks with ReLU activations g...
research
05/29/2019

On the Inductive Bias of Neural Tangent Kernels

State-of-the-art neural networks are heavily over-parameterized, making ...
research
06/02/2018

Autoencoders Learn Generative Linear Models

Recent progress in learning theory has led to the emergence of provable ...
research
06/05/2022

Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

The convergence of GD and SGD when training mildly parameterized neural ...
research
09/18/2018

On the Learning Dynamics of Deep Neural Networks

While a lot of progress has been made in recent years, the dynamics of l...
