Convergence of gradient based pre-training in Denoising autoencoders

02/12/2015
by   Vamsi K. Ithapu, et al.
0

The success of deep architectures is at least in part attributed to the layer-by-layer unsupervised pre-training that initializes the network. Various papers have reported extensive empirical analysis focusing on the design and implementation of good pre-training procedures. However, an understanding pertaining to the consistency of parameter estimates, the convergence of learning procedures and the sample size estimates is still unavailable in the literature. In this work, we study pre-training in classical and distributed denoising autoencoders with these goals in mind. We show that the gradient converges at the rate of 1/√(N) and has a sub-linear dependence on the size of the autoencoder network. In a distributed setting where disjoint sections of the whole network are pre-trained synchronously, we show that the convergence improves by at least τ^3/4, where τ corresponds to the size of the sections. We provide a broad set of experiments to empirically evaluate the suggested behavior.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/31/2022

Pre-training via Denoising for Molecular Property Prediction

Many important problems involving molecular property prediction from 3D ...
research
08/22/2019

Denoising based Sequence-to-Sequence Pre-training for Text Generation

This paper presents a new sequence-to-sequence (seq2seq) pre-training me...
research
12/14/2018

On Stacked Denoising Autoencoder based Pre-training of ANN for Isolated Handwritten Bengali Numerals Dataset Recognition

This work attempts to find the most optimal parameter setting of a deep ...
research
03/22/2022

A Broad Study of Pre-training for Domain Generalization and Adaptation

Deep models must learn robust and transferable representations in order ...
research
05/05/2011

Rapid Feature Learning with Stacked Linear Denoisers

We investigate unsupervised pre-training of deep architectures as featur...
research
04/06/2023

Diffusion Models as Masked Autoencoders

There has been a longstanding belief that generation can facilitate a tr...
research
11/17/2015

On the interplay of network structure and gradient convergence in deep learning

The regularization and output consistency behavior of dropout and layer-...

Please sign up or login with your details

Forgot password? Click here to reset