On the interplay of network structure and gradient convergence in deep learning

11/17/2015
by Vamsi K. Ithapu, et al.

The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks has been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures relates to the structural properties of the network and other design choices (such as the denoising and dropout rates) remains less clear. An interesting question is whether the network architecture and input data statistics can guide the choice of learning parameters, and vice versa. In this work, we explore the association between such structural, distributional and learnability aspects and their interaction with parameter convergence rates. We present a framework to address these questions based on the convergence of backpropagation for general nonconvex objectives using first-order information. This analysis suggests an interesting relationship between feature denoising and dropout. Building upon these results, we obtain a setup that provides systematic guidance on the choice of learning parameters and network sizes that achieve a certain level of convergence (in the optimization sense), often mediated by statistical attributes of the inputs. Our results are supported by a set of experimental evaluations as well as independent empirical observations reported by other groups.
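For context, "convergence (in the optimization sense)" refers to approximate first-order stationarity of the nonconvex training objective. As a purely illustrative piece of standard background (not a result stated in this abstract), stochastic gradient descent on a smooth nonconvex objective f, under a suitable step-size schedule and bounded gradient noise, satisfies a bound of the form

    \min_{k \le N} \mathbb{E}\big[ \|\nabla f(w_k)\|^2 \big] \;\le\; \mathcal{O}\!\left( 1/\sqrt{N} \right),

where w_k are the iterates and N is the number of gradient updates. The framework described above concerns how this kind of convergence behavior interacts with structural properties of the network (depth, layer sizes), design choices such as the dropout and denoising rates, and statistical attributes of the inputs.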

Related research

02/28/2017 - On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation
We study mechanisms to characterize how the asymptotic convergence of ba...

06/10/2015 - Convergence rates for pretraining and dropout: Guiding learning parameters using network structure
Unsupervised pretraining and dropout have been well studied, especially ...

02/14/2016 - Surprising properties of dropout in deep networks
We analyze dropout in deep networks with rectified linear units and the ...

12/19/2019 - Mean field theory for deep dropout networks: digging up gradient backpropagation deeply
In recent years, the mean field theory has been applied to the study of ...

12/01/2020 - Asymptotic convergence rate of Dropout on shallow linear neural networks
We analyze the convergence rate of gradient flows on objective functions...

02/12/2015 - Convergence of gradient based pre-training in Denoising autoencoders
The success of deep architectures is at least in part attributed to the ...

05/02/2022 - Triangular Dropout: Variable Network Width without Retraining
One of the most fundamental design choices in neural networks is layer w...
