
On the interplay of network structure and gradient convergence in deep learning

by Vamsi K. Ithapu et al.

The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures relates to the structural properties of the network and to other design choices (like denoising and dropout rate) remains limited. A natural question is whether the network architecture and input data statistics can guide the choice of learning parameters, and vice versa. In this work, we explore the association between such structural, distributional, and learnability aspects and their interaction with parameter convergence rates. We present a framework to address these questions based on the convergence of backpropagation for general nonconvex objectives using first-order information. This analysis suggests an interesting relationship between feature denoising and dropout. Building upon these results, we obtain a setup that provides systematic guidance on the choice of learning parameters and network sizes that achieve a given level of convergence (in the optimization sense), often mediated by statistical attributes of the inputs. Our results are supported by a set of experimental evaluations as well as independent empirical observations reported by other groups.
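To make the "first-order convergence" notion concrete, the sketch below trains a one-hidden-layer autoencoder with inverted dropout by plain SGD and reports the average squared gradient norm, the standard first-order stationarity measure for nonconvex objectives. Everything here (network size, dropout rate, learning rate, synthetic Gaussian inputs) is an illustrative assumption, not the paper's actual setup; it only shows how one might empirically compare convergence levels across dropout rates or widths.

```python
import numpy as np

def avg_sq_grad_norm(hidden=32, dropout_rate=0.5, steps=200, lr=0.05, seed=0):
    """Run SGD on a one-hidden-layer autoencoder with dropout and return
    the mean squared gradient norm over the run -- a first-order proxy
    for how close the iterates are to a stationary point."""
    rng = np.random.default_rng(seed)
    d = 16
    X = rng.normal(size=(512, d))                  # synthetic inputs
    W1 = rng.normal(scale=0.1, size=(d, hidden))   # encoder weights
    W2 = rng.normal(scale=0.1, size=(hidden, d))   # decoder weights
    grad_norms = []
    for _ in range(steps):
        x = X[rng.integers(len(X))]                # single-sample SGD
        pre = x @ W1
        h = np.tanh(pre)
        mask = rng.random(hidden) > dropout_rate   # drop hidden units
        h_drop = h * mask / (1.0 - dropout_rate)   # inverted-dropout scaling
        err = h_drop @ W2 - x                      # reconstruction error
        # Backpropagate the squared-error loss through the dropout mask.
        gW2 = np.outer(h_drop, err)
        gh = (W2 @ err) * mask / (1.0 - dropout_rate) * (1.0 - np.tanh(pre) ** 2)
        gW1 = np.outer(x, gh)
        W1 -= lr * gW1
        W2 -= lr * gW2
        grad_norms.append(np.sum(gW1 ** 2) + np.sum(gW2 ** 2))
    return float(np.mean(grad_norms))

# Compare the convergence measure across dropout rates at a fixed width:
for p in (0.1, 0.5):
    print(p, avg_sq_grad_norm(dropout_rate=p))
```

Sweeping `dropout_rate` and `hidden` in a loop like the one above is one simple way to reproduce, on toy data, the kind of dropout-rate/network-size trade-offs the paper analyzes theoretically.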



