On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

02/28/2017
by Vamsi K. Ithapu, et al.

We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures is related to the network structure, and how it may be influenced by other design choices, including activation type, denoising, and dropout rate. We seek to analyze whether network architecture and input data statistics may guide the choice of learning parameters, and vice versa. Given the broad applicability of deep architectures, this issue is interesting from both a theoretical and a practical standpoint. Using properties of general nonconvex objectives (with first-order information), we first build the association between structural, distributional, and learnability aspects of the network vis-à-vis their interaction with parameter convergence rates. We identify a nice relationship between feature denoising and dropout, and construct families of networks that achieve the same level of convergence. We then derive a workflow that provides systematic guidance regarding the choice of network sizes and learning parameters, often mediated by input statistics. Our technical results are corroborated by an extensive set of evaluations, presented in this paper, as well as by independent empirical observations reported by other groups. We also perform experiments showing the practical implications of our framework for choosing the best fully-connected design for a given problem.
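To give a concrete flavor of the kind of design-choice comparison the abstract describes, the following minimal sketch (not the paper's framework; all sizes, dropout rates, the synthetic data, and the helper name train are illustrative assumptions) trains a one-hidden-layer fully-connected network with inverted dropout by SGD and reports the average squared gradient norm, a common first-order proxy for proximity to a stationary point, for a few widths and dropout rates.

import numpy as np

# Minimal illustrative sketch: track how the gradient norm settles for
# different hidden-layer widths and dropout rates on synthetic data.
rng = np.random.default_rng(0)

def train(hidden_units, dropout_rate, n_steps=2000, lr=0.05):
    """Train y = w2 . relu(W1 x) with inverted dropout on the hidden layer;
    return the mean squared gradient norm over the last 100 SGD steps."""
    d_in = 20
    X = rng.standard_normal((1000, d_in))
    y = np.tanh(X @ rng.standard_normal(d_in))          # synthetic targets

    W1 = rng.standard_normal((d_in, hidden_units)) * 0.1
    w2 = rng.standard_normal(hidden_units) * 0.1
    grad_norms = []

    for step in range(n_steps):
        i = rng.integers(len(X))
        x, t = X[i], y[i]

        h = np.maximum(0.0, x @ W1)                      # ReLU activations
        mask = (rng.random(hidden_units) > dropout_rate) / (1.0 - dropout_rate)
        h_drop = h * mask                                # inverted dropout
        err = h_drop @ w2 - t                            # squared-loss residual

        # Backpropagation through the two layers.
        g_w2 = err * h_drop
        g_h = err * w2 * mask
        g_W1 = np.outer(x, g_h * (h > 0))

        w2 -= lr * g_w2
        W1 -= lr * g_W1
        grad_norms.append(np.sum(g_w2 ** 2) + np.sum(g_W1 ** 2))

    return float(np.mean(grad_norms[-100:]))

for width in (16, 64):
    for p in (0.0, 0.2, 0.5):
        print(f"width={width:3d} dropout={p:.1f} "
              f"final grad norm ~ {train(width, p):.4f}")

Sweeping widths and dropout rates jointly in this manner, and grouping configurations whose final gradient norms match, loosely mirrors the abstract's notion of constructing families of networks that achieve the same level of convergence; the paper's actual analysis is theoretical rather than this empirical sweep.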
