Convergence rates for pretraining and dropout: Guiding learning parameters using network structure

06/10/2015
by   Vamsi K. Ithapu, et al.

Unsupervised pretraining and dropout have been well studied, especially with respect to regularization and output consistency. However, our understanding of the explicit convergence rates of the parameter estimates, and of their dependence on the learning aspects (such as denoising and dropout rates) and structural aspects (such as depth and layer lengths) of the network, is less mature. An interesting question in this context is whether the network structure can "guide" the choice of such learning parameters. In this work, we explore these gaps between network structure, the learning mechanisms, and their interaction with parameter convergence rates. We present a way to address these issues based on the backpropagation convergence rates for general nonconvex objectives using first-order information. We then incorporate two learning mechanisms into this general framework -- the denoising autoencoder and dropout -- and subsequently derive the convergence rates of deep networks. Building upon these bounds, we provide insights into the choices of learning parameters and network sizes that achieve given levels of convergence accuracy. The results derived here support existing empirical observations, and we also conduct a set of experiments to evaluate them.
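For readers unfamiliar with the two learning mechanisms the abstract names, here is a minimal sketch of the standard forms they usually take: inverted dropout (units kept with probability 1 - rate, survivors rescaled) and the masking corruption used by denoising autoencoders. This is illustrative only and not taken from the paper; the function names and the choice of masking noise are assumptions.

```python
import numpy as np

def dropout(h, rate, rng):
    # Inverted dropout (illustrative, not the paper's code): each unit is
    # kept independently with probability (1 - rate); surviving activations
    # are rescaled so the expected value of the layer output is unchanged.
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

def denoise_corrupt(x, noise, rng):
    # Masking noise commonly used by denoising autoencoders: each input
    # coordinate is zeroed independently with probability `noise`; the
    # autoencoder is then trained to reconstruct the clean x.
    return x * (rng.random(x.shape) >= noise)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
h_drop = dropout(h, 0.5, rng)        # same shape, roughly half the units zeroed
x_noisy = denoise_corrupt(h, 0.3, rng)
```

The dropout rate and the corruption level `noise` are exactly the kind of learning parameters whose interaction with depth and layer lengths the paper's convergence bounds are meant to guide.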
