
A Geometric Approach of Gradient Descent Algorithms in Neural Networks
In this article we present a geometric framework to analyze convergence ...
11/08/2018 ∙ by Yacine Chitour, et al.

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
Natural gradient descent has proven effective at mitigating the effects ...
05/27/2019 ∙ by Guodong Zhang, et al.

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Conventional wisdom in deep learning states that increasing depth improv...
02/19/2018 ∙ by Sanjeev Arora, et al.

Distribution-Specific Hardness of Learning Neural Networks
Although neural networks are routinely and successfully trained in pract...
09/05/2016 ∙ by Ohad Shamir, et al.

Provable Methods for Training Neural Networks with Sparse Connectivity
We provide novel guaranteed approaches for training feedforward neural n...
12/08/2014 ∙ by Hanie Sedghi, et al.

Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in t...
05/08/2019 ∙ by Dominic Richards, et al.

Step Size Matters in Deep Learning
Training a neural network with the gradient descent algorithm gives rise...
05/22/2018 ∙ by Kamil Nar, et al.

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
In this note, we study the dynamics of gradient descent on objective functions of the form f(∏_{i=1}^k w_i) (with respect to scalar parameters w_1, ..., w_k), which arise in the context of training depth-k linear neural networks. We prove that for standard random initializations, and under mild assumptions on f, the number of iterations required for convergence scales exponentially with the depth k. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where k is large.
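The phenomenon in the abstract above can be reproduced numerically. Below is a minimal sketch (not the paper's code): gradient descent on f(∏_i w_i) with the illustrative choice f(x) = (x − 1)², a fixed balanced initialization w_i = 0.5 (the paper's analysis covers random initializations), and arbitrary step size and tolerance. The iteration count needed to drive the product near 1 grows rapidly with the depth k.

```python
# Gradient descent on f(prod_i w_i) with f(x) = (x - 1)^2, for scalar
# parameters w_1, ..., w_k. All hyperparameters here are illustrative
# choices, not taken from the paper.

def iterations_to_converge(k, lr=0.01, tol=1e-3, max_iters=1_000_000):
    w = [0.5] * k  # balanced initialization, so all w_i remain equal
    for t in range(max_iters):
        p = 1.0
        for wi in w:
            p *= wi
        if abs(p - 1.0) < tol:
            return t
        # d/dw_i (p - 1)^2 = 2 (p - 1) * prod_{j != i} w_j
        #                  = 2 (p - 1) * p / w_i   (since w_i != 0)
        g = 2.0 * (p - 1.0)
        w = [wi - lr * g * (p / wi) for wi in w]
    return max_iters

for k in (2, 4, 8):
    print(k, iterations_to_converge(k))
```

With a small balanced initialization the product ∏ w_i starts exponentially close to zero in k, so the gradients 2(p − 1) p / w_i are themselves exponentially small, which is the mechanism behind the slow early phase of training.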