Global Convergence of Gradient Descent for Deep Linear Residual Networks

11/02/2019 ∙ by Lei Wu, et al.

We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by avoiding stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an ε-optimal point in O(L^3 log(1/ε)) iterations, which scales polynomially with the network depth L. Our result and the exp(Ω(L)) convergence time for the standard initialization (Xavier or near-identity) [Shamir, 2018] together demonstrate the importance of the residual structure and the initialization in the optimization of deep linear neural networks, especially when L is large.
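The setting described above can be illustrated with a small numerical sketch. The abstract does not spell out the model or the initialization, so the following rests on assumptions: the network is taken to be f(x) = U (I + W_L) ... (I + W_1) x, the objective is half the squared Frobenius distance between the end-to-end matrix and the target, and ZAS is taken to mean that the output layer U and every residual block W_l start at zero. The function names, depth, step size, and target matrix are illustrative choices, not values from the paper.

```python
import numpy as np

def chain(mats, d):
    """Product of (I + W) factors; mats[0] is applied first (rightmost)."""
    out = np.eye(d)
    for W in mats:
        out = (np.eye(d) + W) @ out
    return out

def train_zas(target, depth=4, lr=0.02, steps=1000):
    """Plain gradient descent on 0.5 * ||U (I+W_L)...(I+W_1) - target||_F^2.

    Assumed ZAS initialization: U = 0 and W_l = 0 for every block,
    so the residual part starts as the identity map.
    """
    d = target.shape[1]
    U = np.zeros_like(target, dtype=float)
    Ws = [np.zeros((d, d)) for _ in range(depth)]
    losses = []
    for _ in range(steps):
        P = chain(Ws, d)                   # (I+W_L)...(I+W_1)
        E = U @ P - target                 # end-to-end residual
        losses.append(0.5 * np.sum(E ** 2))
        grad_U = E @ P.T
        grads_W = []
        for l in range(depth):
            below = chain(Ws[:l], d)       # factors applied before block l
            above = chain(Ws[l + 1:], d)   # factors applied after block l
            grads_W.append((U @ above).T @ E @ below.T)
        U = U - lr * grad_U
        Ws = [W - lr * g for W, g in zip(Ws, grads_W)]
    return U, Ws, losses

# Illustrative 2x2 target (hypothetical, chosen only for the demo).
target = np.array([[1.0, 0.5], [0.0, 1.0]])
U, Ws, losses = train_zas(target)
print(losses[0], losses[-1])
```

Under these assumptions the first step moves only U, because the gradient with respect to each W_l carries a factor of U, which is zero at initialization; the residual blocks start to move from the second step onward.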


