Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

12/26/2017
by Yuanzhi Li, et al.

We show that the (stochastic) gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given Õ(dr^2) random linear measurements of a rank-r positive semidefinite matrix X^⋆, we can recover X^⋆ by parameterizing it as UU^⊤ with U ∈ ℝ^{d×d} and minimizing the squared loss, even if r ≪ d. We prove that, starting from a small initialization, gradient descent approximately recovers X^⋆ in Õ(√r) iterations. The results resolve the conjecture of Gunasekar et al.'17 under the restricted isometry property. With some technical modifications, the technique can also be applied to analyzing neural networks with quadratic activations.
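
As a rough illustration of the procedure described above, the following NumPy sketch runs plain gradient descent on U, starting from the small initialization αI, and fits UU^⊤ to random linear measurements of a low-rank X^⋆. Everything concrete in it is an assumption made for this example and is not taken from the paper: the sizes d = 30, r = 2, m = 330, the symmetric Gaussian measurement matrices, the step size and iteration count, the helper names, and the minimum-norm least-squares baseline added only for contrast.

```python
import numpy as np


def make_problem(d=30, r=2, m=330, seed=0):
    """Build a random rank-r PSD target and m symmetric Gaussian measurements.

    All sizes are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal d x r basis
    X_star = Q @ Q.T                                  # rank r, spectral norm 1
    G = rng.standard_normal((m, d, d))
    A = (G + G.transpose(0, 2, 1)) / 2                # symmetrize each A_i
    A_flat = A.reshape(m, d * d)                      # row i is vec(A_i)
    y = A_flat @ X_star.ravel()                       # y_i = <A_i, X_star>
    return A_flat, y, X_star


def gradient_descent(A_flat, y, d, alpha=1e-3, eta=0.1, steps=600):
    """Gradient descent on U for the loss (1/(4m)) * sum_i (<A_i, U U^T> - y_i)^2."""
    m = y.shape[0]
    U = alpha * np.eye(d)                             # small initialization
    for _ in range(steps):
        residual = A_flat @ (U @ U.T).ravel() - y     # <A_i, U U^T> - y_i
        M = (A_flat.T @ residual).reshape(d, d) / m   # (1/m) sum_i residual_i A_i
        U = U - eta * (M @ U)                         # gradient is M U for symmetric A_i
    return U


def relative_error(X, X_star):
    """Relative Frobenius-norm error of X against X_star."""
    return np.linalg.norm(X - X_star) / np.linalg.norm(X_star)


A_flat, y, X_star = make_problem()
d = X_star.shape[0]

U = gradient_descent(A_flat, y, d)
gd_error = relative_error(U @ U.T, X_star)

# Baseline (an addition for contrast): the minimum-norm matrix that fits
# the same measurements, with no low-rank or PSD structure imposed.
X_ls = np.linalg.lstsq(A_flat, y, rcond=None)[0].reshape(d, d)
ls_error = relative_error(X_ls, X_star)

print(f"gradient descent on U U^T: relative error {gd_error:.3e}")
print(f"minimum-norm least squares: relative error {ls_error:.3e}")
```

In this toy setting a symmetric 30×30 matrix has 465 free entries, so the 330 measurements alone do not determine X^⋆ as a linear system; the baseline shows what an unstructured fit to the same data looks like. The sketch is not a check of the paper's Õ(dr^2) sample bound or Õ(√r) iteration count.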

12/26/2017

Algorithmic Regularization in Over-parameterized Matrix Recovery

We study the problem of recovering a low-rank matrix X^⋆ from linear meas...
05/25/2017

Implicit Regularization in Matrix Factorization

We study implicit regularization when optimizing an underdetermined quad...
07/04/2019

Learning One-hidden-layer neural networks via Provable Gradient Descent with Random Initialization

Although deep learning has shown its powerful performance in many applic...
06/06/2018

Implicit regularization and solution uniqueness in over-parameterized matrix sensing

We consider whether algorithmic choices in over-parameterized linear mat...
01/14/2020

On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width

The Hessian of neural networks can be decomposed into a sum of two matri...
11/30/2021

The Geometric Occam's Razor Implicit in Deep Learning

In over-parameterized deep neural networks there can be many possible pa...
02/16/2018

Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

We analyze algorithms for approximating a function f(x) = Φx mapping ℝ^d...