Learning One-hidden-layer Neural Networks with Landscape Design

11/01/2017
by   Rong Ge, et al.
0

We consider the problem of learning a one-hidden-layer neural network: we assume the input x∈R^d is from Gaussian distribution and the label y = a^σ(Bx) + ξ, where a is a nonnegative vector in R^m with m< d, B∈R^m× d is a full-rank weight matrix, and ξ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function G(·) whose landscape is guaranteed to have the following properties: 1. All local minima of G are also global minima. 2. All global minima of G correspond to the ground truth parameters. 3. The value and gradient of G can be estimated using samples. With these properties, stochastic gradient descent on G provably converges to the global minimum and learn the ground-truth parameters. We also prove finite sample complexity result and validate the results by simulations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/09/2020

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

We consider the dynamic of gradient descent for learning a two-layer neu...
research
08/17/2020

Learning Two-Layer Residual Networks with Nonparametric Function Estimation by Convex Programming

We focus on learning a two-layer residual neural network with preactivat...
research
02/18/2018

Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression

We study the local geometry of a one-hidden-layer fully-connected neural...
research
06/12/2020

How Many Samples is a Good Initial Point Worth?

Given a sufficiently large amount of labeled data, the non-convex low-ra...
research
07/07/2022

Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

This paper analyzes the convergence and generalization of training a one...
research
09/01/2023

Structure and Gradient Dynamics Near Global Minima of Two-layer Neural Networks

Under mild assumptions, we investigate the structure of loss landscape o...
research
07/15/2022

Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution

This work characterizes the effect of depth on the optimization landscap...

Please sign up or login with your details

Forgot password? Click here to reset