Learning One-hidden-layer Neural Networks under General Input Distributions

10/09/2018
by Weihao Gao, et al.

Significant advances have been made recently in training neural networks, where the main challenge is solving an optimization problem with abundant critical points. However, existing approaches that address this issue crucially rely on a restrictive assumption: that the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions. On these loss functions, remarkably, stochastic gradient descent provably recovers the true parameters from global initializations and empirically outperforms existing approaches. Our loss function design bridges the notion of score functions with neural network optimization. Central to our approach is the task of estimating the score function from samples, which is of basic and independent interest to theoretical statistics. Traditional estimation methods (e.g., kernel-based estimators) fail right at the outset; we draw on local-likelihood methods from statistics to design a novel score-function estimator that provably adapts to the local geometry of the unknown density.
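For readers unfamiliar with score functions: the first-order score of a density p is s(x) = ∇_x log p(x) (some works in this literature use the negative sign). The sketch below is a minimal illustration, not the paper's method: it estimates this score with the kind of kernel density estimator the abstract calls out as a failing baseline, so that the quantity being estimated is concrete. The query point, bandwidth, and Gaussian test distribution are assumptions chosen only for the example.

```python
import numpy as np

def kde_score(x, samples, bandwidth):
    """Estimate the first-order score s(x) = grad log p(x) at a query
    point x from i.i.d. samples, using a Gaussian kernel density
    estimate. This is the 'traditional' kernel-based baseline the
    abstract refers to; it degrades in the tails and in high dimensions,
    which is what motivates the local-likelihood estimator in the paper
    (not reproduced here)."""
    diffs = x - samples                      # (n, d) differences x - x_i
    sq = np.sum(diffs ** 2, axis=1)          # squared distances
    logw = -sq / (2.0 * bandwidth ** 2)      # log Gaussian kernel weights
    w = np.exp(logw - logw.max())            # numerically stabilised weights
    w /= w.sum()                             # normalise to sum to 1
    # grad log p_hat(x) = -(1/h^2) * sum_i w_i (x - x_i)
    return -(w[:, None] * diffs).sum(axis=0) / bandwidth ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.standard_normal((5000, 2))   # samples from N(0, I)
    x = np.array([0.5, -1.0])
    # For a standard Gaussian the true score is grad log p(x) = -x.
    print("estimated:", kde_score(x, samples, bandwidth=0.5))
    print("true:     ", -x)
```

In the toy Gaussian setting above the kernel estimate is reasonable near the mode, but the bandwidth must be tuned by hand and the estimate deteriorates where data is sparse; the paper's local-likelihood estimator is designed to adapt to that local geometry automatically.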


Related research:

Critical Points of Neural Networks: Analytical Forms and Landscape Properties (10/30/2017)
Training Neural Networks with Local Error Signals (01/20/2019)
Combining resampling and reweighting for faithful stochastic optimization (05/31/2021)
Characterization of Gradient Dominance and Regularity Conditions for Neural Networks (10/18/2017)
The critical locus of overparameterized neural networks (05/08/2020)
On the Power of Over-parametrization in Neural Networks with Quadratic Activation (03/03/2018)
On Sampling Strategies for Neural Network-based Collaborative Filtering (06/23/2017)
