A Relaxation Argument for Optimization in Neural Networks and Non-Convex Compressed Sensing

02/03/2020
by G. Welper, et al.

It has been observed in practical applications and in theoretical analysis that over-parametrization helps to find good minima in neural network training. In a similar spirit, in this article we study widening and deepening neural networks by a relaxation argument, so that the enlarged networks are rich enough to run r copies of parts of the original network in parallel, without necessarily achieving zero training error as in over-parametrized scenarios. The partial copies can be combined in r^θ possible ways for layer width θ. Therefore, the enlarged networks can potentially achieve the best training error among r^θ random initializations, but it is not immediately clear whether this can be realized via gradient descent or similar training methods. The same construction can be applied to other optimization problems by introducing a similar layered structure. We apply this idea to non-convex compressed sensing, where we show that in some scenarios we can realize the r^θ-fold increase in the chance of obtaining a global optimum by solving a convex optimization problem of dimension rθ.
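To make the counting argument concrete, the sketch below is a hypothetical illustration rather than the paper's construction: a hidden layer of width θ is enlarged so that each of its θ units has r independently initialized candidate weight vectors, and picking one candidate per unit yields r^θ composite sub-networks inside an enlarged layer of width rθ. The names (candidates, predict) and the toy least-squares objective are assumptions made for this example only.

```python
import itertools
import numpy as np

# Illustrative sketch (assumed setup, not the paper's construction):
# each of the theta hidden units has r independently initialized
# candidate weight vectors, so the enlarged layer of width r*theta
# contains r**theta distinct sub-networks.

rng = np.random.default_rng(0)
r, theta, d = 3, 4, 5                      # copies per unit, layer width, input dim

# candidates[i][j] is the j-th candidate weight vector for hidden unit i
candidates = [[rng.standard_normal(d) for _ in range(r)] for _ in range(theta)]
v = rng.standard_normal(theta)             # fixed output weights
x = rng.standard_normal(d)                 # a single training input
y = 1.0                                    # its target value

def predict(selection):
    """Output of the sub-network that uses candidate selection[i] for unit i."""
    hidden = np.array([max(candidates[i][selection[i]] @ x, 0.0)   # ReLU units
                       for i in range(theta)])
    return v @ hidden

# Brute force over all r**theta selections: the enlarged network contains
# every one of them, but whether gradient-based training actually finds the
# best one is exactly the question the relaxation argument addresses.
best = min(itertools.product(range(r), repeat=theta),
           key=lambda sel: (predict(sel) - y) ** 2)
print("combinations:", r ** theta, "best selection:", best)
```

The brute-force enumeration is only feasible for toy sizes; it serves to show what "the best of r^θ random initializations" means, not how the enlarged network would be trained.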
