How degenerate is the parametrization of neural networks with the ReLU activation function?

05/23/2019
by Julius Berner, et al.

Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the loss function generally depends only on the realization of the neural network, i.e., the function it computes. Studying the optimization problem over the space of realizations can open up completely new ways to understand neural network training. In particular, common loss functions like the mean squared error are convex on sets of neural network realizations, even though these sets are themselves non-convex. Note, however, that each realization has many different, possibly degenerate, parametrizations. In particular, a local minimum in parametrization space need not correspond to a local minimum in realization space. To establish such a connection, inverse stability of the realization map is required, meaning that proximity of realizations must imply proximity of the corresponding parametrizations. In this paper we present pathologies which prevent inverse stability in general, and then establish a restricted set of parametrizations on which inverse stability holds with respect to a Sobolev norm. Furthermore, we show that optimizing over such restricted sets still allows learning any function that can be learned by optimizing over unrestricted sets. While most of this paper focuses on shallow networks, none of the methods used are, in principle, limited to shallow networks, and it should be possible to extend them to deep neural networks.
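As an illustration of the degeneracy the abstract refers to, the NumPy sketch below (not taken from the paper; all names are our own) demonstrates the well-known positive rescaling invariance of ReLU networks: multiplying a hidden neuron's incoming weights and bias by some c > 0 and dividing its outgoing weights by c leaves the realization unchanged, so arbitrarily distant parametrizations compute exactly the same function.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def shallow_relu_net(x, W1, b1, W2, b2):
    # Realization of a one-hidden-layer ReLU network: x -> W2 @ relu(W1 @ x + b1) + b2
    return W2 @ relu(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3))
b1 = rng.standard_normal(5)
W2 = rng.standard_normal((2, 5))
b2 = rng.standard_normal(2)

# Rescale hidden neuron 0: relu is positively homogeneous, relu(c*z) = c*relu(z) for c > 0,
# so scaling its incoming weights/bias by c and its outgoing weights by 1/c cancels out.
c = 10.0
W1_s = W1.copy(); W1_s[0] *= c
b1_s = b1.copy(); b1_s[0] *= c
W2_s = W2.copy(); W2_s[:, 0] /= c

x = rng.standard_normal(3)
y1 = shallow_relu_net(x, W1, b1, W2, b2)
y2 = shallow_relu_net(x, W1_s, b1_s, W2_s, b2)
print(np.allclose(y1, y2))  # True: two distant parametrizations, one realization
```

Because such rescalings can move parametrizations arbitrarily far apart without changing the realization, proximity of realizations cannot in general imply proximity of parametrizations, which is why inverse stability requires restricting the set of admissible parametrizations.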


Related research

10/25/2020 · A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
When equipped with efficient optimization algorithms, the over-parameter...

05/31/2021 · Combining resampling and reweighting for faithful stochastic optimization
Many machine learning and data science tasks require solving non-convex ...

03/31/2021 · CDiNN - Convex Difference Neural Networks
Neural networks with ReLU activation function have been shown to be univ...

12/24/2020 · Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms
We describe the convex semi-infinite dual of the two-layer vector-output...

02/03/2020 · A Relaxation Argument for Optimization in Neural Networks and Non-Convex Compressed Sensing
It has been observed in practical applications and in theoretical analys...

06/18/2019 · Dataless training of generative models for the inverse design of metasurfaces
Metasurfaces are subwavelength-structured artificial media that can shap...

07/19/2021 · Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function
We consider non-convex stochastic optimization problems where the object...
