All Local Minima are Global for Two-Layer ReLU Neural Networks: The Hidden Convex Optimization Landscape

06/10/2020
by Jonathan Lacotte, et al.

We study two-layer ReLU neural networks from an optimization perspective. We prove that the path-connected sublevel set, i.e., the valley, containing a neural network that is Clarke stationary with respect to the training loss with weight-decay regularization also contains a specific, simpler, and more structured neural network, which we call its minimal representation. We provide an explicit construction of a continuous path between a neural network and its minimal counterpart. Importantly, we show that characterizing the optimality properties of a neural network reduces to characterizing those of its minimal representation. Thanks to the specific structure of minimal neural networks, we show that they can be embedded into a convex optimization landscape. Leveraging convexity, we (i) characterize the minimal size of the hidden layer such that the optimization landscape has no spurious valleys and (ii) provide a polynomial-time algorithm for checking whether a given neural network is a global minimum of the training loss. Overall, our embedding into a convex optimization landscape yields a rich framework for studying the loss landscape of neural network training.
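To make the convex embedding concrete, here is a minimal sketch of the kind of convex program such an embedding can produce for two-layer ReLU training with weight decay. It is not the paper's construction: the activation-pattern sampling heuristic, the variable names, and the use of the cvxpy library are assumptions for illustration only. The key idea it demonstrates is that once a ReLU activation pattern is fixed, the network output becomes linear in the weights, and a group-norm penalty can stand in for weight decay.

```python
import numpy as np
import cvxpy as cp  # assumed dependency for solving the convex program

# Minimal sketch (not the paper's construction): a convex surrogate for
# two-layer ReLU training with weight decay. Each fixed activation pattern
# D = diag(1[X u >= 0]) makes the network output linear in the weights.
rng = np.random.default_rng(0)
n, d, beta = 20, 3, 1e-2          # samples, input dim, weight-decay strength
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Heuristically sample activation patterns with random hyperplanes; a full
# embedding would enumerate every pattern realizable on X.
patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int))
            for _ in range(50)}
Ds = [np.diag(np.array(p, dtype=float)) for p in patterns]

V = [cp.Variable(d) for _ in Ds]  # weights of "positive" neurons
W = [cp.Variable(d) for _ in Ds]  # weights of "negative" neurons
pred = sum((D @ X) @ (v - w) for D, v, w in zip(Ds, V, W))
loss = 0.5 * cp.sum_squares(pred - y)
reg = beta * sum(cp.norm(v, 2) + cp.norm(w, 2) for v, w in zip(V, W))

# Constraints force each weight block to respect its activation pattern,
# so the linear model above agrees with an actual ReLU network.
cons = []
for D, v, w in zip(Ds, V, W):
    A = (2 * D - np.eye(n)) @ X
    cons += [A @ v >= 0, A @ w >= 0]

prob = cp.Problem(cp.Minimize(loss + reg), cons)
prob.solve()
print("convex surrogate objective:", prob.value)
```

Because the surrogate is convex, any optimum the solver reports is global over the sampled patterns; this is the property that makes convexity-based certification of global minimality, as described in the abstract, possible in principle.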


