Piecewise convexity of artificial neural networks

07/17/2016
by Blaine Rister, et al.

Although artificial neural networks have shown great promise in applications including computer vision and speech recognition, there remains considerable practical and theoretical difficulty in optimizing their parameters. The seemingly unreasonable success of gradient descent methods in minimizing these non-convex functions remains poorly understood. In this work we offer theoretical guarantees for networks with piecewise affine activation functions, which have in recent years become the norm. We prove three main results. First, the network is piecewise convex as a function of the input data. Second, the network, considered as a function of the parameters in a single layer with all others held constant, is again piecewise convex. Third, the network as a function of all of its parameters is piecewise multi-convex, a generalization of biconvexity. From these results we characterize the local minima and stationary points of the training objective, showing that they minimize the objective over certain subsets of the parameter space. We then analyze the performance of two optimization algorithms on multi-convex problems: gradient descent, and a method that iteratively solves a sequence of convex sub-problems. We prove necessary convergence conditions for the first algorithm and both necessary and sufficient conditions for the second, after introducing regularization to the objective. Finally, we remark on the remaining difficulty of the global optimization problem: under the squared error objective, we show that by varying the training data, a single rectifier neuron admits local minima arbitrarily far apart, both in objective value and in parameter space.
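To make the rectifier case concrete, here is a minimal Python sketch (illustrative only; the toy dataset and all names are assumptions, not taken from the paper) of the squared-error loss of a single rectifier neuron f(x) = max(0, w*x + b). On each region of (w, b)-space where the sign pattern of the pre-activations w*x_i + b is constant, the loss is a convex quadratic, which is the piecewise convexity described above.

import numpy as np

# Toy dataset, assumed for illustration; the paper constructs its own examples.
x = np.array([-1.0, 0.5, 2.0])
y = np.array([0.0, 1.0, 3.0])

def loss(w, b):
    """Squared-error objective of a single rectifier neuron."""
    pred = np.maximum(0.0, w * x + b)
    return np.mean((pred - y) ** 2)

def activation_pattern(w, b):
    """Sign pattern of the pre-activations w*x_i + b.
    Wherever this pattern is constant, loss() is a convex quadratic in (w, b)."""
    return tuple(w * x + b > 0)

# Nearby parameters sharing a pattern lie on the same convex piece;
# crossing a pattern boundary moves to a different convex piece.
for w, b in [(1.0, 0.2), (1.1, 0.3), (-1.0, 0.0)]:
    print(f"w={w:+.1f} b={b:+.1f} pattern={activation_pattern(w, b)} loss={loss(w, b):.3f}")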



