The Multilinear Structure of ReLU Networks

12/29/2017
by Thomas Laurent, et al.

We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space, and as a consequence, optima of such networks generically occur in non-differentiable regions of parameter space. Any understanding of such networks must therefore carefully take into account their non-smooth nature. We show how to use techniques from nonsmooth analysis to study these non-differentiable loss surfaces. Our analysis focuses on three different scenarios: (1) a deep linear network with hinge loss and arbitrary data, (2) a one-hidden-layer network with leaky ReLUs and linearly separable data, and (3) a one-hidden-layer network with ReLU nonlinearities and linearly separable data. We show that all local minima are global minima in the first two scenarios. A bifurcation occurs when passing from the second to the third scenario, in that ReLU networks do have non-optimal local minima. We provide a complete description of such sub-optimal solutions. We conclude by investigating the extent to which these phenomena do, or do not, persist in the multiclass setting.
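The piecewise multilinear claim can be checked numerically. The sketch below (illustrative only, not the authors' code; the network sizes, data, and the swept weight entry are arbitrary choices) restricts the hinge loss of a one-hidden-layer leaky-ReLU network to a line in parameter space by varying a single weight, and verifies that the loss is piecewise linear along that line: its second finite differences vanish everywhere except at the isolated kinks where an activation or a hinge term switches regime.

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    """Leaky ReLU nonlinearity."""
    return np.where(z > 0, z, alpha * z)

def hinge_loss(W, v, X, y, alpha=0.1):
    """Hinge loss of a one-hidden-layer leaky-ReLU net f(x) = v . sigma(W x)."""
    scores = leaky_relu(X @ W.T, alpha) @ v
    return np.mean(np.maximum(0.0, 1.0 - y * scores))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)   # linearly separable labels
W0 = rng.normal(size=(5, 3))           # hidden-layer weights
v = rng.normal(size=5)                 # output weights

def loss_along(t):
    """Loss restricted to a line: vary one weight entry, freeze the rest."""
    W = W0.copy()
    W[0, 0] = t
    return hinge_loss(W, v, X, y)

ts = np.linspace(-3.0, 3.0, 601)
vals = np.array([loss_along(t) for t in ts])

# On a piecewise linear function, second differences are zero away
# from the (finitely many) non-differentiable kinks.
d2 = np.abs(np.diff(vals, n=2))
frac_linear = np.mean(d2 < 1e-10)
print(f"fraction of grid points with zero curvature: {frac_linear:.2f}")
```

Since each training sample contributes only finitely many regime switches (one per activation kink, plus hinge crossings), the curvature is exactly zero on the overwhelming majority of the grid, consistent with the multilinear structure described in the abstract.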
