Deep linear neural networks with arbitrary loss: All local minima are global
We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the following fact: all local minima are global minima if each hidden layer is at least as wide as either the input layer or the output layer.
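To make the claim concrete, one standard formalization is sketched below; the notation (the matrices W_j, the widths n_j, and the function f) is our addition and does not appear in the abstract itself. The network collapses to a product of weight matrices, and the objective is a convex differentiable function of that product:

\[
\min_{W_1, \dots, W_N} \; L(W_1, \dots, W_N) \;=\; f\!\left(W_N W_{N-1} \cdots W_1\right),
\qquad W_j \in \mathbb{R}^{\,n_j \times n_{j-1}},
\]

where f is convex and differentiable, n_0 is the input width, and n_N is the output width. Under this reading, the width condition says that every hidden layer satisfies n_j \ge \min(n_0, n_N) for 1 \le j \le N-1, and the conclusion is that every local minimum of L is a global minimum.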