Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit

01/12/2019
by Jascha Sohl-Dickstein, et al. (Google, MIT)

Recent work has noted that all bad local minima can be removed from neural network loss landscapes by adding a single unit with a particular parameterization. We show that the core technique from these papers can be used to remove all bad local minima from any loss landscape, so long as the global minimum has a loss of zero. This procedure does not require the addition of auxiliary units, or even that the loss be associated with a neural network. The mechanism of action is that all bad local minima are converted into bad (non-local) minima at infinity with respect to the auxiliary parameters.


I Eliminating all bad local minima

Take a loss function $L(\theta)$, with parameters $\theta \in \mathbb{R}^{n}$, and with a global minimum $L(\theta^{*}) = 0$. Consider the modified loss function

    $\tilde{L}(\theta, a, b) = L(\theta)\,\bigl(1 - a\,e^{b}\bigr)^{2} + \lambda a^{2}$    (1)

where $a, b \in \mathbb{R}$ are auxiliary parameters, and $\lambda > 0$ is a regularization hyperparameter. The specific form of Equation (1) was chosen to emphasize the similarity to the approach in Liang et al. [2] and Kawaguchi and Kaelbling [1], but without involving auxiliary units.
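
To make the construction concrete, the following minimal sketch implements Equation (1) for an arbitrary black-box loss. It assumes the form of Equation (1) as reconstructed above; the function names, hyperparameter value, and toy loss are ours, chosen for illustration.

import numpy as np

def modified_loss(loss_fn, theta, a, b, lam=0.1):
    # Equation (1): wrap a loss L(theta) whose global minimum is zero
    # with auxiliary scalars a, b and regularization strength lam > 0.
    L = loss_fn(theta)
    return L * (1.0 - a * np.exp(b)) ** 2 + lam * a ** 2

# Toy loss: global minimum L = 0 at theta = 1, plus a bad local
# minimum near theta = -0.89 with strictly positive loss.
def toy_loss(theta):
    return (theta - 1.0) ** 2 * ((theta + 1.0) ** 2 + 0.2)

print(modified_loss(toy_loss, 1.0, a=0.0, b=0.0))    # 0.0: global minimum survives
print(modified_loss(toy_loss, -0.89, a=0.0, b=0.0))  # ~0.76: bad local minimum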

As can be seen by inspection, the gradient with respect to the auxiliary parameters is only zero for finite $(a, b)$ when $a = 0$ and $L(\theta) = 0$. Otherwise, $a$ will tend to shrink towards zero to satisfy the regularizer, $b$ will tend to grow towards infinity so that $a\,e^{b}$ can remain approximately 1, and no fixed point will be achieved for finite $b$. Thus, all non-global local minima of $L(\theta)$ are transformed into minima at $b = \infty$ of $\tilde{L}(\theta, a, b)$. Recall that minima at infinity do not qualify as local minima in $\mathbb{R}^{n+2}$. Therefore, any local minimum of $\tilde{L}(\theta, a, b)$ is a global minimum, and $\tilde{L}(\theta, a, b)$ has no bad local minima.
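
This escape to infinity can be checked numerically. The self-contained sketch below (same assumed form of Equation (1); the toy loss, learning rate, and step count are arbitrary choices of ours) runs gradient descent on $\tilde{L}$ starting from the bad local minimum of the toy loss: $\theta$ stays trapped and $L(\theta)$ never improves, yet the gradient never vanishes, as $a$ adjusts so that $a\,e^{b}$ stays near 1 while $b$ grows without bound and $\tilde{L}$ drifts toward zero.

import numpy as np

def toy_loss(theta):
    # Bad local minimum near theta = -0.89; global minimum L = 0 at theta = 1.
    return (theta - 1.0) ** 2 * ((theta + 1.0) ** 2 + 0.2)

def modified_loss(p, lam=0.1):
    theta, a, b = p
    return toy_loss(theta) * (1.0 - a * np.exp(b)) ** 2 + lam * a ** 2

def num_grad(f, p, eps=1e-5):
    # Central-difference gradient; adequate for this 3-parameter demo.
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2.0 * eps)
    return g

p = np.array([-0.89, 0.0, 0.0])  # (theta, a, b), starting at the bad local minimum
for step in range(20001):
    if step % 5000 == 0:
        theta, a, b = p
        print(f"step {step:5d}  theta={theta:+.3f}  a={a:+.4f}  b={b:.3f}  "
              f"L={toy_loss(theta):.4f}  Ltilde={modified_loss(p):.5f}")
    p = p - 0.005 * num_grad(modified_loss, p)

The printed trace shows $b$ increasing and $\tilde{L}$ decreasing at every report, even though $\theta$, and hence $L(\theta)$, remains frozen at the bad minimum: that minimum of $L$ has become a minimum at $b = \infty$ of $\tilde{L}$.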

See Appendix A for a more formal derivation, and Figure 1 for a visualization.

II Is this significant?

By eliminating the auxiliary neurons which play a central role in Kawaguchi and Kaelbling [1] and Liang et al. [2], we hope to provide more clarity into the mechanism by which bad local minima are removed from the augmented loss. We leave it to the reader to judge whether removing local minima in this fashion is trivial, deep, or both.

We also note that there is extensive discussion in Section 5 of Kawaguchi and Kaelbling [1] of situations in which their auxiliary variable (which plays a qualitatively similar role to $b$ in Section I above) diverges to infinity. So, it has been previously observed that pathologies can continue to exist in loss landscapes modified in a fashion similar to Equation (1) above.

Fig. 1: All local minima of $L(\theta)$ with $L(\theta) > 0$ become non-local minima at infinity of $\tilde{L}(\theta, a, b)$. Contour plots of the modified loss landscape $\tilde{L}(\theta, a, b)$ in terms of the auxiliary parameters $a$ and $b$, for a fixed value of $\lambda$. When the original loss function $L(\theta) > 0$, then $\tilde{L}(\theta, a, b)$ approaches a minimum in terms of $a$ and $b$ as $a \to 0$ and $b \to \infty$. When $L(\theta)$ is at its global minimum, $L(\theta) = 0$, then $\tilde{L}(\theta, a, b)$ has a local minimum at $a = 0$, for any value of $b$.

Acknowledgments

We thank Leslie Kaelbling, Andrey Zhmoginov, and Hossein Mobahi for feedback on a draft of the manuscript.

References

[1] Kenji Kawaguchi and Leslie Pack Kaelbling. Elimination of all bad local minima in deep learning. arXiv preprint, 2019.
[2] Shiyu Liang, Ruoyu Sun, Jason D. Lee, and R. Srikant. Adding one neuron can eliminate all bad local minima. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

Appendix A All critical points of $\tilde{L}$ are global minima of $L$

At critical points of $\tilde{L}$, $\partial \tilde{L}/\partial a = 0$ and $\partial \tilde{L}/\partial b = 0$, which together imply that $a = 0$. Substituting in $a = 0$, we must have $-2\,e^{b} L(\theta) = 0$ at any critical point of $\tilde{L}$ with respect to $a$. This can only be satisfied by $L(\theta) = 0$. Therefore at every critical point of $\tilde{L}$ (including every local minimum), $L(\theta) = 0$, and thus $\theta$ is a global minimum of $L(\theta)$.
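
Written out explicitly, under the form of Equation (1) assumed in Section I, the critical-point conditions are:

% Critical-point conditions for
% \tilde{L}(\theta, a, b) = L(\theta)(1 - a e^{b})^{2} + \lambda a^{2}
\begin{align*}
\frac{\partial \tilde{L}}{\partial a} &= -2\,e^{b}\,L(\theta)\bigl(1 - a\,e^{b}\bigr) + 2\lambda a = 0, \\
\frac{\partial \tilde{L}}{\partial b} &= -2\,a\,e^{b}\,L(\theta)\bigl(1 - a\,e^{b}\bigr) = 0.
\end{align*}

Multiplying the first condition by $a$ and subtracting the second isolates $2\lambda a^{2} = 0$, which is the step that forces $a = 0$; substituting $a = 0$ back into the first condition then leaves $-2\,e^{b} L(\theta) = 0$, and since $e^{b} > 0$, this requires $L(\theta) = 0$.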