Plateau in Monotonic Linear Interpolation – A "Biased" View of Loss Landscape for Deep Networks

10/03/2022
by   Xiang Wang, et al.

Monotonic linear interpolation (MLI) - the loss and accuracy are monotonic along the line connecting a random initialization with the minimizer it converges to - is a phenomenon commonly observed in the training of neural networks. Such a phenomenon may seem to suggest that optimization of neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the hardness of optimization problems, and that empirical observations of MLI for deep neural networks depend heavily on biases. In particular, we show that linearly interpolating the weights and the biases influences the final output very differently, and that when different classes have different last-layer biases in a deep network, there is a long plateau in both the loss and accuracy interpolation (which existing theory of MLI cannot explain). We also show, using a simple model, how the last-layer biases for different classes can differ even on a perfectly balanced dataset. Empirically, we demonstrate that similar intuitions hold on practical networks and realistic datasets.
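To make the MLI setup concrete, the sketch below (not the authors' code) evaluates the loss along the linear path theta(alpha) = (1 - alpha) * theta_0 + alpha * theta_T between an initial and a trained parameter set; `model`, `loss_fn`, and `data_loader` are assumed to be supplied by the user.

```python
# Minimal sketch of tracing the loss along the linear interpolation path
# between initial parameters (theta_0) and trained parameters (theta_T).
# This is an illustrative assumption of the standard MLI probe, not the
# paper's exact experimental code.
import copy
import torch

def interpolation_curve(model, theta_0, theta_T, loss_fn, data_loader, steps=25):
    """Average loss at evenly spaced points on the segment from theta_0 to theta_T."""
    probe = copy.deepcopy(model)
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate every floating-point entry (weights and biases alike);
        # non-float buffers (e.g. BatchNorm counters) are taken from theta_T.
        interp_state = {
            name: ((1 - alpha) * theta_0[name] + alpha * theta_T[name])
            if theta_0[name].is_floating_point() else theta_T[name]
            for name in theta_0
        }
        probe.load_state_dict(interp_state)
        probe.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                total += loss_fn(probe(x), y).item() * len(y)
                n += len(y)
        losses.append(total / n)
    return losses  # MLI holds if this sequence is (approximately) non-increasing
```

MLI corresponds to this curve decreasing monotonically in alpha; the plateau studied in the paper shows up as a long flat stretch of the curve before the loss starts to drop.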


research
04/22/2021

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes

Linear interpolation between initial neural network parameters and conve...
research
06/30/2021

What can linear interpolation of neural network loss landscapes tell us?

Studying neural network loss landscapes provides insights into the natur...
research
03/14/2021

Pre-interpolation loss behaviour in neural networks

When training neural networks as classifiers, it is common to observe an...
research
01/29/2021

Layer-Peeled Model: Toward Understanding Well-Trained Deep Neural Networks

In this paper, we introduce the Layer-Peeled Model, a nonconvex yet anal...
research
01/18/2023

Strong inductive biases provably prevent harmless interpolation

Classical wisdom suggests that estimators should avoid fitting noise to ...
research
05/26/2021

A Universal Law of Robustness via Isoperimetry

Classically, data interpolation with a parametrized model class is possi...
research
06/13/2022

Rank Diminishing in Deep Neural Networks

The rank of neural networks measures information flowing across layers. ...
