Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

06/13/2019
by Kaifeng Lyu, et al.

Recent works on implicit regularization have shown that gradient descent converges to the max-margin direction for logistic regression with one-layer or multi-layer linear networks. In this paper, we generalize this result to homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study the gradient flow (gradient descent with infinitesimal step size) optimizing the logistic loss or cross-entropy loss of any homogeneous model (possibly non-smooth), and show that if the training loss decreases below a certain threshold, then we can define a smoothed version of the normalized margin which increases over time. We also formulate a natural constrained optimization problem related to margin maximization, and prove that both the normalized margin and its smoothed version converge to the objective value at a KKT point of this optimization problem. Furthermore, we extend the above results to a large family of loss functions. We conduct several experiments on the MNIST and CIFAR-10 datasets to corroborate our theoretical findings. For gradient descent with a constant learning rate, we observe that the normalized margin indeed keeps increasing after the dataset is fitted, but the growth is very slow. However, if we schedule the learning rate more carefully, the normalized margin grows much more rapidly. Finally, as the margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
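
For reference, the objects the abstract refers to can be written down explicitly. The following is a sketch of the standard formulation for L-homogeneous networks (i.e., f(\theta; x) satisfying f(c\theta; x) = c^L f(\theta; x) for all c > 0), not a quotation from the paper. For a binary dataset \{(x_n, y_n)\} with y_n \in \{\pm 1\}, the normalized margin and the associated margin-maximization problem are

    \bar{\gamma}(\theta) = \frac{\min_n \, y_n f(\theta; x_n)}{\lVert \theta \rVert_2^{\,L}},
    \qquad
    \min_{\theta} \ \tfrac{1}{2} \lVert \theta \rVert_2^2
    \quad \text{s.t.} \quad y_n f(\theta; x_n) \ge 1 \ \text{ for all } n,

and the convergence statement in the abstract concerns KKT points of the constrained problem on the right.

The experimental observation that the normalized margin keeps growing (slowly) after the training set is fitted can be illustrated on a toy problem. The snippet below is a minimal sketch, not the authors' code: a bias-free two-layer ReLU network (hence 2-homogeneous) is trained with the logistic loss on synthetic separable data, and the normalized margin min_n y_n f(x_n) / ||\theta||_2^2 is printed during training. The network size, data, and hyperparameters are illustrative choices.

# Minimal sketch (assumed setup, not the paper's code): track the normalized
# margin of a bias-free 2-layer ReLU net (2-homogeneous) under logistic loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy linearly separable data with labels in {-1, +1}.
X = torch.randn(200, 10)
y = torch.sign(X @ torch.randn(10))

L = 2  # order of homogeneity: two bias-free layers
model = nn.Sequential(nn.Linear(10, 64, bias=False),
                      nn.ReLU(),
                      nn.Linear(64, 1, bias=False))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def normalized_margin():
    # min_n y_n f(x_n) / ||theta||_2^L, computed over all parameters.
    with torch.no_grad():
        out = model(X).squeeze(1)
        margin = (y * out).min()
        norm = torch.sqrt(sum(p.pow(2).sum() for p in model.parameters()))
        return (margin / norm ** L).item()

for step in range(5001):
    out = model(X).squeeze(1)
    loss = F.softplus(-y * out).mean()  # logistic loss log(1 + exp(-y f(x)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:5d}  loss {loss.item():.4f}  "
              f"normalized margin {normalized_margin():.6f}")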

Related research

10/27/2017 · The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...

02/11/2020 · Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) ...

10/06/2021 · On Margin Maximization in Linear and ReLU Networks
The implicit bias of neural networks has been extensively studied in rec...

10/24/2020 · Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets
We analyze the inductive bias of gradient descent for weight normalized ...

10/12/2018 · On the Margin Theory of Feedforward Neural Networks
Past works have shown that, somewhat surprisingly, over-parametrization ...

06/11/2020 · Directional convergence and alignment in deep learning
In this paper, we show that although the minimizers of cross-entropy and...

05/17/2019 · Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...
