
Gradient descent aligns the layers of deep linear networks
This paper establishes risk convergence and asymptotic weight matrix ali...

Gradient Methods Never Overfit On Separable Data
A line of recent works established that when training linear predictors ...

The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...

An algorithmic view of ℓ_2 regularization and some path-following algorithms
We establish an equivalence between the ℓ_2-regularized solution path fo...

Directional convergence and alignment in deep learning
In this paper, we show that although the minimizers of cross-entropy and...

Improved scalability under heavy tails, without strong convexity
Real-world data is laden with outlying values. The challenge for machine...

Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets
We analyze the inductive bias of gradient descent for weight normalized ...
Gradient descent follows the regularization path for general losses
Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and depends on the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with an arbitrary convex, strictly decreasing loss, if the risk does not attain its infimum, then the gradient-descent path and the algorithm-independent regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely used exponentially-tailed losses (such as the exponential loss or the logistic loss): while convergence to a direction under exponentially-tailed losses is necessarily to the maximum-margin direction, other losses, such as polynomially-tailed losses, may induce convergence to a direction with a poor margin.