A refined primal-dual analysis of the implicit bias

06/11/2019
by Ziwei Ji, et al.

Recent work shows that gradient descent on linearly separable data is implicitly biased toward the maximum-margin solution. However, no prior convergence rate is tight in both n (the dataset size) and t (the training time). This work proves that the normalized gradient descent iterates converge to the maximum-margin solution at a rate of O(ln(n)/ln(t)), which is tight in both n and t. The proof goes through a dual convergence result: gradient descent induces a multiplicative weights update on the (normalized) SVM dual objective, and the convergence rate of this dual update yields the tight implicit-bias rate.
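The phenomenon the abstract describes can be observed numerically: running plain gradient descent on the logistic loss over a separable dataset, the normalized iterate w_t/||w_t|| slowly aligns with the maximum-margin direction. The sketch below uses a hypothetical toy dataset (not from the paper) chosen so that the max-margin direction is known in closed form; step size, iteration count, and the dataset itself are all illustrative assumptions.

```python
import numpy as np

# Toy linearly separable data (illustrative, not from the paper).
# By symmetry, the support vectors are (1,1) and (-1,-1), so the
# max-margin direction is (1,1)/sqrt(2).
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def loss_grad(w):
    """Gradient of the mean logistic loss (1/n) sum ln(1 + exp(-y_i w.x_i))."""
    margins = y * (X @ w)
    s = -y / (1.0 + np.exp(margins))   # derivative of the loss w.r.t. each margin
    return (s[:, None] * X).mean(axis=0)

w = np.zeros(2)
eta = 1.0                              # illustrative step size
for t in range(100_000):
    w -= eta * loss_grad(w)

u = w / np.linalg.norm(w)              # normalized GD iterate
u_star = np.array([1.0, 1.0]) / np.sqrt(2)  # known max-margin direction
alignment = float(np.dot(u, u_star))
print(alignment)                       # approaches 1 as t grows
```

Consistent with the O(ln(n)/ln(t)) rate, the alignment improves only logarithmically in t: the unnormalized iterate grows like ln(t) along the max-margin direction while its off-direction component stays bounded, so many iterations are needed even on this tiny problem.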


research
03/05/2018

Convergence of Gradient Descent on Separable Data

The implicit bias of gradient descent is not fully understood even in si...
research
07/01/2021

Fast Margin Maximization via Dual Acceleration

We present and analyze a momentum-based gradient method for training lin...
research
06/20/2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

We study the implicit bias of batch normalization trained by gradient de...
research
10/17/2022

On Accelerated Perceptrons and Beyond

The classical Perceptron algorithm of Rosenblatt can be used to find a l...
research
11/12/2020

Implicit bias of any algorithm: bounding bias via margin

Consider n points x_1,…,x_n in finite-dimensional euclidean space, each ...
research
04/23/2013

The Stochastic Gradient Descent for the Primal L1-SVM Optimization Revisited

We reconsider the stochastic (sub)gradient approach to the unconstrained...
research
01/09/2019

The Lingering of Gradients: How to Reuse Gradients over Time

Classically, the time complexity of a first-order method is estimated by...
