Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

05/22/2023
by Hossein Taheri, et al.

Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which include the exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum. This is made possible by establishing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study the generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.
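The algorithm at the center of the abstract is simple to state: at every iteration, the gradient is rescaled to unit norm before the step is taken, so the update length stays fixed even as the loss (and hence the raw gradient) shrinks. Below is a minimal sketch, not the authors' code, of normalized GD on a two-layer network with logistic loss and synthetic separable data; the width m = 64, the tanh activation, the step size eta = 0.1, and the toy data generator are all assumptions made here for illustration.

```python
# Minimal sketch of normalized gradient descent on a two-layer net
# (illustrative only; width, activation, step size, and data are assumptions).
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data with labels y in {-1, +1}.
n, d, m = 200, 10, 64
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

# Two-layer net f(x) = a^T tanh(W x); only the first layer W is trained here.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(W):
    return np.tanh(X @ W.T) @ a                      # predictions, shape (n,)

def logistic_loss(W):
    return np.mean(np.log1p(np.exp(-y * forward(W))))

def grad(W):
    H = np.tanh(X @ W.T)                             # hidden activations, (n, m)
    margins = y * (H @ a)
    s = -y / (1.0 + np.exp(margins))                 # d(loss)/d(prediction) per sample
    dH = (1.0 - H**2) * (s[:, None] * a[None, :])    # chain rule through tanh
    return (dH.T @ X) / n                            # gradient w.r.t. W, shape (m, d)

eta = 0.1
for t in range(500):
    G = grad(W)
    gnorm = np.linalg.norm(G)
    if gnorm < 1e-12:
        break
    W = W - eta * G / gnorm                          # normalized GD step
    if t % 100 == 0:
        print(f"iter {t:4d}  loss {logistic_loss(W):.3e}")
```

The key design choice is the division by the gradient norm: with an exponentially-tailed loss on separable data the raw gradient decays with the loss, so plain GD slows down, whereas the normalized step keeps making progress of a fixed size, which is the mechanism behind the fast (linear-rate) convergence the paper analyzes.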


Related research

09/15/2022 - Decentralized Learning with Separable Data: Generalization and Fast Algorithms
Decentralized learning offers privacy and communication efficiency when ...

07/01/2021 - Fast Margin Maximization via Dual Acceleration
We present and analyze a momentum-based gradient method for training lin...

02/27/2022 - Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond
An influential line of recent work has focused on the generalization pro...

06/30/2020 - Gradient Methods Never Overfit On Separable Data
A line of recent works established that when training linear predictors ...

02/18/2023 - Generalization and Stability of Interpolating Neural Networks with Minimal Width
We investigate the generalization and optimization of k-homogeneous shal...

06/12/2015 - On the accuracy of self-normalized log-linear models
Calculation of the log-normalizer is a major computational obstacle in a...

03/13/2023 - General Loss Functions Lead to (Approximate) Interpolation in High Dimensions
We provide a unified framework, applicable to a general family of convex...
