Generalization and Stability of Interpolating Neural Networks with Minimal Width

02/18/2023
by Hossein Taheri, et al.

We investigate the generalization and optimization of k-homogeneous shallow neural-network classifiers in the interpolating regime. Our analysis focuses on the setting in which the model can perfectly classify the training data with a positive margin γ. When gradient descent is used to minimize the logistic loss, we show that the training loss converges to zero at a rate of Õ(1/(γ^(2/k) T)) given a polylogarithmic number of neurons. This implies that gradient descent can find a perfect classifier for the n training samples within Ω̃(n) iterations. Additionally, through a stability analysis we show that with m = Ω(log^(4/k)(n)) neurons and T = Ω(n) iterations, the test loss is bounded by Õ(1/(γ^(2/k) n)). This contrasts with existing stability results, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Ultimately, despite the non-convexity of the objective, this yields convergence and generalization-gap bounds similar to those of the convex setting of linear logistic regression.
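
As a rough illustration of the training setup described in the abstract, the following minimal NumPy sketch runs full-batch gradient descent with the logistic loss on a two-layer network whose outer weights are fixed. The leaky-ReLU activation (1-homogeneous, i.e. k = 1), the width, the step size, and the synthetic margin-separated data are illustrative assumptions, not the paper's exact model, rates, or proof technique.

```python
import numpy as np

# Minimal sketch: full-batch gradient descent with logistic loss on a
# two-layer network with fixed outer weights. The leaky-ReLU activation,
# data distribution, width, and step size are illustrative choices.

rng = np.random.default_rng(0)

n, d, m = 200, 20, 64          # samples, input dimension, hidden width
alpha = 0.2                    # leaky-ReLU slope (1-homogeneous, k = 1)

# Linearly separable data with a positive margin along w_star.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)
X += 0.5 * y[:, None] * w_star      # push points away from the boundary

W = rng.standard_normal((m, d)) / np.sqrt(d)   # trained hidden weights
a = rng.choice([-1.0, 1.0], size=m)            # fixed +/-1 outer weights

def forward(X, W):
    pre = X @ W.T                               # (n, m) pre-activations
    act = np.where(pre > 0, pre, alpha * pre)   # leaky ReLU
    return act @ a / np.sqrt(m)                 # network outputs

def logistic_loss(margins):
    return np.mean(np.logaddexp(0.0, -margins))  # mean log(1 + exp(-y f(x)))

eta = 1.0
for t in range(2000):
    out = forward(X, W)
    margins = y * out
    if t % 500 == 0:
        print(f"iter {t:4d}  train loss {logistic_loss(margins):.4f}  "
              f"train acc {(margins > 0).mean():.3f}")
    # Gradient of the empirical logistic loss with respect to W.
    s = -y / (1.0 + np.exp(margins))            # d loss / d output, shape (n,)
    pre = X @ W.T
    dact = np.where(pre > 0, 1.0, alpha)        # leaky-ReLU derivative
    grad_W = (dact * s[:, None]).T @ X * (a[:, None] / (np.sqrt(m) * n))
    W -= eta * grad_W

print("final train loss:", logistic_loss(y * forward(X, W)))
```

Training only the hidden layer with fixed ±1 outer weights keeps the sketch close to the homogeneous two-layer models typically studied in this line of work; on margin-separated data the printed training loss should decay toward zero while the training accuracy reaches 1.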


Related research:

06/26/2023
Gradient Descent Converges Linearly for Logistic Regression on Separable Data
We show that running gradient descent with variable learning rate guaran...

12/04/2020
When does gradient descent with logistic loss find interpolating two-layer networks?
We study the training of finite-width two-layer smoothed ReLU networks f...

05/22/2023
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data
Normalized gradient descent has shown substantial success in speeding up...

05/08/2019
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in t...

03/22/2018
Gradient Descent Quantizes ReLU Network Features
Deep neural networks are often trained in the over-parametrized regime (...

12/14/2022
Learning threshold neurons via the "edge of stability"
Existing analyses of neural network training often operate under the unr...

01/13/2021
Learning with Gradient Descent and Weakly Convex Losses
We study the learning performance of gradient descent when the empirical...
