Generalization and Stability of Interpolating Neural Networks with Minimal Width

02/18/2023
by   Hossein Taheri, et al.

We investigate the generalization and optimization of k-homogeneous shallow neural-network classifiers in the interpolating regime, focusing on the setting where the model perfectly classifies the training data with a positive margin γ. For gradient descent on the logistic loss, we show that the training loss converges to zero at a rate of Õ(1/(γ^(2/k) T)) with only a polylogarithmic number of neurons. In particular, gradient descent finds a classifier that perfectly fits n training points within Ω̃(n) iterations. Additionally, through a stability analysis we show that with m = Ω(log^(4/k)(n)) neurons and T = Ω(n) iterations, the test loss is bounded by Õ(1/(γ^(2/k) n)). This contrasts with existing stability results, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Consequently, despite the objective's non-convexity, we obtain convergence and generalization-gap bounds comparable to those in the convex setting of linear logistic regression.
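Purely as an illustration of this setup (not the authors' code or analysis), the sketch below trains a 2-homogeneous shallow network, with squared-ReLU activations and a fixed random second layer, by full-batch gradient descent on the logistic loss over synthetic linearly separable data; the width, step size, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, linearly separable data => positive margin (illustrative choices).
n, d, m = 200, 10, 50
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

# Shallow network: trainable first layer W, fixed random second-layer signs a.
# The squared-ReLU activation makes the model 2-homogeneous in W (k = 2).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(W, X):
    return np.maximum(X @ W.T, 0.0) ** 2 @ a

def logistic_loss(W, X, y):
    # log(1 + exp(-y f(x))), computed stably.
    return np.mean(np.logaddexp(0.0, -y * forward(W, X)))

def grad(W, X, y):
    pre = X @ W.T                                   # (n, m) pre-activations
    act_grad = 2.0 * np.maximum(pre, 0.0)           # derivative of squared-ReLU
    margins = y * forward(W, X)
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2))  # y * dloss/dmargin (stable sigmoid)
    # Chain rule, averaged over samples: (m, d) gradient for W.
    return ((coef[:, None] * act_grad) * a).T @ X / n

# Full-batch gradient descent on the logistic loss.
eta, T = 0.2, 2000
for t in range(T):
    W -= eta * grad(W, X, y)
    if (t + 1) % 500 == 0:
        err = np.mean(np.sign(forward(W, X)) != y)
        print(f"iter {t+1}: loss {logistic_loss(W, X, y):.4f}, train error {err:.3f}")
```

On separable data the training error is driven to zero and the loss keeps decreasing toward zero, mirroring the interpolating regime the paper studies.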
