Generalization and Stability of Interpolating Neural Networks with Minimal Width
We investigate the generalization and optimization of k-homogeneous shallow neural-network classifiers in the interpolating regime. We focus on the setting in which the model can perfectly classify the training data with a positive margin γ. For gradient descent minimizing the logistic loss, we show that the training loss converges to zero at a rate of Õ(1/(γ^{2/k} T)) using only a polylogarithmic number of neurons. In particular, gradient descent finds a classifier that perfectly fits all n training points after Ω̃(n) iterations. Additionally, through a stability analysis, we show that with m = Ω(log^{4/k}(n)) neurons and T = Ω(n) iterations, the test loss is bounded by Õ(1/(γ^{2/k} n)). This contrasts with existing stability results, which require polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Despite the non-convexity of the objective, this ultimately yields convergence and generalization-gap bounds similar to those in the convex setting of linear logistic regression.
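As an illustration only, the sketch below sets up the kind of optimization the abstract describes: a two-layer network trained by full-batch gradient descent on the logistic loss. The particular choices here (a leaky-ReLU activation, a fixed ±1 output layer, the initialization scale, the step size, and helper names such as `phi`, `forward`, and `train`) are assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(u, alpha=0.2):
    # Leaky-ReLU activation (illustrative 1-homogeneous choice).
    return np.where(u > 0, u, alpha * u)

def dphi(u, alpha=0.2):
    # Derivative of the leaky-ReLU activation.
    return np.where(u > 0, 1.0, alpha)

def forward(X, W, a):
    # X: (n, d) inputs, W: (m, d) first-layer weights, a: (m,) fixed +/-1 signs.
    Z = X @ W.T                                   # pre-activations, shape (n, m)
    return (phi(Z) @ a) / np.sqrt(len(a)), Z      # network output f(x), shape (n,)

def logistic_loss(margins):
    # Average logistic loss over the margins y_i * f(x_i).
    return np.mean(np.log1p(np.exp(-margins)))

def train(X, y, m=64, T=2000, lr=1.0):
    # Full-batch gradient descent on the logistic loss, updating only
    # the first-layer weights (the output signs a stay fixed).
    n, d = X.shape
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    for _ in range(T):
        out, Z = forward(X, W, a)
        margins = y * out                          # y takes values in {-1, +1}
        g = -1.0 / (1.0 + np.exp(margins))         # d(logistic loss)/d(margin), shape (n,)
        # Gradient w.r.t. W: average of g_i * y_i * a_j * phi'(z_ij) * x_i.
        grad_W = ((g * y)[:, None] * dphi(Z) * a).T @ X / (n * np.sqrt(m))
        W -= lr * grad_W
    return W, a, logistic_loss(y * forward(X, W, a)[0])

# Toy usage: linearly separable data with a positive margin.
n, d = 200, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = np.sign(X @ w_star)
W, a, final_loss = train(X, y)
print("final training loss:", final_loss)
```

With enough iterations the training loss on such separable data is driven toward zero, mirroring (at a purely empirical level) the interpolating regime the abstract studies; the sketch makes no attempt to reproduce the paper's width, step-size, or margin conditions.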