Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks

05/26/2023
by Puyu Wang, et al.

Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling parameters. In this paper, we greatly extend the previous work <cit.> by conducting a comprehensive stability and generalization analysis of GD for multi-layer NNs. For two-layer NNs, our results are established under general network scaling parameters, relaxing previous conditions. For three-layer NNs, our technical contribution lies in demonstrating the nearly co-coercive property of GD by means of a novel induction strategy that thoroughly explores the effects of over-parameterization. As a direct application of our general findings, we derive an excess risk rate of O(1/√n) for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for under-parameterized and over-parameterized NNs trained by GD to attain the desired risk rate of O(1/√n). Moreover, we show that as the scaling parameter increases or the network complexity decreases, less over-parameterization is needed for GD to achieve the desired error rates. In addition, under a low-noise condition, we obtain a fast risk rate of O(1/n) for GD in both two-layer and three-layer NNs.
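For orientation, the following is a minimal LaTeX sketch (not taken from the paper; the width m, scaling exponent c, step size η, and empirical risk notation are illustrative assumptions) of a scaled two-layer network, the GD iteration, and the standard excess risk decomposition that the stability approach targets:

```latex
% Hedged sketch: notation (width m, scaling exponent c, step size \eta, loss \ell)
% is assumed for illustration, not quoted from the paper.
\[
  f_{\mathbf{W}}(x) \;=\; \frac{1}{m^{c}} \sum_{k=1}^{m} a_k\, \sigma\!\big(\langle w_k, x \rangle\big),
  \qquad c \ge 0 \ \text{(network scaling parameter)}
\]
\[
  \mathbf{W}_{t+1} \;=\; \mathbf{W}_t - \eta\, \nabla_{\mathbf{W}} \widehat{R}_S(\mathbf{W}_t),
  \qquad
  \widehat{R}_S(\mathbf{W}) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_{\mathbf{W}}(x_i), y_i\big)
\]
\[
  \underbrace{R(\mathbf{W}_T) - R(\mathbf{W}^{\ast})}_{\text{excess risk}}
  \;=\;
  \underbrace{R(\mathbf{W}_T) - \widehat{R}_S(\mathbf{W}_T)}_{\text{generalization gap (stability)}}
  \;+\;
  \underbrace{\widehat{R}_S(\mathbf{W}_T) - \widehat{R}_S(\mathbf{W}^{\ast})}_{\text{optimization error}}
  \;+\;
  \underbrace{\widehat{R}_S(\mathbf{W}^{\ast}) - R(\mathbf{W}^{\ast})}_{\text{estimation term}}
\]
```

In this kind of decomposition, a stability argument controls the generalization gap and an optimization analysis controls the training error; rates such as O(1/√n) refer to the resulting bound on the excess risk.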

