Refined Generalization Analysis of Gradient Descent for Over-parameterized Two-layer Neural Networks with Smooth Activations on Classification Problems

05/23/2019
by Atsushi Nitanda, et al.

Recently, several studies have proven the global convergence and generalization ability of gradient descent for two-layer ReLU networks under a positivity assumption on the Gram matrix of the neural tangent kernel. However, the performance of gradient descent on classification problems has not been well studied, and the problem structure admits further investigation. In this work, we introduce an assumption for binary classification problems that is partially stronger than, but as reasonable as, the Gram-matrix positivity assumption: the data distribution is perfectly classifiable by a tangent model. Under this assumption, we provide a refined generalization analysis of gradient descent for two-layer networks with smooth activations. A notable feature of our result is that the generalization bound has much better dependence on the network width than existing results. Consequently, our theory significantly enlarges the class of over-parameterized networks with provable generalization ability in terms of network width, whereas most prior studies require far heavier over-parameterization.
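For intuition, here is a minimal NumPy sketch (not code from the paper) of the objects the abstract refers to: a width-m two-layer network with the smooth activation tanh, the Gram matrix of its neural tangent kernel (whose smallest eigenvalue is what the positivity assumption keeps bounded away from zero), and gradient descent on the logistic loss for binary labels. The toy data, width, step size, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data on the unit sphere (hypothetical setup).
n, d, m = 100, 5, 1024                        # samples, input dim, network width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X[:, 0] + 0.1)                    # a simple, nearly linear labeling

# Two-layer network f(x) = (1/sqrt(m)) * sum_r a_r * tanh(w_r . x);
# the outer signs a_r are fixed and only the hidden weights W are trained.
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

def forward(W, X):
    return (np.tanh(X @ W.T) @ a) / np.sqrt(m)

# NTK Gram matrix at initialization:
# H_ij = (1/m) * sum_r tanh'(w_r.x_i) * tanh'(w_r.x_j) * (x_i . x_j)
pre = X @ W.T                                 # (n, m) preactivations
D = 1.0 - np.tanh(pre) ** 2                   # tanh'(z) = 1 - tanh(z)^2
H = (D @ D.T) * (X @ X.T) / m
print("smallest eigenvalue of H:", np.linalg.eigvalsh(H)[0])

# Gradient descent on the logistic loss (1/n) * sum_i log(1 + exp(-y_i f(x_i))).
eta = 1.0
for step in range(1000):
    out = forward(W, X)
    s = 1.0 / (1.0 + np.exp(y * out))         # sigmoid(-y_i f(x_i))
    pre = X @ W.T
    D = 1.0 - np.tanh(pre) ** 2
    # dL/dW_r = -(1/(n sqrt(m))) * sum_i y_i s_i a_r tanh'(w_r.x_i) x_i
    coef = (-(y * s)[:, None] * D) * a[None, :] / (n * np.sqrt(m))  # (n, m)
    W -= eta * (coef.T @ X)

print("training error:", np.mean(np.sign(forward(W, X)) != y))
```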

Related research

09/30/2019 · On the convergence of gradient descent for two layer neural networks
It has been shown that gradient descent can yield the zero training loss...

10/06/2018 · Over-parameterization Improves Generalization in the XOR Detection Problem
Empirical evidence suggests that neural networks with ReLU activations g...

12/04/2020 · When does gradient descent with logistic loss find interpolating two-layer networks?
We study the training of finite-width two-layer smoothed ReLU networks f...

03/16/2022 · A Multi-parameter Updating Fourier Online Gradient Descent Algorithm for Large-scale Nonlinear Classification
Large scale nonlinear classification is a challenging task in the field ...

01/01/2023 · Sharper analysis of sparsely activated wide neural networks with trainable biases
This work studies training one-hidden-layer overparameterized ReLU netwo...

04/24/2021 · Achieving Small Test Error in Mildly Overparameterized Neural Networks
Recent theoretical works on over-parameterized neural nets have focused ...

07/29/2021 · Deep Networks Provably Classify Data on Curves
Data with low-dimensional nonlinear structure are ubiquitous in engineer...
