
Over-parameterization Improves Generalization in the XOR Detection Problem

by Alon Brutzkus, et al.

Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we study a simplified learning task with over-parameterized convolutional networks that empirically exhibits the same qualitative phenomenon. For this setting, we provide a theoretical analysis of the optimization and generalization performance of gradient descent. Specifically, we prove data-dependent sample complexity bounds which show that over-parameterization improves the generalization performance of gradient descent.


How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

A recent line of research on deep learning focuses on the extremely over...

Towards Understanding Learning in Neural Networks with Linear Teachers

Can a neural network minimizing cross-entropy learn linearly separable d...

Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

A remarkable recent discovery in machine learning has been that deep neu...

Demystifying the Global Convergence Puzzle of Learning Over-parameterized ReLU Nets in Very High Dimensions

This theoretical paper is devoted to developing a rigorous theory for de...

Understanding the Role of Adversarial Regularization in Supervised Learning

Despite numerous attempts sought to provide empirical evidence of advers...

On the Inductive Bias of a CNN for Orthogonal Patterns Distributions

Training overparameterized convolutional neural networks with gradient b...