Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent

02/13/2018
by Yifan Wu, et al.

A major challenge in understanding the generalization of deep learning is to explain why (stochastic) gradient descent can exploit the network architecture to find solutions that generalize well even when the model has high capacity. We present simple but realistic examples showing that this phenomenon exists even when learning linear classifiers: between two linear networks with the same capacity, the one with a convolutional layer can generalize better than the other when the data distribution has some underlying spatial structure. We argue that this difference results from a combination of the convolutional architecture, the data distribution, and gradient descent, and that all three must be included in a meaningful analysis. We provide a general analysis of generalization performance as a function of the data distribution and the convolutional filter size, with gradient descent as the optimization algorithm, and then interpret the results on concrete examples. Experimental results show that our analysis can explain the behavior observed in the introduced examples.
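To make the comparison concrete, the sketch below (in PyTorch) sets up the kind of experiment the abstract describes: two linear classifiers, one fully connected and one with a convolutional layer followed by a linear read-out, trained with plain gradient descent on synthetic data whose labels depend on a local pattern placed at a random position. The data-generating process, filter size, learning rate, and loss are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch (not the paper's exact setup): compare a fully connected
# linear classifier against a two-layer convolutional linear classifier
# (convolution followed by a linear read-out, no nonlinearity) on synthetic
# 1-D data with spatial structure. All sizes and the data-generating process
# below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, k, n_train, n_test = 32, 5, 100, 1000  # input length, filter size, sample sizes

def make_data(n):
    # Positive examples contain a fixed local pattern at a random position;
    # negative examples are pure noise, so the labels have a spatial structure.
    x = 0.1 * torch.randn(n, 1, d)
    y = torch.randint(0, 2, (n,)) * 2 - 1
    pattern = torch.ones(k)
    for i in range(n):
        if y[i] == 1:
            p = torch.randint(0, d - k + 1, (1,)).item()
            x[i, 0, p:p + k] += pattern
    return x, y.float()

class FullyConnected(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(d, 1, bias=False)
    def forward(self, x):
        return self.lin(x.flatten(1)).squeeze(-1)

class ConvLinear(nn.Module):
    # A single convolutional filter followed by a linear read-out. The composed
    # map is still linear in the input; only the parameterization differs from
    # the fully connected model.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, bias=False)
        self.lin = nn.Linear(d - k + 1, 1, bias=False)
    def forward(self, x):
        return self.lin(self.conv(x).squeeze(1)).squeeze(-1)

def train_and_eval(model, steps=500, lr=0.1):
    xtr, ytr = make_data(n_train)
    xte, yte = make_data(n_test)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # full-batch gradient descent
    for _ in range(steps):
        opt.zero_grad()
        loss = F.soft_margin_loss(model(xtr), ytr)  # logistic loss on +/-1 labels
        loss.backward()
        opt.step()
    with torch.no_grad():
        acc = ((model(xte) > 0).float() * 2 - 1 == yte).float().mean().item()
    return acc

print("fully connected test acc:", train_and_eval(FullyConnected()))
print("conv + linear test acc:  ", train_and_eval(ConvLinear()))
```

Under a distribution of this kind one would expect the convolutional parameterization to reach higher test accuracy from the same number of training examples, which is the generalization bias the paper analyzes.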


Related research

Gradient Descent Quantizes ReLU Network Features (03/22/2018)
Deep neural networks are often trained in the over-parametrized regime (...

Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems (09/02/2022)
One of the key challenges of learning an online recommendation model is ...

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization (02/25/2020)
An open question in the Deep Learning community is why neural networks t...

On the Inductive Bias of a CNN for Orthogonal Patterns Distributions (02/22/2020)
Training overparameterized convolutional neural networks with gradient b...

Why Deep Learning Generalizes (11/17/2022)
Very large deep learning models trained using gradient descent are remar...

How to Learn when Data Reacts to Your Model: Performative Gradient Descent (02/15/2021)
Performative distribution shift captures the setting where the choice of...

Understanding the Role of Adversarial Regularization in Supervised Learning (10/01/2020)
Despite numerous attempts sought to provide empirical evidence of advers...
