On the Inductive Bias of a CNN for Orthogonal Patterns Distributions

02/22/2020
by   Alon Brutzkus, et al.
0

Training overparameterized convolutional neural networks with gradient based methods is the most successful learning method for image classification. However, its theoretical properties are far from understood even for very simple learning tasks. In this work, we consider a simplified image classification task where images contain orthogonal patches and are learned with a 3-layer overparameterized convolutional network and stochastic gradient descent. We empirically identify a novel phenomenon where the dot-product between the learned pattern detectors and their detected patterns are governed by the pattern statistics in the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and prove that PSI holds for a simple setup with two points in the training set. Furthermore, we prove that if PSI holds, stochastic gradient descent has sample complexity O(d^2log(d)) where d is the filter dimension. In contrast, we show a VC dimension lower bound in our setting which is exponential in d. Taken together, our results provide strong evidence that PSI is a unique inductive bias of stochastic gradient descent, that guarantees good generalization properties.

READ FULL TEXT
research
10/21/2019

Non-Gaussianity of Stochastic Gradient Noise

What enables Stochastic Gradient Descent (SGD) to achieve better general...
research
08/03/2018

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Neural networks have many successful applications, while much less theor...
research
02/13/2018

Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent

A major challenge in understanding the generalization of deep learning i...
research
10/06/2018

Over-parameterization Improves Generalization in the XOR Detection Problem

Empirical evidence suggests that neural networks with ReLU activations g...
research
11/27/2020

Deep orthogonal linear networks are shallow

We consider the problem of training a deep orthogonal linear network, wh...
research
07/07/2020

Gradient Descent Converges to Ridgelet Spectrum

Deep learning achieves a high generalization performance in practice, de...
research
07/26/2023

Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum

Wide neural networks are biased towards learning certain functions, infl...

Please sign up or login with your details

Forgot password? Click here to reset