Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

10/16/2020
by Zhiyuan Li, et al.

Convolutional neural networks often dominate their fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of a 'better inductive bias', but the claim has not been made mathematically rigorous; the hurdle is that a fully-connected net can always simulate a convolutional net (for a fixed task), so the training algorithm must play a role. The current work describes a natural task on which a provable sample-complexity gap can be shown for standard training algorithms. We construct a single natural distribution on ℝ^d × {±1} on which any orthogonal-invariant algorithm (i.e., fully-connected networks trained with most gradient-based methods from Gaussian initialization) requires Ω(d^2) samples to generalize, while O(1) samples suffice for convolutional architectures. Furthermore, we demonstrate a single target function, learning which on all possible distributions leads to an O(1) vs. Ω(d^2/ε) gap. The proof relies on the fact that SGD on a fully-connected network is orthogonally equivariant. Similar results are obtained for ℓ_2 regression and for adaptive training algorithms, e.g. Adam and AdaGrad, which are only permutation equivariant.
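The orthogonal equivariance property underlying the proof can be checked numerically. The sketch below (not from the paper; the network width, data, step size, and variable names are illustrative assumptions) trains a two-layer fully-connected network with full-batch gradient descent twice: once on the original data, and once on rotated inputs Qx with the first-layer initialization rotated accordingly. The two trajectories produce identical predictions, which is the equivariance that Gaussian initialization then turns into invariance in distribution.

```python
# Minimal sketch (assumed toy setup, not the paper's code): gradient descent on a
# two-layer fully-connected net is orthogonally equivariant. Rotating the inputs by Q
# and the first-layer initialization by Q^T yields the same predictions at every step.
import numpy as np

rng = np.random.default_rng(0)
d, h, n, steps, lr = 8, 16, 32, 200, 0.05

# Random orthogonal matrix Q via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Toy regression data.
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))

# Gaussian initialization (shared by both runs).
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
W2 = rng.normal(size=(1, h)) / np.sqrt(h)

def train(X, y, W1, W2):
    """Full-batch GD on squared loss for f(x) = W2 relu(W1 x)."""
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        Z = X @ W1.T                  # pre-activations, shape (n, h)
        A = np.maximum(Z, 0.0)        # ReLU
        err = A @ W2.T - y            # prediction error, shape (n, 1)
        gW2 = err.T @ A / n
        gZ = (err @ W2) * (Z > 0)
        gW1 = gZ.T @ X / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

# Run 1: original data, original init.
W1a, W2a = train(X, y, W1, W2)
# Run 2: rotated data (x -> Qx) with the first-layer init rotated to W1 Q^T.
W1b, W2b = train(X @ Q.T, y, W1 @ Q.T, W2)

# Predictions agree on any test point once the second model sees the rotated input.
x_test = rng.normal(size=(5, d))
pred_a = np.maximum(x_test @ W1a.T, 0) @ W2a.T
pred_b = np.maximum((x_test @ Q.T) @ W1b.T, 0) @ W2b.T
print(np.max(np.abs(pred_a - pred_b)))   # ~1e-15: the two trajectories coincide
```

Because the Gaussian initialization is rotation-invariant in distribution, rotating the data alone leaves the distribution of learned predictors unchanged; this is the invariance the Ω(d^2) lower bound exploits, and it fails for convolutional architectures, whose weight sharing is not preserved under arbitrary rotations of the input.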


Related research

07/27/2020 · Towards Learning Convolutions from Scratch
Convolution is one of the most essential components of architectures use...

02/13/2019 · Identity Crisis: Memorization and Generalization under Extreme Overparameterization
We study the interplay between memorization and generalization of overpa...

02/14/2023 · The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs
Classification margins are commonly used to estimate the generalization ...

11/09/2015 · How far can we go without convolution: Improving fully-connected networks
We propose ways to improve the performance of fully connected networks. ...

06/09/2022 · Redundancy in Deep Linear Neural Networks
Conventional wisdom states that deep linear neural networks benefit from...

06/08/2016 · Convolution by Evolution: Differentiable Pattern Producing Networks
In this work we introduce a differentiable version of the Compositional ...

10/06/2022 · Probabilistic partition of unity networks for high-dimensional regression problems
We explore the probabilistic partition of unity network (PPOU-Net) model...