Do Deeper Convolutional Networks Perform Better?

by Eshaan Nichani, et al.

Over-parameterization is a recent topic of much interest in the machine learning community. While over-parameterized neural networks are capable of perfectly fitting (interpolating) training data, these networks often perform well on test data, thereby contradicting classical learning theory. Recent work provided an explanation for this phenomenon by introducing the double descent curve, showing that increasing model capacity past the interpolation threshold can lead to a decrease in test error. In line with this, it was recently shown empirically and theoretically that increasing neural network capacity through width leads to double descent. In this work, we analyze the effect of increasing depth on test performance. In contrast to what is observed for increasing width, we demonstrate through a variety of classification experiments on CIFAR10 and ImageNet32 using ResNets and fully-convolutional networks that test performance worsens beyond a critical depth. We posit an explanation for this phenomenon by drawing intuition from the principle of minimum norm solutions in linear networks.
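
The minimum norm principle mentioned at the end of the abstract can be illustrated in the simplest over-parameterized setting, plain linear regression with more features than samples. The sketch below is not the paper's experiment or code: the data are random, the sizes are arbitrary, and the helper name `min_norm_interpolator` is hypothetical. It only shows that, among the many weight vectors that fit the training data exactly, the pseudoinverse picks the one with the smallest L2 norm.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum L2-norm w with X @ w = y (assumes X has full row rank)."""
    return np.linalg.pinv(X) @ y


# Assumed toy setup: 20 samples, 100 features, so infinitely many w interpolate.
rng = np.random.default_rng(0)
n_samples, n_features = 20, 100
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

w_min = min_norm_interpolator(X, y)

# Build a second interpolating solution by adding a vector from the null space of X.
v = rng.standard_normal(n_features)
null_component = v - np.linalg.pinv(X) @ (X @ v)  # part of v that X maps to zero
w_other = w_min + null_component

# Both solutions fit the training data; only their norms differ.
residual_min = np.linalg.norm(X @ w_min - y)
residual_other = np.linalg.norm(X @ w_other - y)
norm_min = np.linalg.norm(w_min)
norm_other = np.linalg.norm(w_other)

print("training residuals:", residual_min, residual_other)
print("solution norms:", norm_min, norm_other)
```

Both residuals are zero up to floating-point error, while `norm_other` exceeds `norm_min`, because the added null-space component is orthogonal to the minimum norm solution.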





Asymptotics of Ridge Regression in Convolutional Models

Understanding generalization and estimation error of estimators for simp...

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Modern deep learning models employ considerably more parameters than req...

Multi-scale Feature Learning Dynamics: Insights for Double Descent

A key challenge in building theoretical foundations for deep learning is...

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

The rapid recent progress in machine learning (ML) has raised a number o...

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

Deep learning algorithms are well-known to have a propensity for fitting...

Is deeper better? It depends on locality of relevant features

It has been recognized that a heavily overparameterized artificial neura...

Benefit of Interpolation in Nearest Neighbor Algorithms

The over-parameterized models attract much attention in the era of data ...