DeepAI AI Chat
Log In Sign Up

Do Deeper Convolutional Networks Perform Better?

by   Eshaan Nichani, et al.

Over-parameterization is a recent topic of much interest in the machine learning community. While over-parameterized neural networks are capable of perfectly fitting (interpolating) training data, these networks often perform well on test data, thereby contradicting classical learning theory. Recent work provided an explanation for this phenomenon by introducing the double descent curve, showing that increasing model capacity past the interpolation threshold can lead to a decrease in test error. In line with this, it was recently shown empirically and theoretically that increasing neural network capacity through width leads to double descent. In this work, we analyze the effect of increasing depth on test performance. In contrast to what is observed for increasing width, we demonstrate through a variety of classification experiments on CIFAR10 and ImageNet32 using ResNets and fully-convolutional networks that test performance worsens beyond a critical depth. We posit an explanation for this phenomenon by drawing intuition from the principle of minimum norm solutions in linear networks.


Asymptotics of Ridge Regression in Convolutional Models

Understanding generalization and estimation error of estimators for simp...

Multi-scale Feature Learning Dynamics: Insights for Double Descent

A key challenge in building theoretical foundations for deep learning is...

Phenomenology of Double Descent in Finite-Width Neural Networks

`Double descent' delineates the generalization behaviour of models depen...

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

The rapid recent progress in machine learning (ML) has raised a number o...

Superior generalization of smaller models in the presence of significant label noise

The benefits of over-parameterization in achieving superior generalizati...

Benefit of Interpolation in Nearest Neighbor Algorithms

The over-parameterized models attract much attention in the era of data ...

Dodging the Sparse Double Descent

This paper presents an approach to addressing the issue of over-parametr...