
A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

by Gadi Naveh, et al.

Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently, as they provide a clear analytical window to deep learning via mappings to Gaussian Processes (GPs). Despite its theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, one that lies at the heart of their success: feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects. Applying this to a toy model of a two-layer linear convolutional neural network (CNN) shows good agreement with experiments. We further identify, both analytically and numerically, a sharp transition between a feature learning regime and a lazy learning regime in this model. Strong finite-DNN effects are also derived for a non-linear two-layer fully connected network. Our self consistent theory provides a rich and versatile analytical framework for studying feature learning and other non-lazy effects in finite DNNs.
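
To make the training setup concrete, the following is a minimal sketch of noisy gradient descent on a toy two-layer linear CNN. The abstract does not give the update rule or the architecture details, so everything specific here is an assumption made for illustration: non-overlapping windows of size S, a squared-error loss, a Langevin-type update with weight decay, and the hypothetical function names `forward`, `noisy_gd_step` and `mean_squared_error`.

```python
import math
import random


def forward(w, a, x, S):
    """Output of a two-layer linear CNN with non-overlapping windows (assumed).

    w: C filters, each a list of S weights shared across windows.
    a: C x N readout weights, one per (channel, window).
    x: flat input of length N * S.
    """
    N = len(x) // S
    out = 0.0
    for c in range(len(w)):
        for i in range(N):
            # Linear activation: the hidden unit is just the filter response.
            h = sum(w[c][s] * x[i * S + s] for s in range(S))
            out += a[c][i] * h
    return out


def noisy_gd_step(w, a, data, S, lr, temperature, weight_decay, rng):
    """One in-place step of gradient descent with weight decay and Gaussian noise.

    Assumed update (Langevin-type, not taken from the abstract):
        theta <- theta - lr * (grad + weight_decay * theta)
                 + sqrt(2 * lr * temperature) * xi,   xi ~ N(0, 1)
    where grad is the gradient of 0.5 * sum_n (f(x_n) - y_n)^2.
    """
    C, N = len(w), len(a[0])
    gw = [[0.0] * S for _ in range(C)]
    ga = [[0.0] * N for _ in range(C)]
    for x, y in data:
        err = forward(w, a, x, S) - y
        for c in range(C):
            for i in range(N):
                window = x[i * S:(i + 1) * S]
                h = sum(w[c][s] * window[s] for s in range(S))
                ga[c][i] += err * h
                for s in range(S):
                    gw[c][s] += err * a[c][i] * window[s]
    sigma = math.sqrt(2.0 * lr * temperature)
    for c in range(C):
        for s in range(S):
            drift = -lr * (gw[c][s] + weight_decay * w[c][s])
            w[c][s] += drift + sigma * rng.gauss(0.0, 1.0)
        for i in range(N):
            drift = -lr * (ga[c][i] + weight_decay * a[c][i])
            a[c][i] += drift + sigma * rng.gauss(0.0, 1.0)


def mean_squared_error(w, a, data, S):
    """Average squared error of the network over a list of (x, y) pairs."""
    return sum((forward(w, a, x, S) - y) ** 2 for x, y in data) / len(data)
```

Calling `noisy_gd_step` repeatedly with a `random.Random` instance and a small `lr` samples weights around low-loss configurations; setting `temperature` to 0 in this sketch reduces it to plain gradient descent with weight decay.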



