A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

06/08/2021
by Gadi Naveh, et al.

Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently, as they provide a clear analytical window into deep learning via mappings to Gaussian Processes (GPs). Despite its theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, lying at the heart of their success: feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self-consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects. Applying this theory to a toy model of a two-layer linear convolutional neural network (CNN) shows good agreement with experiments. We further identify, both analytically and numerically, a sharp transition between a feature learning regime and a lazy learning regime in this model. Strong finite-DNN effects are also derived for a non-linear two-layer fully connected network. Our self-consistent theory provides a rich and versatile analytical framework for studying feature learning and other non-lazy effects in finite DNNs.
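
As a minimal illustration of the training setup the abstract refers to, the sketch below trains a two-layer linear CNN toy model with noisy (Langevin-type) gradient descent: plain gradient steps on an MSE-plus-weight-decay loss with injected Gaussian noise, so the weights sample a Bayesian posterior rather than converging to a point estimate. All shapes, names, and hyperparameters here are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
n_train, n_patches, patch_dim, n_channels = 200, 4, 8, 16
lr, temperature, weight_decay, n_steps = 1e-3, 1e-3, 1e-2, 10000

# Training data: inputs split into patches, targets from a simple linear teacher.
X = rng.standard_normal((n_train, n_patches, patch_dim))
teacher = rng.standard_normal(patch_dim)
y = X[:, 0, :] @ teacher              # target depends on the first patch only

# Two-layer linear CNN: f(x) = sum_{c,i} a[c,i] * (w[c] . x_patch[i]).
w = rng.standard_normal((n_channels, patch_dim)) / np.sqrt(patch_dim)
a = rng.standard_normal((n_channels, n_patches)) / np.sqrt(n_channels * n_patches)

def forward(X, w, a):
    h = np.einsum('nip,cp->nci', X, w)   # conv layer activations
    return np.einsum('nci,ci->n', h, a)  # linear readout over channels/positions

for step in range(n_steps):
    h = np.einsum('nip,cp->nci', X, w)
    err = np.einsum('nci,ci->n', h, a) - y                      # residuals
    grad_a = np.einsum('n,nci->ci', err, h) / n_train + weight_decay * a
    grad_w = np.einsum('n,ci,nip->cp', err, a, X) / n_train + weight_decay * w
    # Langevin step: gradient descent plus injected Gaussian noise.
    w += -lr * grad_w + np.sqrt(2 * lr * temperature) * rng.standard_normal(w.shape)
    a += -lr * grad_a + np.sqrt(2 * lr * temperature) * rng.standard_normal(a.shape)

print("train MSE:", np.mean((forward(X, w, a) - y) ** 2))
```

Averaging the network outputs over many such noisy-training runs (or over the tail of one long run) gives the posterior-mean predictor that the self-consistent GP theory aims to describe at finite width.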


Related research

07/21/2023
Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks
Feature learning, or the ability of deep neural networks to automaticall...

12/31/2021
Separation of scales and a thermodynamic description of feature learning in some CNNs
Deep neural networks (DNNs) are powerful tools for compressing and disti...

04/02/2020
Predicting the outputs of finite networks trained with noisy gradients
A recent line of studies has focused on the infinite width limit of deep...

06/12/2019
Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective
A series of recent works suggest that deep neural networks (DNNs), of fi...

11/22/2022
Learning Deep Neural Networks by Iterative Linearisation
The excellent real-world performance of deep neural networks has receive...

02/14/2021
Double-descent curves in neural networks: a new perspective using Gaussian processes
Double-descent curves in neural networks describe the phenomenon that th...

06/26/2020
Is SGD a Bayesian sampler? Well, almost
Overparameterised deep neural networks (DNNs) are highly expressive and ...
