Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks

07/21/2023
by R. Aiudi, et al.

Feature learning, the ability of deep neural networks to automatically learn relevant features from raw data, underlies their exceptional capability to solve complex tasks. However, feature learning appears to be realized differently in fully-connected (FC) and convolutional (CNN) architectures. Empirical evidence shows that FC neural networks in the infinite-width limit eventually outperform their finite-width counterparts. Since the kernel that describes infinite-width networks does not evolve during training, whatever form of feature learning occurs in deep FC architectures is not very helpful for improving generalization. On the other hand, state-of-the-art architectures with convolutional layers achieve optimal performance in the finite-width regime, suggesting that an effective form of feature learning emerges in this case. In this work, we present a simple theoretical framework that provides a rationale for these differences in one-hidden-layer networks. First, we show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network with a suitable choice of the Gaussian priors. Second, we derive a finite-width effective action for an architecture with one convolutional hidden layer and compare it with the result available for FC networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the FC architecture is only globally renormalized by a single scalar parameter, the CNN kernel undergoes a local renormalization, meaning that the network can select the local components that contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow CNNs, but not in shallow FC architectures or in locally connected neural networks without weight sharing.
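The distinction between global and local kernel renormalization can be made concrete with a small numerical sketch. The snippet below is not the authors' code: it assumes a linear dot-product kernel on non-overlapping input patches and uses hypothetical values for the renormalization order parameters, purely to illustrate that a single scalar only rescales the infinite-width kernel as a whole, while per-patch scalars reshape it in a data-dependent way.

```python
import numpy as np

# Illustrative sketch (hypothetical kernel and order-parameter values):
# contrast a single global rescaling of the kernel with a per-patch (local)
# renormalization for a one-hidden-layer network on patched inputs.

rng = np.random.default_rng(0)


def local_kernels(X, patch_size):
    """Per-patch Gram matrices K_i(x, x') = x_i . x'_i / patch_size, stacked as (n_patches, n, n)."""
    n_patches = X.shape[1] // patch_size
    Ks = []
    for i in range(n_patches):
        sl = slice(i * patch_size, (i + 1) * patch_size)
        Ks.append(X[:, sl] @ X[:, sl].T / patch_size)
    return np.stack(Ks)


# Toy dataset: 8 inputs of dimension 12, split into 4 patches of size 3.
X = rng.standard_normal((8, 12))
Ks = local_kernels(X, patch_size=3)          # shape (4, 8, 8)
K_infinite = Ks.sum(axis=0)                  # infinite-width kernel: plain sum over patches

# FC-style (global) renormalization: one scalar rescales the whole kernel,
# so the relative weight of the local components cannot change.
Q_global = 1.7                               # hypothetical scalar order parameter
K_fc = Q_global * K_infinite

# CNN-style (local) renormalization: one scalar per patch, so the network can
# up- or down-weight local components in a data-dependent way.
Q_local = np.array([2.0, 0.1, 1.3, 0.05])    # hypothetical per-patch order parameters
K_cnn = np.einsum('i,inm->nm', Q_local, Ks)


def best_single_scalar(K_renorm, K_base):
    """Scalar q that a purely global renormalization would require: the ratio at one entry."""
    return K_renorm[0, 0] / K_base[0, 0]


q_fc = best_single_scalar(K_fc, K_infinite)
q_cnn = best_single_scalar(K_cnn, K_infinite)
print("global renormalization recovered by a single scalar:",
      np.allclose(K_fc, q_fc * K_infinite))    # True: only an overall rescaling
print("local renormalization reduced to a single scalar:",
      np.allclose(K_cnn, q_cnn * K_infinite))  # False (generically): the kernel is reshaped
```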
