Contrasting random and learned features in deep Bayesian linear regression

03/01/2022
by   Jacob A. Zavatone-Veth, et al.

Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display sample-wise double-descent behavior in the presence of label noise. Random feature models can also display model-wise double-descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making neural networks as wide or as narrow as possible is always optimal. Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained. Taken together, our findings begin to elucidate how architectural details affect generalization performance in this simple class of deep regression models.
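The following is a minimal numerical sketch, not the paper's Bayesian posterior calculation: it contrasts a deep random feature model (frozen random linear layers, trained readout) with an end-to-end linear fit standing in for a deep linear network in which all layers are trained, on unstructured Gaussian data with noisy linear labels. The input dimension, hidden widths, noise level, ridge penalty, and sample sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100            # input dimension (illustrative choice)
widths = [50, 50]  # hidden widths of the linear feature map; 50 < d acts as a bottleneck
sigma = 0.5        # label-noise standard deviation
ridge = 1e-3       # small ridge penalty, standing in for a weak Gaussian prior

# Teacher weights generating noisy linear targets from unstructured Gaussian inputs.
w_star = rng.standard_normal(d) / np.sqrt(d)

def make_data(p):
    X = rng.standard_normal((p, d))
    y = X @ w_star + sigma * rng.standard_normal(p)
    return X, y

# Fixed random weights defining the deep (linear) random feature map.
feature_weights = []
n_in = d
for n_out in widths:
    feature_weights.append(rng.standard_normal((n_in, n_out)) / np.sqrt(n_in))
    n_in = n_out

def random_features(X):
    H = X
    for W in feature_weights:
        H = H @ W
    return H

def ridge_fit_predict(Phi_tr, y_tr, Phi_te):
    n = Phi_tr.shape[1]
    w = np.linalg.solve(Phi_tr.T @ Phi_tr + ridge * np.eye(n), Phi_tr.T @ y_tr)
    return Phi_te @ w

X_te, y_te = make_data(2000)
for p in [25, 50, 100, 200, 400]:
    X_tr, y_tr = make_data(p)

    # Deep random feature model: hidden layers frozen, only the readout is fit.
    pred_rf = ridge_fit_predict(random_features(X_tr), y_tr, random_features(X_te))

    # Stand-in for a deep linear network with all layers trained: since the
    # network is linear, its end-to-end map is a single linear function of the input.
    pred_full = ridge_fit_predict(X_tr, y_tr, X_te)

    mse_rf = np.mean((pred_rf - y_te) ** 2)
    mse_full = np.mean((pred_full - y_te) ** 2)
    print(f"p={p:4d}  random-feature test MSE={mse_rf:.3f}  full-fit test MSE={mse_full:.3f}")
```

In this toy setup the ridge penalty plays the role of the Gaussian prior, and sweeping the sample size p past the feature width (for the random feature model) or the input dimension (for the fully trained linear map) is where the noise-driven test-error peaks associated with double descent would appear.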

