Random Feature Amplification: Feature Learning and Generalization in Neural Networks

02/15/2022
by Spencer Frei, et al.

In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate, refuting the conjecture of Malach and Shalev-Shwartz that 'deeper is better only when shallow is good'. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
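
To make the setting concrete, below is a minimal PyTorch sketch (not the paper's code) of the setup described above: XOR-like clustered inputs with a constant fraction of flipped labels, a two-layer ReLU network trained by full-batch gradient descent on the logistic loss from random initialization, and a linear classifier baseline that cannot beat random guessing on this distribution. The cluster geometry, noise rate, width, step size, and step count are illustrative assumptions rather than the paper's exact parameters.

```python
# Minimal sketch of the XOR-like noisy-label setting: a two-layer ReLU network
# trained by gradient descent on the logistic loss vs. a linear classifier.
# All constants below are illustrative choices, not the paper's parameters.
import torch

torch.manual_seed(0)

d, n, noise_rate = 20, 2000, 0.1   # input dimension, sample size, label-flip rate

def sample_xor_data(n):
    # Two coordinates carry the signal; the label follows an XOR rule on their signs.
    centers = torch.tensor([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
    idx = torch.randint(0, 4, (n,))
    x = 0.2 * torch.randn(n, d)
    x[:, :2] += centers[idx]
    y = torch.where(idx % 3 == 0, 1.0, -1.0)   # (+,+) and (-,-) -> +1, else -1
    flip = torch.rand(n) < noise_rate          # corrupt a constant fraction of labels
    return x, torch.where(flip, -y, y)

x_train, y_train = sample_xor_data(n)
x_test, y_test = sample_xor_data(n)

# Two-layer ReLU network trained by full-batch gradient descent on the logistic loss.
m = 512                                        # network width
net = torch.nn.Sequential(torch.nn.Linear(d, m), torch.nn.ReLU(), torch.nn.Linear(m, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = torch.nn.SoftMarginLoss()            # logistic loss for +/-1 labels

for step in range(2000):
    opt.zero_grad()
    loss_fn(net(x_train).squeeze(), y_train).backward()
    opt.step()

# Linear classifier baseline trained the same way.
lin = torch.nn.Linear(d, 1)
lin_opt = torch.optim.SGD(lin.parameters(), lr=0.1)
for step in range(2000):
    lin_opt.zero_grad()
    loss_fn(lin(x_train).squeeze(), y_train).backward()
    lin_opt.step()

def test_error(f, x, y):
    return (torch.sign(f(x).squeeze()) != y).float().mean().item()

print(f"two-layer ReLU test error:   {test_error(net, x_test, y_test):.3f}")  # should approach the noise rate
print(f"linear classifier test error: {test_error(lin, x_test, y_test):.3f}")  # should stay near 0.5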

Related research

- Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data (02/11/2022). Benign overfitting, the phenomenon where interpolating models generalize...
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization (01/01/2023). It has been observed in practice that applying pruning-at-initialization...
- A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features (06/03/2022). An important characteristic of neural networks is their ability to learn...
- MSE-Optimal Neural Network Initialization via Layer Fusion (01/28/2020). Deep neural networks achieve state-of-the-art performance for a range of...
- Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization (07/24/2023). This paper studies the problem of training a two-layer ReLU network for ...
- Improved Convergence Guarantees for Shallow Neural Networks (12/05/2022). We continue a long line of research aimed at proving convergence of dept...
- Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective (06/24/2023). Graph neural networks (GNNs) have pioneered advancements in graph repres...
