Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures

05/26/2022
by Emmanuel Abbe et al.

This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that, when learning logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that, in the distribution-shift setting where the data withholding corresponds to freezing a single feature (referred to as the canonical holdout), the generalization error of gradient descent admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown for linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures, and for learning logical functions such as PVR functions, GD tends to have an implicit bias towards low-degree representations, which in turn gives the Boolean influence as the generalization error under quadratic loss.
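
As an illustrative sketch only (the function below is a toy PVR-style function with a hypothetical pointer/window layout, not the exact benchmark specification of [ZRKB21]), the following Python snippet shows how the Boolean influence of a coordinate, Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)], can be estimated by Monte Carlo. Under the canonical holdout, the paper's characterization says the generalization error tracks this quantity for the frozen coordinate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy PVR-style function on n = 11 bits (assumed layout for illustration):
# the first 3 bits encode a pointer p in {0, ..., 7}, and the label is the
# parity of a window of 2 bits starting at position p in the remaining 8 bits.
N_POINTER, N_DATA, WINDOW = 3, 8, 2

def pvr(x):
    """x: 0/1 array of length N_POINTER + N_DATA; returns a 0/1 label."""
    p = int("".join(map(str, x[:N_POINTER])), 2)               # pointer value
    window = [x[N_POINTER + (p + j) % N_DATA] for j in range(WINDOW)]
    return sum(window) % 2                                      # window parity

def boolean_influence(f, n, i, samples=20000):
    """Monte Carlo estimate of Inf_i(f) = Pr_x[f(x) != f(x^(flip i))]."""
    x = rng.integers(0, 2, size=(samples, n))
    y = np.array([f(row) for row in x])
    x_flipped = x.copy()
    x_flipped[:, i] ^= 1                                        # flip bit i
    y_flipped = np.array([f(row) for row in x_flipped])
    return float(np.mean(y != y_flipped))

n = N_POINTER + N_DATA
for i in range(n):
    print(f"coordinate {i}: influence ~ {boolean_influence(pvr, n, i):.3f}")
```

In this toy instance the pointer bits have influence close to 1/2 (flipping them redirects the window), while each data bit has influence close to 2/8 = 0.25 (it is read only for the pointer values that place it in the window), which is the kind of coordinate-wise asymmetry the Boolean-influence characterization captures.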

