Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures

05/26/2022
by Emmanuel Abbe, et al.

This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label, and, more generally, the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that, when learning logical functions with GD on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that, in the distribution-shift setting where the data withholding corresponds to freezing a single feature (referred to as the canonical holdout), the generalization error of GD admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is proved for linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that, for such architectures and for learning logical functions such as PVR functions, GD has an implicit bias towards low-degree representations, which in turn yields the Boolean influence as the generalization error under quadratic loss.
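To make the objects in the abstract concrete, below is a minimal sketch in Python: a toy bit-level PVR-style function (pointer bits select a window of the value bits, aggregated by parity) together with a Monte Carlo estimate of the Boolean influence of each input coordinate, i.e. the probability that flipping that coordinate changes the label. The function `pvr`, the 3-bit pointer / 5-bit value split, the window size, and the sample count are illustrative assumptions, not the paper's exact benchmark or experimental setup; the comment on the canonical holdout merely restates the paper's claim.

```python
# Illustrative sketch (assumptions, not the paper's exact setup): a toy bit-level
# PVR-style function and a Monte Carlo estimate of each coordinate's Boolean influence.
import numpy as np

rng = np.random.default_rng(0)
N_BITS = 8      # total input bits: 3 pointer bits + 5 value bits (assumed split)
N_POINTER = 3   # the first bits encode the pointer
WINDOW = 2      # size of the aggregation window (assumed)

def pvr(x):
    """Toy pointer-value-retrieval function on x in {0,1}^N_BITS.

    The pointer bits select a start position among the value bits; the label is
    the parity of a small window starting there (parity is one of the
    aggregation functions considered for PVR-style tasks)."""
    n_starts = N_BITS - N_POINTER - WINDOW + 1
    pointer = int("".join(map(str, x[:N_POINTER])), 2) % n_starts
    window = x[N_POINTER + pointer : N_POINTER + pointer + WINDOW]
    return int(np.sum(window) % 2)

def boolean_influence(f, i, n, n_samples=20000):
    """Estimate Inf_i[f] = Pr_x[f(x) != f(x with bit i flipped)] by sampling."""
    x = rng.integers(0, 2, size=(n_samples, n))
    x_flip = x.copy()
    x_flip[:, i] ^= 1
    fx = np.array([f(row) for row in x])
    fx_flip = np.array([f(row) for row in x_flip])
    return float(np.mean(fx != fx_flip))

if __name__ == "__main__":
    for i in range(N_BITS):
        inf_i = boolean_influence(pvr, i, N_BITS)
        # Under the canonical holdout that freezes coordinate i during training,
        # the paper's characterization predicts a generalization error governed
        # by this influence (under quadratic loss, for the architectures studied).
        print(f"coordinate {i}: estimated Boolean influence = {inf_i:.3f}")
```

In this sketch, the pointer coordinates typically carry the largest influence, since flipping them changes which window is read; the canonical holdout on such a coordinate is therefore predicted to hurt generalization the most.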


Related research

- Generalization on the Unseen, Logic Reasoning and Degree Curriculum (01/30/2023)
- Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent (07/08/2022)
- Efficient displacement convex optimization with particle gradient descent (02/09/2023)
- ∂𝔹 nets: learning discrete functions by gradient descent (05/12/2023)
- Learning Trajectories are Generalization Indicators (04/25/2023)
- Learning Boolean Circuits with Neural Networks (10/25/2019)
- Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces (10/05/2022)
