A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

10/16/2020
by Manfred K. Warmuth, et al.

It was conjectured that no neural network, of any architecture and with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a d-dimensional Hadamard matrix and the target is a single one of the features, i.e. a very sparse target. We essentially prove this conjecture: we show that after receiving a random training set of size k < d, the expected square loss is still 1 - k/(d-1). The only requirements are that the input layer is fully connected and that the initial weight vectors of the input nodes are drawn from a rotation-invariant distribution. Surprisingly, the same type of problem can be solved drastically more efficiently by a simple two-layer linear network in which the d inputs are connected to the output node by chains of length 2, so that the input layer has only one edge per input. When such a network is trained by gradient descent, it has been shown that its expected square loss is (log d)/k. Our lower bounds thus essentially show that a sparse input layer is needed for gradient descent to sample-efficiently learn sparse targets when the number of examples is less than the number of input features.
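To make the contrast concrete, the following is a minimal numerical sketch (not the authors' code) of the setup described above: the instances are the rows of a Sylvester-type Hadamard matrix, the target is the first feature, and gradient descent on a single fully connected linear neuron with Gaussian (rotation-invariant) initialization is compared against gradient descent on a spindly parameterization whose effective weight on input i is the product u_i * v_i. The dimension d = 64, training-set size k = 16, learning rates, initialization scales, and step counts are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 64, 16                         # input dimension and training-set size (assumed)

# Sylvester construction of a d x d Hadamard matrix (d must be a power of 2).
H = np.array([[1.0]])
while H.shape[0] < d:
    H = np.block([[H, H], [H, -H]])

X, y = H, H[:, 0]                     # instances = rows of H, target = a single feature
idx = rng.choice(d, size=k, replace=False)
Xtr, ytr = X[idx], y[idx]             # random training set of size k < d

def full_loss(w_eff):
    """Square loss of the linear predictor w_eff, averaged over all d instances."""
    return np.mean((X @ w_eff - y) ** 2)

# (a) Fully connected input layer: one linear neuron whose weight vector is
#     initialized from a rotation-invariant (here Gaussian) distribution.
w = rng.normal(scale=0.01, size=d)
for _ in range(5000):
    w -= 0.01 * 2.0 * Xtr.T @ (Xtr @ w - ytr) / k

# (b) "Spindly" two-layer linear network: each input reaches the output through
#     its own length-2 chain, so the effective weight on input i is u[i] * v[i].
u = np.full(d, 0.01)
v = np.full(d, 0.01)
for _ in range(5000):
    r = Xtr @ (u * v) - ytr           # residuals on the training set
    gu = 2.0 * (Xtr * v).T @ r / k    # gradient of the training loss w.r.t. u
    gv = 2.0 * (Xtr * u).T @ r / k    # gradient of the training loss w.r.t. v
    u, v = u - 0.01 * gu, v - 0.01 * gv

# On typical runs with these (assumed) hyperparameters, the fully connected
# neuron stays close to the 1 - k/(d-1) lower bound on the full instance set,
# while the spindly network's loss is far smaller, in line with the (log d)/k
# bound quoted above.
print("fully connected neuron:", full_loss(w))
print("spindly linear network:", full_loss(u * v))
```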


Related research

02/25/2022: An initial alignment between neural network and target is needed for gradient descent to learn
This paper introduces the notion of "Initial Alignment" (INAL) between a...

08/30/2022: On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent
Estimation of a multivariate regression function from independent and id...

07/19/2021: A quantum algorithm for training wide and deep classical neural networks
Given the success of deep learning in classical machine learning, quantu...

10/29/2020: What can we learn from gradients?
Recent work has shown that it is possible to reconstruct the in...

06/29/2023: Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs
Experimental results have shown that curriculum learning, i.e., presenti...

01/04/2022: Sparse Super-Regular Networks
It has been argued by Thom and Palm that sparsely-connected neural netwo...

01/06/2023: Grokking modular arithmetic
We present a simple neural network that can learn modular arithmetic tas...
