Towards Understanding Learning in Neural Networks with Linear Teachers

01/07/2021
by Roei Sarussi, et al.

Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unresolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex; however, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if the network weights converge to two weight clusters, then the resulting decision boundary is approximately linear. Finally, we give a condition on the optimization that leads to such weight clustering, and present empirical results that validate our theoretical analysis.
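To make the setting described above concrete, here is a minimal sketch (not the authors' code): it trains a two-layer Leaky ReLU network with SGD on binary cross-entropy over data labeled by a random linear teacher, then measures how aligned the first-layer neurons are with the teacher direction as a rough proxy for the weight-clustering phenomenon. The dimensions, learning rate, and the alignment check are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the abstract's setting, assuming PyTorch.
# Hyperparameters and the clustering diagnostic are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, n, hidden = 20, 2000, 100

# Linearly separable data: labels come from a random linear teacher w_star.
w_star = torch.randn(d)
X = torch.randn(n, d)
y = (X @ w_star > 0).float()

# Two-layer network with Leaky ReLU activations.
model = nn.Sequential(
    nn.Linear(d, hidden, bias=False),
    nn.LeakyReLU(0.1),
    nn.Linear(hidden, 1, bias=False),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy on logits

for step in range(2000):
    opt.zero_grad()
    logits = model(X).squeeze(1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

with torch.no_grad():
    train_acc = ((model(X).squeeze(1) > 0).float() == y).float().mean()
print(f"final loss {loss.item():.4f}, train accuracy {train_acc:.3f}")

# Coarse proxy for weight clustering: cosine similarity between each
# first-layer neuron's weight vector and the teacher direction.
W = model[0].weight.detach()
cosines = nn.functional.cosine_similarity(W, w_star.expand_as(W), dim=1)
print(f"mean |cos(neuron, teacher)| = {cosines.abs().mean():.3f}")
```

If the first-layer weights indeed collapse toward two clusters of directions, the mean absolute cosine similarity approaches 1; this is only a coarse diagnostic, not the precise clustering condition analyzed in the paper.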


