
Balancedness and Alignment are Unlikely in Linear Neural Networks

We study the invariance properties of alignment in linear neural networks under gradient descent. Alignment of weight matrices is a form of implicit regularization, and previous works have studied this phenomenon in fully connected networks with 1-dimensional outputs. In such networks, we prove that there exists an initialization such that adjacent layers remain aligned throughout training under any real-valued loss function. We then define alignment for fully connected networks with multidimensional outputs and prove that it generally cannot be an invariant for such networks under the squared loss. Moreover, we characterize the datasets under which alignment is possible. We then analyze networks with layer constraints such as convolutional networks. In particular, we prove that gradient descent is equivalent to projected gradient descent, and show that alignment is impossible given sufficiently large datasets. Importantly, since our definition of alignment is a relaxation of balancedness, our negative results extend to this property.
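The balancedness property relaxed by the paper's notion of alignment can be illustrated numerically. In a two-layer linear network f(x) = W2 W1 x trained on the squared loss, the quantity D = W1 W1ᵀ − W2ᵀ W2 is exactly conserved under gradient *flow*, but discrete gradient descent perturbs it by O(η²) per step. The sketch below (not code from the paper; the network sizes, step counts, and helper name `balancedness_drift` are illustrative assumptions) shows that the drift in D shrinks rapidly as the step size η decreases:

```python
import numpy as np

# Illustrative sketch: two-layer linear network f(x) = W2 @ W1 @ x,
# squared loss L = 0.5 * ||W2 W1 X - Y||_F^2.
# Under gradient flow, D = W1 W1^T - W2^T W2 is exactly conserved;
# under discrete gradient descent the first-order terms cancel and
# D drifts only at order eta^2 per step.

d_in, d_hid, d_out, n = 4, 3, 2, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))

def balancedness_drift(eta, steps=100, seed=1):
    """Train with gradient descent; return ||D_final - D_initial||_F."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((d_hid, d_in))
    W2 = 0.1 * rng.standard_normal((d_out, d_hid))
    D0 = W1 @ W1.T - W2.T @ W2
    for _ in range(steps):
        R = W2 @ W1 @ X - Y        # residual
        G1 = W2.T @ R @ X.T        # dL/dW1
        G2 = R @ X.T @ W1.T        # dL/dW2
        W1 = W1 - eta * G1
        W2 = W2 - eta * G2
    D = W1 @ W1.T - W2.T @ W2
    return np.linalg.norm(D - D0)

# Halving-of-eta check: drift scales roughly like eta^2.
print(balancedness_drift(1e-3), balancedness_drift(1e-4))
```

With the smaller step size the drift is roughly two orders of magnitude smaller, consistent with the O(η²) per-step error; this is why conservation results proved for gradient flow do not automatically transfer to gradient descent, the setting the paper analyzes.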


