Transformers learn through gradual rank increase

06/12/2023
by Enric Boix-Adserà, et al.

We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
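To make the claim concrete, here is a minimal sketch (not the authors' code) of how one might observe incremental rank growth in the simplified linear setting the proof addresses. The two-layer linear network, low-rank regression target, dimensions, learning rate, and the absolute rank threshold of 1e-2 are all illustrative assumptions, not details from the paper.

```python
# Minimal sketch (assumed setup, not the authors' code): track the numerical
# rank of the difference between trained and initial weights while gradient
# descent fits a low-rank target with a two-layer linear network and small
# initialization.
import torch

torch.manual_seed(0)

d, n, r_true = 32, 256, 4                                 # width, samples, target rank
X = torch.randn(n, d)
W_star = torch.randn(d, r_true) @ torch.randn(r_true, d)  # rank-4 target map
Y = X @ W_star

# Small initialization, mirroring the paper's theoretical regime.
W1 = (1e-3 * torch.randn(d, d)).requires_grad_()
W2 = (1e-3 * torch.randn(d, d)).requires_grad_()
opt = torch.optim.SGD([W1, W2], lr=0.05)
P_init = (W1 @ W2).detach().clone()                       # end-to-end map at init

def numerical_rank(M, tol=1e-2):
    # Count singular values above an absolute threshold (assumed 1e-2).
    return int((torch.linalg.svdvals(M) > tol).sum())

for step in range(601):
    opt.zero_grad()
    loss = ((X @ W1 @ W2 - Y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        diff = (W1 @ W2).detach() - P_init
        print(f"step {step:3d}  loss {loss.item():.3e}  "
              f"rank(W - W_init) = {numerical_rank(diff)}")
```

Under this setup the printed rank typically climbs one component at a time from 0 toward the target rank of 4 as the loss drops, which is the incremental-learning signature the abstract describes, here in a toy linear network rather than a full transformer.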

Related research

InRank: Incremental Low-Rank Learning (06/20/2023)
The theory of greedy low-rank learning (GLRL) aims to explain the impres...

Effective Theory of Transformers at Initialization (04/04/2023)
We perform an effective-theory analysis of forward-backward signal propa...

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse (06/07/2022)
Transformers have achieved remarkable success in several domains, rangin...

The Implicit Bias of Depth: How Incremental Learning Drives Generalization (09/26/2019)
A leading hypothesis for the surprising generalization of neural network...

Extending Equational Monadic Reasoning with Monad Transformers (11/06/2020)
There is a recent interest for the verification of monadic programs usin...

Linear algebra with transformers (12/03/2021)
Most applications of transformers to mathematics, from integration to th...

Saddle-to-Saddle Dynamics in Diagonal Linear Networks (04/02/2023)
In this paper we fully describe the trajectory of gradient flow over dia...
