Coinductive guide to inductive transformer heads

02/03/2023
by Adam Nemecek, et al.

We argue that all building blocks of transformer models can be expressed with a single concept: the combinatorial Hopf algebra. Transformer learning emerges from the subtle interplay between the algebraic and coalgebraic operations of the combinatorial Hopf algebra. Viewed through this lens, the transformer model becomes a linear time-invariant system in which the attention mechanism computes a generalized convolution transform and the residual stream serves as a unit impulse. Attention-only transformers then learn by enforcing an invariant between these two paths; we call this invariant Hopf coherence. Because of this, with a degree of poetic license, one could call combinatorial Hopf algebras "tensors with a built-in loss function gradient". This loss function gradient arises within the individual layers, so no backward pass is needed. This contrasts with automatic differentiation, which operates across the whole computation graph and requires an explicit backward pass. The property stems from the surprising fact that combinatorial Hopf algebras compute eigenvalues by repeated squaring.
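
For context, the linear-systems reading rests on a standard fact of signal processing (our gloss, not specific to this paper): a discrete linear time-invariant system S is completely determined by its response h = S(\delta) to the unit impulse \delta, and every output is a convolution with that response,

    y[n] = (h * x)[n] = \sum_k h[k] \, x[n - k].

The closing claim about eigenvalues is easiest to see numerically. Below is a minimal sketch of ours, not code from the paper: squaring a matrix squares its eigenvalues, so the gap between the leading eigenvalue and the rest widens doubly exponentially, and a handful of squarings isolates the dominant eigenspace.

    import numpy as np

    def dominant_eigenvalue(A, squarings=6):
        """Estimate the largest-magnitude eigenvalue of a symmetric A
        by repeated squaring: after k squarings the leading eigenvalue
        dominates the next one by a factor of (l1/l2)**(2**k)."""
        M = A.astype(float)
        for _ in range(squarings):
            M = M @ M
            M /= np.abs(M).max()      # rescale to avoid overflow
        v = M @ np.ones(A.shape[0])   # a generic vector lands in the leading eigenspace
        v /= np.linalg.norm(v)
        return float(v @ A @ v)       # Rayleigh quotient at the (near-)eigenvector

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    print(dominant_eigenvalue(A))     # ~3.6180, exact value is (5 + 5**0.5) / 2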
