Optimal inference of a generalised Potts model by single-layer transformers with factored attention

04/14/2023
by   Riccardo Rende, et al.

Transformers are the type of neural network that has revolutionised natural language processing and protein science. Their key building block is a mechanism called self-attention, which is trained to predict missing words in sentences. Despite the practical success of transformers in applications, it remains unclear what self-attention learns from data, and how. Here, we give a precise analytical and numerical characterisation of transformers trained on data drawn from a generalised Potts model with interactions between both sites and Potts colours. While an off-the-shelf transformer requires several layers to learn this distribution, we show analytically that a single layer of self-attention with a small modification can learn the Potts model exactly in the limit of infinite sampling. We show that this modified self-attention, which we call “factored”, has the same functional form as the conditional probability of a Potts spin given the other spins; we compute its generalisation error using the replica method from statistical physics, and derive an exact mapping to pseudo-likelihood methods for solving the inverse Ising and Potts problem.
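The key modification can be sketched in a few lines. In standard self-attention, the attention matrix is computed from the input via query and key projections; in factored attention, it is instead a learned position-position parameter that does not depend on the input. Applied to one-hot Potts configurations, a softmax over the output then gives, for each site, a model of the conditional probability of that site's colour given the others. The following is a minimal illustrative sketch with random (untrained) parameters, not the authors' implementation; the names `A` and `W_V` and the single-head, one-hot setup are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

L, q = 10, 3                          # number of sites, number of Potts colours

# One-hot encode a Potts configuration of L sites with q colours each.
spins = rng.integers(0, q, size=L)
x = np.eye(q)[spins]                  # shape (L, q)

# Factored attention: A is a learned (L, L) parameter, independent of the
# input -- unlike standard self-attention, where the attention matrix would
# be softmax(x W_Q (x W_K)^T / sqrt(d)) and thus input-dependent.
A = rng.standard_normal((L, L))       # site-site couplings (here: random stand-in)
np.fill_diagonal(A, 0.0)              # a site does not attend to itself
W_V = rng.standard_normal((q, q))     # colour-colour interaction matrix

logits = A @ x @ W_V                  # shape (L, q)

# Softmax over colours: row i approximates p(s_i = colour | other spins),
# the same functional form as the Potts conditional probability.
p = np.exp(logits)
p /= p.sum(axis=1, keepdims=True)
```

Training such a layer with a masked-token cross-entropy loss amounts to maximising exactly these conditionals, which is how the mapping to pseudo-likelihood methods for the inverse Ising and Potts problem arises.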


Related research

- Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? (07/26/2023)
- Classifying Scientific Publications with BERT – Is Self-Attention a Feature Selection Method? (01/20/2021)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding (03/07/2023)
- Centroid Transformers: Learning to Abstract with Attention (02/17/2021)
- Evaluating self-attention interpretability through human-grounded experimental protocol (03/27/2023)
- Traveling Words: A Geometric Interpretation of Transformers (09/13/2023)
- Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation (02/20/2023)
