On the Power of Saturated Transformers: A View from Circuit Complexity

06/30/2021
by William Merrill, et al.

Transformers have become a standard architecture for many NLP problems. This has motivated theoretically analyzing their capabilities as models of language, in order to understand what makes them successful, and what their potential weaknesses might be. Recent work has shown that transformers with hard attention are quite limited in capacity, and in fact can be simulated by constant-depth circuits. However, hard attention is a restrictive assumption, which may complicate the relevance of these results for practical transformers. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We show that saturated transformers transcend the limitations of hard-attention transformers. With some minor assumptions, we prove that the number of bits needed to represent a saturated transformer memory vector is O(log n), which implies saturated transformers can be simulated by log-depth circuits. Thus, the jump from hard to saturated attention can be understood as increasing the transformer's effective circuit depth by a factor of O(log n).
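The contrast between hard and saturated attention can be made concrete with a small sketch. The following is not code from the paper; it is a minimal NumPy illustration (the function names and the numerical tolerance are assumptions) of the three attention variants at play: standard softmax attention, hard attention, which places all weight on a single maximizing position, and saturated attention, the zero-temperature limit of softmax, which splits weight uniformly over every position attaining the maximal score.

```python
import numpy as np

def soft_attention(scores):
    """Standard softmax attention over a vector of scores."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def hard_attention(scores):
    """Hard attention: all weight on one maximal position
    (ties broken by taking the first index)."""
    weights = np.zeros_like(scores)
    weights[np.argmax(scores)] = 1.0
    return weights

def saturated_attention(scores, tol=1e-9):
    """Saturated attention (zero-temperature softmax): uniform
    weight over all positions whose score is numerically maximal.
    The tolerance `tol` is an illustrative choice."""
    mask = scores >= scores.max() - tol
    return mask / mask.sum()

scores = np.array([2.0, 5.0, 5.0, 1.0])
print(soft_attention(scores))       # smooth distribution, most mass on the two tied maxima
print(hard_attention(scores))       # [0. 1. 0. 0.]   single argmax position
print(saturated_attention(scores))  # [0.  0.5 0.5 0.]  uniform over the argmax set
```

The tie-handling is the point of the generalization: with tied maximal scores, saturated attention averages over the whole argmax set rather than committing to one position, which is the attention behavior the paper's circuit-complexity analysis targets.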

Related research

08/06/2023 - Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits
Transformers have emerged as a widely used neural network model for vari...

04/28/2022 - A Probabilistic Interpretation of Transformers
We propose a probabilistic interpretation of exponential dot product att...

07/02/2022 - Log-Precision Transformers are Constant-Depth Uniform Threshold Circuits
We prove that transformer neural networks with logarithmic precision in ...

06/03/2023 - Memorization Capacity of Multi-Head Attention in Transformers
In this paper, we investigate the memorization capabilities of multi-hea...

04/13/2022 - Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
This paper analyzes three formal models of Transformer encoders that dif...

10/06/2022 - Transformers Can Be Expressed In First-Order Logic with Majority
Characterizing the implicit structure of the computation within neural n...

06/05/2023 - Representational Strengths and Limitations of Transformers
Attention layers, as commonly used in transformers, form the backbone of...
