Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

08/06/2023
by   Lena Strobl, et al.

Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research has explored their relationship to constant-depth threshold circuits under two different assumptions: average-hard attention, and logarithmic precision of internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages in the complexity class TC0, the set of languages recognizable by constant-depth, polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages in uniform TC0. Thus both transformer models can be simulated by constant-depth threshold circuits, but the latter result is stronger because it yields a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.
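To make the average-hard attention assumption concrete, here is a minimal NumPy sketch (illustrative code, not taken from the paper; the function names are our own): instead of a softmax, all attention weight is divided uniformly among the positions that attain the maximum attention score.

    import numpy as np

    def average_hard_attention(scores):
        """Average-hard (saturated) attention: all attention weight is split
        uniformly among the positions that attain the maximum score; every
        other position receives weight zero."""
        scores = np.asarray(scores, dtype=float)
        max_mask = scores == scores.max()     # positions tied for the maximum
        return max_mask / max_mask.sum()      # uniform weights over the argmax set

    def softmax_attention(scores):
        """Ordinary softmax attention, for comparison."""
        scores = np.asarray(scores, dtype=float)
        e = np.exp(scores - scores.max())     # subtract max for numerical stability
        return e / e.sum()

    scores = [2.0, 5.0, 5.0, 1.0]
    print(average_hard_attention(scores))     # [0.   0.5  0.5  0.  ]
    print(softmax_attention(scores))          # roughly [0.02 0.48 0.48 0.01]

Roughly speaking, it is this argmax-and-average structure that lets each attention head be simulated by the counting and threshold gates used in circuit constructions like those cited above.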


Related research

07/02/2022  Log-Precision Transformers are Constant-Depth Uniform Threshold Circuits
We prove that transformer neural networks with logarithmic precision in ...

04/13/2022  Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
This paper analyzes three formal models of Transformer encoders that dif...

06/30/2021  On the Power of Saturated Transformers: A View from Circuit Complexity
Transformers have become a standard architecture for many NLP problems. ...

07/02/2019  Efficient Circuit Simulation in MapReduce
The MapReduce framework has firmly established itself as one of the most...

10/06/2022  Transformers Can Be Expressed In First-Order Logic with Majority
Characterizing the implicit structure of the computation within neural n...

09/16/2022  Quantum Vision Transformers
We design and analyse quantum transformers, extending the state-of-the-a...

05/05/2023  Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Unlike recurrent models, conventional wisdom has it that Transformers ca...
