Sumformer: Universal Approximation for Efficient Transformers

07/05/2023
by Silas Alberti, et al.

Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, despite this impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length poses significant limitations for handling long sequences. While efficient Transformer architectures such as Linformer and Performer, with linear complexity, have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.
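At its core, the architecture described in the abstract is a sum decomposition: each token is encoded independently, the encodings are summed into a single sequence-level summary, and every output token is computed from its own input token together with that shared summary, so the cost is linear in sequence length. Below is a minimal NumPy sketch of this idea; the two-layer MLPs standing in for the encoder and decoder, the dimensions, and the concatenation step are illustrative assumptions, not the paper's exact parameterization.

```python
# Minimal sketch of a Sumformer-style block.
# Illustrative assumptions: phi/psi are small random-weight MLPs;
# psi receives the token concatenated with the shared summary.
import numpy as np

rng = np.random.default_rng(0)

def mlp(w1, b1, w2, b2, x):
    # Two-layer MLP with ReLU, applied row-wise to x.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

d, h, m = 8, 32, 16                                # token dim, hidden width, summary dim
W1, b1 = rng.normal(size=(d, h)) * 0.1, np.zeros(h)        # phi weights
W2, b2 = rng.normal(size=(h, m)) * 0.1, np.zeros(m)
V1, c1 = rng.normal(size=(d + m, h)) * 0.1, np.zeros(h)    # psi weights
V2, c2 = rng.normal(size=(h, d)) * 0.1, np.zeros(d)

def sumformer(X):
    """Map a sequence X of shape (n, d) to (n, d), equivariantly.

    Each token is encoded by phi, the encodings are summed into one
    sequence-level summary, and psi combines every token with that
    shared summary. Cost is linear in the sequence length n.
    """
    summary = mlp(W1, b1, W2, b2, X).sum(axis=0)                        # (m,)
    paired = np.concatenate([X, np.tile(summary, (len(X), 1))], axis=1)  # (n, d+m)
    return mlp(V1, c1, V2, c2, paired)                                   # (n, d)

X = rng.normal(size=(5, d))
Y = sumformer(X)
perm = rng.permutation(5)
# Permutation equivariance: permuting the inputs permutes the outputs.
assert np.allclose(sumformer(X[perm]), Y[perm])
```

Because the summary is a plain sum over tokens, the sketch is permutation equivariant by construction, which the final assertion checks.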

Related research:

- 05/26/2022 · Your Transformer May Not be as Powerful as You Expect
  Relative Positional Encoding (RPE), which encodes the relative distance ...
- 05/30/2023 · Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
  Despite the great success of Transformer networks in various application...
- 12/20/2019 · Are Transformers universal approximators of sequence-to-sequence functions?
  Despite the widespread adoption of Transformer models for NLP tasks, the...
- 07/09/2020 · Fast Transformers with Clustered Attention
  Transformers have been proven a successful model for a variety of tasks ...
- 05/30/2023 · Universality and Limitations of Prompt Tuning
  Despite the demonstrated empirical efficacy of prompt tuning to adapt a ...
- 05/29/2023 · Approximation theory of transformer networks for sequence modeling
  The transformer is a widely applied architecture in sequence modeling ap...
- 03/07/2022 · HyperMixer: An MLP-based Green AI Alternative to Transformers
  Transformer-based architectures are the model of choice for natural lang...
