Flowformer: Linearizing Transformers with Conservation Flows

02/13/2022
by Haixu Wu et al.

Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has quadratic complexity, which significantly impedes Transformers from handling numerous tokens and scaling up to bigger models. Previous methods mainly utilize similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid the degeneration of attention into a trivial distribution by reintroducing inductive biases such as locality, at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the property of flow conservation to attention and propose the Flow-Attention mechanism of linear complexity. By respectively conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attentions without relying on specific inductive biases. Empowered by Flow-Attention, Flowformer yields strong performance in linear time across a wide range of areas, including long sequences, time series, vision, natural language, and reinforcement learning.
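To make the mechanism concrete, below is a minimal NumPy sketch of a non-causal Flow-Attention step following the description above. The function name flow_attention, the sigmoid feature map, and the exact normalization constants are illustrative assumptions rather than the authors' reference implementation; the point is that competition (a softmax over sources), allocation (a gate over sinks), and aggregation all use the kernelized associativity trick, so no n x m attention matrix is ever materialized.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def flow_attention(Q, K, V, eps=1e-6):
    """Linear-time attention sketch based on flow conservation (illustrative, non-causal).

    Q: (n, d) sink queries, K: (m, d) source keys, V: (m, d_v) source values.
    Cost is O((n + m) * d * d_v): no n x m attention matrix is formed.
    """
    # Non-negative feature map so entries can be read as flow capacities.
    Q_phi, K_phi = sigmoid(Q), sigmoid(K)

    # Incoming flow of each sink and outgoing flow of each source,
    # computed from d-dimensional sums only (linear in sequence length).
    incoming = Q_phi @ K_phi.sum(axis=0) + eps            # (n,)
    outgoing = K_phi @ Q_phi.sum(axis=0) + eps            # (m,)

    # Conserve the incoming flow of sinks -> sources compete for it.
    competed = K_phi @ (Q_phi / incoming[:, None]).sum(axis=0)   # refined outgoing flow, (m,)
    source_weight = softmax(competed) * K.shape[0]               # competition over sources

    # Conserve the outgoing flow of sources -> each sink is allocated a share.
    allocated = Q_phi @ (K_phi / outgoing[:, None]).sum(axis=0)  # refined incoming flow, (n,)
    sink_gate = sigmoid(allocated)                               # allocation gate per sink

    # Aggregate via (Q_phi (K_phi^T V)) instead of (Q_phi K_phi^T) V.
    context = K_phi.T @ (V * source_weight[:, None])             # (d, d_v)
    out = (Q_phi @ context) / incoming[:, None]                  # normalized aggregation, (n, d_v)
    return out * sink_gate[:, None]

# Example: 1024 tokens, 64-dim heads; the full 1024 x 1024 attention matrix is never built.
out = flow_attention(np.random.randn(1024, 64), np.random.randn(1024, 64), np.random.randn(1024, 64))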


