FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention

08/05/2021
by   Tan M. Nguyen, et al.

We propose FMMformers, a class of efficient and flexible transformers inspired by the celebrated fast multipole method (FMM) for accelerating interacting particle simulations. FMM decomposes particle-particle interaction into near-field and far-field components and then performs direct and coarse-grained computations, respectively. Similarly, FMMformers decompose the attention into near-field and far-field attention, modeling the near-field attention with a banded matrix and the far-field attention with a low-rank matrix. Computing the attention matrix in FMMformers requires time and memory that grow only linearly with the sequence length, whereas standard transformers suffer from quadratic complexity. We analyze and validate the advantage of FMMformers over the standard transformer on the Long Range Arena and language modeling benchmarks. FMMformers can even outperform the standard transformer in terms of accuracy by a significant margin. For instance, FMMformers achieve an average classification accuracy of 60.74% over the five Long Range Arena tasks, significantly better than the standard transformer's average accuracy of 58.70%.
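The paper's code is not reproduced on this page; the sketch below only illustrates the decomposition the abstract describes, with near-field attention restricted to a banded window around each query and far-field attention approximated by a low-rank, kernelized form. The feature map (elu + 1), the window size, and the plain sum used to combine the two components are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def fmm_style_attention(q, k, v, band_width=16):
    """Sketch of a near-field + far-field attention decomposition.

    q, k, v: (batch, seq_len, dim) tensors.
    Near field: exact softmax attention restricted to a banded window of
    width `band_width` around each query position.
    Far field: a low-rank surrogate built from a non-negative feature map,
    as in linear-attention methods, costing O(n * dim^2).
    """
    b, n, d = q.shape

    # --- Near-field: banded (local-window) attention ---------------------
    # For clarity this builds the full n x n score matrix and masks it; a
    # linear-time implementation would compute only the banded entries.
    scores = torch.einsum("bid,bjd->bij", q, k) / d ** 0.5
    idx = torch.arange(n, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= band_width
    near = scores.masked_fill(~band, float("-inf"))
    near_out = torch.softmax(near, dim=-1) @ v

    # --- Far-field: low-rank (kernelized) attention -----------------------
    # elu(x) + 1 keeps the features positive; this is one common choice,
    # not necessarily the one used in the paper.
    phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bjd,bje->bde", phi_k, v)                 # (b, d, d)
    z = 1.0 / (torch.einsum("bid,bd->bi", phi_q, phi_k.sum(1)) + 1e-6)
    far_out = torch.einsum("bid,bde,bi->bie", phi_q, kv, z)

    # Combine the two components (a learned gate could weight them instead).
    return near_out + far_out
```

With q = k = v = torch.randn(2, 128, 64), the call returns a (2, 128, 64) tensor; in a full model the two components would typically be weighted by learned coefficients rather than simply summed.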


Related research

07/05/2021 · Long-Short Transformer: Efficient Transformers for Language and Vision
Transformers have achieved success in both language and vision domains. ...

10/21/2022 · Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences
Efficient Transformers have been developed for long sequence modeling, d...

07/07/2020 · Do Transformers Need Deep Long-Range Memory?
Deep attention models have advanced the modelling of sequential data acr...

05/11/2021 · GNOME and LBM Model Evaluation on Ocean Oil Spill Far-Field Impacts to Highly Sensitive Areas
In case of an ocean oil spill, there are certain areas, e.g. shrimp farm...

10/28/2021 · Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Recent advances in efficient Transformers have exploited either the spar...

08/01/2022 · Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
Transformers have achieved remarkable success in sequence modeling and b...

03/11/2022 · Block-Recurrent Transformers
We introduce the Block-Recurrent Transformer, which applies a transforme...
