Pay Attention to MLPs

05/17/2021
by Hanxiao Liu et al.

Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years. Here we propose a simple attention-free network architecture, gMLP, based solely on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications. Our comparisons show that self-attention is not critical for Vision Transformers, as gMLP can achieve the same accuracy. For BERT, our model achieves parity with Transformers on pretraining perplexity and is better on some downstream tasks. On finetuning tasks where gMLP performs worse, making the gMLP model substantially larger can close the gap with Transformers. In general, our experiments show that gMLP can scale as well as Transformers over increased data and compute.
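
To make "MLPs with gating" concrete, here is a minimal PyTorch sketch of a gMLP block with its Spatial Gating Unit, following the block structure described in the paper. The module and parameter names, the example dimensions, and the exact initialization shown are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialGatingUnit(nn.Module):
    """Gates half the channels with a learned projection across the token axis."""
    def __init__(self, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear map over the sequence (token) dimension. The paper suggests
        # initializing the weights near zero and the bias to one, so the gate
        # starts out close to a pass-through.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        nn.init.normal_(self.spatial_proj.weight, std=1e-6)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x):  # x: (batch, seq_len, d_ffn)
        u, v = x.chunk(2, dim=-1)                                  # split channels in half
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)   # mix across tokens
        return u * v                                               # elementwise gating

class gMLPBlock(nn.Module):
    """One gMLP block: channel projection -> GELU -> spatial gating -> channel projection."""
    def __init__(self, d_model: int, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        shortcut = x
        x = F.gelu(self.proj_in(self.norm(x)))
        x = self.sgu(x)
        return self.proj_out(x) + shortcut

# Example usage with assumed toy dimensions:
block = gMLPBlock(d_model=256, d_ffn=1536, seq_len=128)
out = block(torch.randn(2, 128, 256))  # -> shape (2, 128, 256)
```

Note that the only cross-token interaction happens in the spatial projection, a single learned linear map over positions, which is what replaces self-attention in this architecture.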


