PairConnect: A Compute-Efficient MLP Alternative to Attention

06/15/2021
by Zhaozhuo Xu, et al.

Transformer models have demonstrated superior performance in natural language processing. The dot-product self-attention in the Transformer models interactions between words, but this modeling comes with significant computational overhead. In this work, we revisit the memory-compute trade-off associated with the Transformer, particularly multi-head attention, and present a memory-heavy but significantly more compute-efficient alternative. Our proposal, PairConnect, is a multilayer perceptron (MLP) that models pairwise interactions between words with explicit pairwise word embeddings. As a result, PairConnect replaces the dot product in self-attention with a simple embedding lookup. We show mathematically that, despite being an MLP, the compute-efficient PairConnect is strictly more expressive than the Transformer. Our experiments on language modeling tasks suggest that PairConnect achieves results comparable to the Transformer while significantly reducing the computational cost of inference.
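The core idea lends itself to a short illustration. Below is a minimal PyTorch sketch of replacing dot-product attention scores with pairwise-embedding lookups; the hash function, table size, mean-pooling aggregation, and the `PairConnectSketch` name are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class PairConnectSketch(nn.Module):
    """Minimal sketch: replace dot-product attention scores with a
    lookup into a table of pairwise word embeddings. Table size,
    hashing scheme, and aggregation are illustrative assumptions."""

    def __init__(self, vocab_size, d_model, table_size=2**20):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        # One embedding per (hashed) word pair: memory-heavy,
        # but reading it is a cheap lookup instead of a dot product.
        self.pair_emb = nn.Embedding(table_size, d_model)
        self.table_size = table_size
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, tokens):  # tokens: (batch, seq_len) of word ids
        # Hash every ordered token pair (i, j) into the pair table
        # (a simple multiplicative hash, assumed for illustration).
        pair_ids = (tokens.unsqueeze(2) * 1_000_003
                    + tokens.unsqueeze(1)) % self.table_size
        pair_vecs = self.pair_emb(pair_ids)   # (batch, n, n, d_model)
        # Aggregate each token's pairwise context by mean pooling,
        # then mix with an MLP -- no QK^T dot products anywhere.
        context = pair_vecs.mean(dim=2)       # (batch, n, d_model)
        return self.mlp(self.word_emb(tokens) + context)

# Usage: model = PairConnectSketch(vocab_size=50_000, d_model=256)
#        out = model(torch.randint(0, 50_000, (2, 16)))
```

In this sketch, each pairwise interaction costs a table lookup rather than a d-dimensional dot product; the price is the potentially huge pair-embedding table, which is exactly the memory-compute trade-off the abstract describes.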

