
Sparsifying Transformer Models with Differentiable Representation Pooling

by Michał Pietruszka, et al.

We propose a novel method to sparsify attention in the Transformer model by learning to select the most-informative token representations, thus leveraging the model's information bottleneck with twofold strength. A careful analysis shows that the contextualization of encoded representations in our model is significantly more effective than in the original Transformer. We achieve a notable reduction in memory usage due to an improved differentiable top-k operator, making the model suitable for processing long documents, as demonstrated on a summarization task.
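The abstract does not describe the differentiable top-k operator itself. One common way to relax top-k selection into a differentiable operation (a generic illustration, not necessarily the paper's construction) is to score each token and build a soft mask with a sigmoid centered between the k-th and (k+1)-th largest scores, while selecting the top-k tokens hard in the forward pass:

```python
import numpy as np

def soft_topk_mask(scores, k, temperature=0.1):
    """Differentiable relaxation of a top-k indicator.

    Places a sigmoid threshold halfway between the k-th and (k+1)-th
    largest scores; as temperature -> 0 this approaches a hard mask.
    """
    sorted_scores = np.sort(scores)[::-1]
    threshold = (sorted_scores[k - 1] + sorted_scores[k]) / 2.0
    return 1.0 / (1.0 + np.exp(-(scores - threshold) / temperature))

def pool_representations(tokens, scores, k, temperature=0.1):
    """Keep the k highest-scoring token representations.

    tokens: (n, d) array of token representations.
    Returns the (k, d) pooled tokens, scaled by the soft mask so that
    gradients can flow back into the scoring function, plus the mask.
    """
    mask = soft_topk_mask(scores, k, temperature)
    idx = np.argsort(scores)[::-1][:k]  # hard selection in the forward pass
    return tokens[idx] * mask[idx, None], mask
```

After pooling, subsequent attention layers operate on only k representations instead of n, which is where the quadratic memory savings on long documents would come from.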

