New Approaches to Long Document Summarization: Fourier Transform Based Attention in a Transformer Model

11/25/2021
by Andrew Kiruluta, et al.

In this work, we extensively redesign the newly introduced method of token mixing using Fourier transforms (FNet) to replace the computationally expensive self-attention mechanism in a full transformer implementation on a long document summarization task (> 512 tokens). As a baseline, we also carry out long document summarization using established models such as Longformer and Big Bird, which are capable of processing over 8,000 tokens and are currently the state-of-the-art methods for this type of problem. The original FNet paper implemented the idea in an encoder-only architecture, whereas abstractive summarization requires both an encoder and a decoder. Since no such pretrained transformer model currently exists in the public domain, we implemented a full transformer based on this Fourier token-mixing approach in an encoder/decoder architecture, which we trained starting from GloVe embeddings for the individual words in the corpus. We investigated a number of extensions to the original FNet architecture and evaluated them by their ROUGE F1-score on the summarization task. All modifications outperformed the original FNet encoder used within a transformer architecture.
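
The token-mixing mechanism being adopted here is simple to sketch: FNet replaces each self-attention sublayer with a parameter-free two-dimensional discrete Fourier transform, applied along the hidden dimension and then along the sequence dimension, keeping only the real part of the result. Below is a minimal sketch of such an encoder layer, assuming PyTorch; the class names and layer sizes are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn as nn

class FourierMixing(nn.Module):
    """Parameter-free token mixing via a 2D FFT (FNet-style).

    Stands in for the self-attention sublayer: FFT along the hidden
    dimension, FFT along the sequence dimension, keep the real part.
    """
    def forward(self, x):  # x: (batch, seq_len, d_model)
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

class FNetEncoderLayer(nn.Module):
    # Hyperparameters are illustrative defaults, not the paper's settings.
    def __init__(self, d_model=512, d_ff=2048, dropout=0.1):
        super().__init__()
        self.mixing = FourierMixing()
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(),
            nn.Linear(d_ff, d_model), nn.Dropout(dropout),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.mixing(x))  # attention sublayer replaced by FFT mixing
        return self.norm2(x + self.ff(x))   # standard position-wise feed-forward
```

Because the mixing step has no learned parameters, the per-layer cost is dominated by the FFT's O(n log n) scaling in sequence length rather than the O(n^2) of dense self-attention, which is what makes the approach attractive for long inputs.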


