Hierarchical Learning for Generation with Long Source Sequences

04/15/2021
by Tobias Rohde, et al.

One of the challenges for current sequence-to-sequence (seq2seq) models is processing long sequences, such as those in summarization and document-level machine translation tasks. These tasks require the model to reason at the token level as well as at the sentence and paragraph level. We design and study a new Hierarchical Attention Transformer-based architecture (HAT) that outperforms standard Transformers on several sequence-to-sequence tasks. In particular, our model achieves state-of-the-art results on four summarization tasks, including ArXiv, CNN/DM, SAMSum, and AMI, and further advances the PubMed state of the art on ROUGE-1 and ROUGE-2. Our model significantly outperforms our document-level machine translation baseline by 28 BLEU on the WMT19 EN-DE document translation task. We also investigate what the hierarchical layers learn by visualizing the hierarchical encoder-decoder attention. Finally, we study hierarchical learning on encoder-only pre-training and analyze its performance on downstream classification tasks.
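
The abstract describes HAT only at a high level: hierarchical attention layers sit on top of a token-level Transformer encoder so the model can also reason over sentence-level representations. The sketch below is not the authors' released implementation; it is a minimal illustration of that idea, assuming each sentence starts with a boundary token whose hidden state is passed through an extra sentence-level encoder layer. All class and parameter names (HierarchicalEncoder, sent_boundary_mask, layer counts) are hypothetical.

```python
# Minimal sketch of a hierarchical encoder: a standard token-level Transformer
# encoder followed by an attention layer that operates only on one hidden state
# per sentence, so sentence/paragraph-level structure can be modeled as well.
import torch
import torch.nn as nn


class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8,
                 num_token_layers=6, num_sent_layers=1, dim_ff=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Token-level encoder: ordinary Transformer encoder layers.
        token_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_ff, batch_first=True)
        self.token_encoder = nn.TransformerEncoder(token_layer, num_token_layers)
        # Sentence-level ("hierarchical") layers: attend only over the hidden
        # states gathered at sentence-boundary positions.
        sent_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_ff, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_sent_layers)

    def forward(self, input_ids, sent_boundary_mask):
        # input_ids: (batch, seq_len) token ids
        # sent_boundary_mask: (batch, seq_len) bool, True at the position that
        # starts each sentence (e.g. a special token prepended per sentence).
        x = self.token_encoder(self.embed(input_ids))       # (B, T, D)
        B, T, D = x.shape
        # Gather one vector per sentence; for simplicity this sketch assumes
        # every example in the batch contains the same number of sentences.
        sent_states = x[sent_boundary_mask].view(B, -1, D)  # (B, S, D)
        sent_states = self.sent_encoder(sent_states)        # (B, S, D)
        # A decoder could cross-attend to both the token-level states x and
        # the sentence-level states; here we simply return both.
        return x, sent_states


# Usage with dummy data: 2 documents, 12 tokens each, a sentence start every 4 tokens.
enc = HierarchicalEncoder(vocab_size=1000)
ids = torch.randint(0, 1000, (2, 12))
boundaries = torch.zeros(2, 12, dtype=torch.bool)
boundaries[:, ::4] = True
token_out, sent_out = enc(ids, boundaries)
print(token_out.shape, sent_out.shape)  # (2, 12, 512) and (2, 3, 512)
```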


research 04/04/2020
STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization
Abstractive summarization aims to rewrite a long document to its shorter...

research 10/16/2022
Modeling Context With Linear Attention for Scalable Document-Level Translation
Document-level machine translation leverages inter-sentence dependencies...

research 03/14/2023
Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
Long-sequence transformers are designed to improve the representation of...

research 04/24/2020
On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Sequence-to-sequence models usually transfer all encoder outputs to the ...

research 10/08/2019
Read, Highlight and Summarize: A Hierarchical Neural Semantic Encoder-based Approach
Traditional sequence-to-sequence (seq2seq) models and other variations o...

research 06/15/2020
DynE: Dynamic Ensemble Decoding for Multi-Document Summarization
Sequence-to-sequence (s2s) models are the basis for extensive work in na...

research 09/10/2021
Heterogeneous Graph Neural Networks for Keyphrase Generation
The encoder-decoder framework achieves state-of-the-art results in keyph...
