Hierarchical Learning for Generation with Long Source Sequences

by Tobias Rohde et al.

One of the challenges for current sequence-to-sequence (seq2seq) models is processing long sequences, such as those in summarization and document-level machine translation tasks. These tasks require the model to reason at the token level as well as at the sentence and paragraph level. We design and study a new Hierarchical Attention Transformer-based architecture (HAT) that outperforms standard Transformers on several sequence-to-sequence tasks. In particular, our model achieves state-of-the-art results on four summarization tasks, including ArXiv, CNN/DM, SAMSum, and AMI, and pushes the PubMed ROUGE-1 and ROUGE-2 state of the art further. Our model also significantly outperforms our document-level machine translation baseline by 28 BLEU on the WMT19 EN-DE document translation task. We investigate what the hierarchical layers learn by visualizing the hierarchical encoder-decoder attention. Finally, we study hierarchical learning for encoder-only pre-training and analyze its performance on downstream classification tasks.
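As a rough illustration of the idea behind hierarchical attention, and not the paper's actual implementation, a hierarchical encoder can pool token-level representations into one vector per sentence and then run a second attention stage in which sentences attend to each other. The function names, mean-pooling choice, and dimensions below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def hierarchical_encode(token_states, sentence_spans):
    # token_states: (num_tokens, d) output of a token-level encoder
    # sentence_spans: list of (start, end) token index ranges, one per sentence
    # 1) pool each sentence's token states into a single sentence vector
    sent = np.stack([token_states[s:e].mean(axis=0) for s, e in sentence_spans])
    # 2) a sentence-level attention layer lets sentences attend to each other,
    #    giving the model a paragraph-level view on top of the token-level one
    return attention(sent, sent, sent)  # (num_sentences, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((10, 4))          # 10 tokens, hidden size 4
sent_repr = hierarchical_encode(tokens, [(0, 4), (4, 7), (7, 10)])
print(sent_repr.shape)  # (3, 4): one contextualized vector per sentence
```

In HAT the sentence-level representations would additionally be exposed to the decoder's cross-attention; here the sketch stops at the encoder side to keep the two-level structure visible.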



