Hierarchical Learning for Generation with Long Source Sequences

04/15/2021
by Tobias Rohde, et al.

One of the challenges for current sequence-to-sequence (seq2seq) models is processing long sequences, such as those in summarization and document-level machine translation tasks. These tasks require the model to reason at the token level as well as at the sentence and paragraph level. We design and study a new Hierarchical Attention Transformer-based architecture (HAT) that outperforms standard Transformers on several sequence-to-sequence tasks. In particular, our model achieves state-of-the-art results on four summarization tasks, including arXiv, CNN/DM, SAMSum, and AMI, and we further improve the PubMed state of the art on ROUGE-1 and ROUGE-2. Our model significantly outperforms our document-level machine translation baseline by 28 BLEU on the WMT19 EN-DE document translation task. We also investigate what the hierarchical layers learn by visualizing the hierarchical encoder-decoder attention. Finally, we study hierarchical learning on encoder-only pre-training and analyze its performance on downstream classification tasks.
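To make the hierarchical encoding idea concrete, below is a minimal PyTorch sketch of one way such an architecture could be assembled: a standard token-level Transformer encoder over the long input, followed by a small sentence-level encoder that attends over representations gathered at sentence-boundary positions. The module name HierarchicalEncoder, the pooling at sentence-boundary tokens, and all layer sizes are illustrative assumptions, not the authors' exact HAT design.

# Minimal sketch of hierarchical encoding for long inputs: a token-level
# Transformer encoder followed by sentence-level attention layers.
# Pooling strategy and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn


class HierarchicalEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_token_layers=6, num_sent_layers=2):
        super().__init__()
        token_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Standard token-level encoder over the full (long) input sequence.
        self.token_encoder = nn.TransformerEncoder(token_layer, num_token_layers)
        # Hierarchical layers that attend over one representation per sentence.
        self.sentence_encoder = nn.TransformerEncoder(sent_layer, num_sent_layers)

    def forward(self, token_embeddings, sentence_boundary_idx):
        # token_embeddings: (batch, seq_len, d_model)
        # sentence_boundary_idx: (batch, num_sentences), positions of
        # sentence-boundary tokens (e.g. inserted <s> markers).
        token_states = self.token_encoder(token_embeddings)
        # Gather one vector per sentence from the boundary-token positions.
        idx = sentence_boundary_idx.unsqueeze(-1).expand(-1, -1, token_states.size(-1))
        sentence_states = torch.gather(token_states, 1, idx)
        # Sentence-level self-attention yields hierarchical representations
        # that a decoder could attend to alongside the token-level states.
        sentence_states = self.sentence_encoder(sentence_states)
        return token_states, sentence_states


# Usage example with random inputs.
if __name__ == "__main__":
    enc = HierarchicalEncoder()
    tokens = torch.randn(2, 128, 512)                       # two documents, 128 tokens each
    boundaries = torch.tensor([[0, 40, 90], [0, 30, 75]])   # three sentences per document
    tok_out, sent_out = enc(tokens, boundaries)
    print(tok_out.shape, sent_out.shape)                    # (2, 128, 512) (2, 3, 512)

In this sketch the decoder would receive both outputs, cross-attending to token-level and sentence-level states; how the two attention streams are combined is a design choice the paper studies and visualizes.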
