Improving the Robustness of Summarization Systems with Dual Augmentation

06/01/2023
by Xiuying Chen, et al.

A robust summarization system should capture the gist of a document regardless of specific word choices or noise in the input. In this work, we first explore the robustness of summarization models against perturbations, including word-level synonym substitution and noise. To create semantically consistent substitutes, we propose SummAttacker, an efficient approach to generating adversarial samples based on language models. Experimental results show that state-of-the-art summarization models suffer a significant decrease in performance on adversarial and noisy test sets. Next, we analyze the vulnerability of summarization systems and explore improving robustness through data augmentation. Specifically, the first brittleness factor we found is the poor understanding of infrequent words in the input. Correspondingly, we feed the encoder with more diverse cases created by SummAttacker in the input space. The other factor lies in the latent space, where attacked inputs bring more variation to the hidden states. Hence, we construct adversarial decoder inputs and devise a manifold softmixing operation in the hidden space to introduce more diversity. Experimental results on the Gigaword and CNN/DM datasets demonstrate that our approach achieves significant improvements over strong baselines and exhibits higher robustness on noisy, attacked, and clean datasets.
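The two augmentation ideas in the abstract can be illustrated with a toy sketch. This is not SummAttacker or the paper's softmixing operation, whose details the abstract does not give. It assumes a hypothetical lookup table (`candidates`) standing in for substitutes a language model would propose, and it uses plain linear interpolation of two hidden-state vectors as one possible way to add variation in the latent space. The names `substitute_words`, `mix_hidden`, `rate`, and `lam` are all illustrative assumptions.

```python
import random


def substitute_words(tokens, candidates, rate=0.3, seed=0):
    """Replace some tokens with a substitute drawn from `candidates`.

    `candidates` is a hypothetical table mapping a word to possible
    replacements; in a real system a language model would propose them.
    """
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        options = candidates.get(tok)
        # Only words that have candidates can be replaced.
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def mix_hidden(clean, attacked, lam=0.7):
    """Linearly interpolate two hidden-state vectors of equal length.

    Assumption: simple interpolation stands in for the mixing step.
    """
    if len(clean) != len(attacked):
        raise ValueError("hidden states must have the same length")
    return [lam * c + (1.0 - lam) * a for c, a in zip(clean, attacked)]
```

With `rate=1.0` every word that has a candidate is replaced, and with `rate=0.0` the input is returned unchanged, which makes the perturbation strength easy to control in such a sketch.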


research
06/06/2019

Robust Neural Machine Translation with Doubly Adversarial Inputs

Neural machine translation (NMT) often suffers from the vulnerability to...
research
10/04/2022

Towards Improving Faithfulness in Abstractive Summarization

Despite the success achieved in neural abstractive summarization based o...
research
09/30/2019

A Closer Look at Data Bias in Neural Extractive Summarization Models

In this paper, we take stock of the current state of summarization datas...
research
12/20/2022

Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

The evaluation of abstractive summarization models typically uses test d...
research
10/25/2018

Improving Document Binarization via Adversarial Noise-Texture Augmentation

Binarization of degraded document images is an elementary step in most o...
research
09/17/2021

Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization

This paper explores three simple data manipulation techniques (synthesis...
