Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation

07/15/2020
by Paul Tardy, et al.

Summarizing texts is not a straightforward task. Before even considering text summarization, one should determine what kind of summary is expected. How much should the information be compressed? Is it relevant to reformulate, or should the summary stick to the original phrasing? State-of-the-art work on automatic text summarization mostly revolves around news articles. We suggest that considering a wider variety of tasks would improve the field in terms of generalization and robustness. We explore meeting summarization: generating reports from automatic transcriptions. Our work consists of segmenting transcriptions and aligning them with reports to obtain a dataset suitable for neural summarization. Using a bootstrapping approach, we provide pre-alignments that are corrected by human annotators, producing a validation set against which we evaluate automatic models. This consistently reduces annotators' effort by providing iteratively better pre-alignments, and it maximizes the corpus size by using annotations from our automatic alignment models. Evaluation is conducted on a novel corpus of aligned public meetings. We report automatic alignment and summarization performance on this corpus and show that automatic alignment is relevant for data annotation, since it leads to a large improvement of almost +4 on all ROUGE scores on the summarization task.
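To make the idea of aligning transcription segments with report segments more concrete, here is a minimal sketch in Python. It is not the authors' alignment model: it assumes a simplified unigram-overlap score (a rough stand-in for ROUGE-1 F1) and a greedy best-match rule, and the function names `rouge1_f1` and `align_segments` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (a simplified ROUGE-1, assumed here)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align_segments(transcript_segments, report_segments):
    """Greedy alignment (an illustrative assumption, not the paper's method).

    For each transcript segment, return the index of the report segment with
    the highest overlap score. Assumes report_segments is non-empty; ties go
    to the earliest report segment.
    """
    alignment = []
    for segment in transcript_segments:
        scores = [rouge1_f1(segment, report) for report in report_segments]
        alignment.append(max(range(len(scores)), key=scores.__getitem__))
    return alignment


if __name__ == "__main__":
    transcript = ["budget vote tomorrow", "road works delayed"]
    report = ["The road works are delayed", "The budget vote is tomorrow"]
    print(align_segments(transcript, report))
```

In a bootstrapping setup like the one described above, output of this kind would serve only as a pre-alignment for human annotators to correct, not as a final annotation.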


