Abstractive Document Summarization without Parallel Data
Abstractive summarization typically relies on large collections of paired articles and summaries. However, such parallel data is scarce and costly to obtain. We develop an abstractive summarization system that relies only on large collections of example summaries and non-matching articles. Our approach consists of an unsupervised sentence extractor, which selects salient sentences to include in the final summary, and a sentence abstractor, trained on pseudo-parallel and synthetic data, which paraphrases each of the extracted sentences. We achieve promising results on the CNN/DailyMail benchmark without relying on any article-summary pairs.
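The extract-then-abstract pipeline described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how salience is scored, so the sketch substitutes a simple bag-of-words cosine similarity between each sentence and the whole article, and the trained sentence abstractor is replaced by a pluggable function that defaults to the identity. All names (`split_sentences`, `extract_salient`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter on ., ! or ? followed by whitespace; illustration only.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(text: str) -> Counter:
    # Lowercased token counts; a stand-in for any real sentence representation.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_salient(article: str, k: int = 3) -> List[str]:
    # Assumed salience score: similarity of each sentence to the full article.
    # No article-summary pairs are needed, so this step is unsupervised.
    sentences = split_sentences(article)
    doc_vec = bag_of_words(article)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(bag_of_words(sentences[i]), doc_vec),
        reverse=True,
    )
    # Keep the top-k sentences, restored to their original document order.
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(
    article: str,
    abstractor: Callable[[str], str] = lambda s: s,
    k: int = 3,
) -> str:
    # The abstractor is applied to each extracted sentence independently.
    # The identity default is a placeholder for a trained paraphrasing model.
    return " ".join(abstractor(s) for s in extract_salient(article, k))
```

Keeping the abstractor as a plain sentence-to-sentence function mirrors the description above, in which each extracted sentence is paraphrased on its own; any model with that interface could be dropped in without changing the extraction step.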