Combining Word Embeddings and N-grams for Unsupervised Document Summarization

04/25/2020
by Zhuolin Jiang, et al.

Graph-based extractive document summarization relies on the quality of the sentence similarity graph. Sentence similarity based on bag-of-words or tf-idf features uses exact word matching, and so fails to measure the semantic similarity between individual words or to consider the semantic structure of sentences. To improve the similarity measure between sentences, we employ off-the-shelf deep embedding features together with tf-idf features, and introduce a new text similarity metric. The resulting sentence similarity graph is used in a submodular objective function for extractive summarization, which consists of a weighted coverage term and a diversity term. A Transformer-based model for sentence compression is developed to aid document summarization. Our summarization approach is extractive and unsupervised. Experiments demonstrate that our approach outperforms the tf-idf-based approach, achieves state-of-the-art performance on the DUC04 dataset, and performs comparably to fully supervised learning methods on the CNN/DM and NYT datasets.
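To make the pipeline described in the abstract more concrete, here is a minimal sketch of greedy sentence selection under a coverage-plus-diversity objective. It is not the paper's implementation. The abstract does not define the new similarity metric or the exact form of the two terms, so everything below is an assumption: the linear mix of embedding and tf-idf cosines (`lam`), the saturated coverage term (`alpha`), the square-root diversity reward over sentence clusters (`beta`, `clusters`), and a budget counted in sentences. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(emb, tfidf, lam=0.5):
    """Assumed metric: a linear mix of embedding and tf-idf cosine similarity.

    Values are clamped at zero so that the objective below stays monotone.
    """
    n = len(emb)
    return [
        [
            max(0.0, lam * cosine(emb[i], emb[j])
                + (1.0 - lam) * cosine(tfidf[i], tfidf[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def objective(selected, sim, clusters, alpha=0.5, beta=1.0):
    """Coverage plus beta * diversity for a list of selected sentence indices."""
    n = len(sim)
    # Coverage: how well the summary covers each sentence, saturated so that
    # covering one sentence many times over stops paying off.
    coverage = sum(
        min(sum(sim[i][j] for j in selected), alpha * sum(sim[i]))
        for i in range(n)
    )
    # Diversity: the square root gives diminishing returns inside a cluster,
    # which rewards drawing sentences from different clusters.
    reward = [sum(sim[j]) / n for j in range(n)]
    diversity = sum(
        math.sqrt(sum(reward[j] for j in selected if clusters[j] == c))
        for c in set(clusters)
    )
    return coverage + beta * diversity


def greedy_summary(sim, clusters, budget, alpha=0.5, beta=1.0):
    """Greedily add the sentence with the largest marginal gain, up to `budget`."""
    selected = []
    while len(selected) < budget:
        base = objective(selected, sim, clusters, alpha, beta)
        best, best_gain = None, 0.0
        for j in range(len(sim)):
            if j in selected:
                continue
            gain = objective(selected + [j], sim, clusters, alpha, beta) - base
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no remaining sentence improves the objective
            break
        selected.append(best)
    return selected
```

In this sketch, `greedy_summary(combined_similarity(emb, tfidf), clusters, budget=3)` returns the indices of the chosen sentences in selection order. The cluster labels would come from any clustering of the sentence vectors, and a length budget in words would replace the sentence count in a more realistic setup.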

research
03/29/2021

Centrality Meets Centroid: A Graph-based Approach for Unsupervised Document Summarization

Unsupervised document summarization has re-acquired lots of attention in...
research
09/16/2020

Unsupervised Summarization by Jointly Extracting Sentences and Keywords

We present RepRank, an unsupervised graph-based ranking model for extrac...
research
05/14/2018

Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization

We introduce a novel graph-based framework for abstractive meeting speec...
research
03/22/2018

Context is Everything: Finding Meaning Statistically in Semantic Spaces

This paper introduces a simple and explicit measure of word importance i...
research
10/06/2020

SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy

Most work on multi-document summarization has focused on generic summari...
research
05/07/2016

On Improving Informativity and Grammaticality for Multi-Sentence Compression

Multi Sentence Compression (MSC) is of great value to many real world ap...
research
08/08/2023

A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation

Analyzing the pattern of semantic variation in long real-world texts suc...