Improved Spoken Document Summarization with Coverage Modeling Techniques

by Kuan-Yu Chen, et al.

Extractive summarization aims to select a set of indicative sentences from a source document to form a summary that expresses the document's major themes. It is generally agreed that both relevance and coverage are critical issues for extractive summarization. Existing methods that model coverage do so either by reducing redundancy or by increasing diversity in the summary. Maximal marginal relevance (MMR) is a widely cited method because it takes both relevance and redundancy into account when generating a summary for a given document. Beyond MMR, to our knowledge there has been little research on reducing redundancy or increasing diversity for the spoken document summarization task. Motivated by these observations, this paper makes two major contributions. First, in contrast to MMR, which addresses coverage by reducing redundancy, we propose two novel coverage-based methods that directly increase diversity. With the proposed methods, a set of representative sentences that are not only relevant to the given document but also cover most of its important sub-themes can be selected automatically. Second, we go a step further by plugging several document/sentence representation methods into the proposed framework to further enhance summarization performance. A series of empirical evaluations demonstrates the effectiveness of the proposed methods.
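The MMR criterion can be illustrated with a minimal greedy sketch. At each step it picks the candidate sentence s that maximizes lam * Sim(s, D) - (1 - lam) * max over already selected s' of Sim(s, s'), where D is the whole document. This sketch assumes bag-of-words cosine similarity for both terms and a hypothetical trade-off weight `lam`; those choices, and the helper names `bow`, `cosine`, and `mmr_select`, are illustrative assumptions, not the configuration used in the paper. The paper's two proposed diversity-based methods are not shown here.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words term counts (an assumed, deliberately simple representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(sentences, k, lam=0.5):
    """Greedy MMR: return up to k sentence indices, in selection order.

    Relevance is a sentence's similarity to the whole document; redundancy is
    its maximum similarity to any sentence already selected. `lam` (assumed
    name) trades relevance (lam=1.0) against non-redundancy (lam=0.0).
    """
    vecs = [bow(s) for s in sentences]
    doc = bow(" ".join(sentences))  # the document vector is the union of all sentences
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            rel = cosine(vecs[i], doc)
            red = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)  # ties resolve to the earliest index
        selected.append(best)
        candidates.remove(best)
    return selected


sentences = [
    "Spoken document summarization selects indicative sentences.",
    "Spoken document summarization selects indicative sentences from speech.",
    "Coverage modeling increases diversity in the summary.",
]
print(mmr_select(sentences, 2))            # → [1, 2]
print(mmr_select(sentences, 2, lam=1.0))   # → [1, 0]
```

With `lam=1.0` the redundancy penalty vanishes, so the two near-duplicate sentences are both chosen; with `lam=0.5` the penalty pushes the second pick toward the sentence covering a different sub-theme. This is the "reduce redundancy" route to coverage that the abstract contrasts with directly increasing diversity.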




