Leveraging Word Embeddings for Spoken Document Summarization

by Kuan-Yu Chen, et al.

Owing to the rapidly growing volume of multimedia content available on the Internet, extractive spoken document summarization, which automatically selects a set of representative sentences from a spoken document to concisely express its most important theme, has been an active area of research and experimentation. Meanwhile, word embedding has recently become a popular research subject because of its excellent performance in many natural language processing (NLP) tasks. However, as far as we are aware, relatively few studies have investigated its use in extractive text or speech summarization. A common way to leverage word embeddings in summarization is to represent the document (or sentence) by averaging the embeddings of the words occurring in it. The cosine similarity measure can then be used to determine the degree of relevance between a pair of representations. Beyond the continued efforts to improve word representations, this paper focuses on building novel and efficient ranking models based on general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of the proposed methods compared to existing state-of-the-art methods.
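The averaging-plus-cosine baseline described in the abstract can be illustrated with a minimal sketch. This is not the ranking model proposed in the paper. It only shows the common baseline, and it assumes a toy vocabulary with made-up 3-dimensional vectors. The names `EMBEDDINGS`, `average_embedding`, `cosine_similarity`, and `rank_sentences` are hypothetical and chosen for illustration.

```python
import math

# Toy 3-dimensional embeddings. The values are invented for illustration;
# real systems would load vectors trained on a large corpus.
EMBEDDINGS = {
    "stock": [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "prices": [0.7, 0.3, 0.0],
    "fell": [0.6, 0.1, 0.2],
    "weather": [0.0, 0.9, 0.3],
    "sunny": [0.1, 0.8, 0.4],
}


def average_embedding(words, embeddings):
    """Return the mean vector of the in-vocabulary words, or None if none are known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(sentences, embeddings):
    """Score each tokenized sentence against the whole document.

    Returns (score, sentence_index) pairs, highest score first.
    """
    doc_words = [w for sentence in sentences for w in sentence]
    doc_vec = average_embedding(doc_words, embeddings)
    scored = []
    for idx, sentence in enumerate(sentences):
        sent_vec = average_embedding(sentence, embeddings)
        if sent_vec is None or doc_vec is None:
            score = 0.0  # no known words, so no evidence of relevance
        else:
            score = cosine_similarity(sent_vec, doc_vec)
        scored.append((score, idx))
    # Sort by descending score; break ties by original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


sentences = [
    ["stock", "market", "prices", "fell"],
    ["weather", "sunny"],
    ["stock", "prices"],
]
ranking = rank_sentences(sentences, EMBEDDINGS)
for score, idx in ranking:
    print(f"sentence {idx}: {score:.3f}")
```

In this toy document most words relate to finance, so the off-topic weather sentence receives the lowest score. An extractive summarizer built on this baseline would keep the top-ranked sentences up to a length budget.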



Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

Since the amount of information on the internet is growing rapidly, it i...

Learning to Distill: The Essence Vector Modeling Framework

In the context of natural language processing, representation learning h...

Improved Spoken Document Summarization with Coverage Modeling Techniques

Extractive summarization aims at selecting a set of indicative sentences...

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

Word embedding methods revolve around learning continuous distributed ve...

Integrate Document Ranking Information into Confidence Measure Calculation for Spoken Term Detection

This paper proposes an algorithm to improve the calculation of confidenc...

Unsupervised Summarization by Jointly Extracting Sentences and Keywords

We present RepRank, an unsupervised graph-based ranking model for extrac...

Visual Summarization of Scholarly Videos using Word Embeddings and Keyphrase Extraction

Effective learning with audiovisual content depends on many factors. Bes...
