Leveraging Word Embeddings for Spoken Document Summarization

06/14/2015
by Kuan-Yu Chen, et al.

Owing to the rapidly growing volume of multimedia content available on the Internet, extractive spoken document summarization, which aims to automatically select a set of representative sentences from a spoken document that concisely express its most important theme, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a popular research subject because of its excellent performance in many natural language processing (NLP) tasks. However, as far as we are aware, relatively few studies have investigated its use in extractive text or speech summarization. A common way of leveraging word embeddings in the summarization process is to represent a document (or sentence) by averaging the embeddings of the words occurring in it. The cosine similarity measure can then be employed to determine the degree of relevance between a pair of such representations. Beyond the continued efforts to improve word representations, this paper focuses on building novel and efficient ranking models based on general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of the proposed methods compared with existing state-of-the-art methods.
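The averaging-plus-cosine baseline described above can be sketched as follows. This is a minimal illustration, not the paper's proposed ranking models: it assumes pre-trained word embeddings supplied as a plain Python dict and already-tokenized sentences, and the function names and toy vectors are hypothetical. Each sentence is scored by the cosine similarity between its averaged embedding and the averaged embedding of the whole document, and the top-scoring sentences are selected.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(words: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Represent a sentence or document as the mean of its in-vocabulary word vectors.

    Out-of-vocabulary words are skipped. If no word is known, an empty vector is
    returned, which cosine() below scores as 0.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_sentences(
    sentences: List[List[str]], embeddings: Dict[str, Vector], top_k: int
) -> List[int]:
    """Return indices of the top_k sentences most similar to the whole document."""
    doc_words = [w for s in sentences for w in s]
    doc_vec = average_embedding(doc_words, embeddings)
    scores = [cosine(average_embedding(s, embeddings), doc_vec) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Hypothetical 2-dimensional toy embeddings, for illustration only.
toy_embeddings = {
    "speech": [1.0, 0.0],
    "summary": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
toy_sentences = [["speech", "summary"], ["weather"], ["speech"]]

print(rank_sentences(toy_sentences, toy_embeddings, top_k=2))  # prints [0, 2]
```

In this toy example the off-topic sentence ("weather") receives the lowest score and is left out of the two-sentence summary. A real system would use embeddings learned from a large corpus and would typically add a length or redundancy constraint when selecting sentences.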
