Leveraging Large Text Corpora for End-to-End Speech Summarization

03/02/2023
by Kohei Matsuura, et al.

End-to-end speech summarization (E2E SSum) is a technique that generates summary sentences directly from speech. Compared with the cascade approach, which combines automatic speech recognition (ASR) and text summarization models, the E2E approach is more promising because it mitigates ASR errors, incorporates nonverbal information, and simplifies the overall system. However, because paired data (i.e., speech and its summary) is difficult to collect in large amounts, the training data is usually insufficient to train a robust E2E SSum system. In this paper, we present two novel methods that leverage a large amount of external text summarization data for E2E SSum training. The first uses a text-to-speech (TTS) system to generate synthesized speech, which is paired with the text summary for E2E SSum training. The second is a TTS-free method that feeds a phoneme sequence, rather than synthesized speech, directly into the E2E SSum model. Experiments show that our proposed TTS- and phoneme-based methods improve several metrics on the How2 dataset. In particular, our best system outperforms the previous state-of-the-art system by a large margin (a METEOR score improvement of more than 6 points). To the best of our knowledge, this is the first work to use external language resources for E2E SSum. Moreover, we report a detailed analysis of the How2 dataset to confirm the validity of our proposed E2E SSum system.
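
The TTS-free idea in the abstract can be pictured with a small Python sketch of the data preparation step. It is an illustration only, not the authors' implementation: the `Example` record, the `text_to_phonemes` and `build_training_set` helpers, and the tiny ARPAbet-style `TOY_LEXICON` are all hypothetical names introduced here, and a real system would presumably use a full grapheme-to-phoneme converter rather than a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical toy lexicon (ARPAbet-style symbols); an assumption for this
# sketch, standing in for a real grapheme-to-phoneme converter.
TOY_LEXICON: Dict[str, List[str]] = {
    "how": ["HH", "AW"],
    "to": ["T", "UW"],
    "tie": ["T", "AY"],
    "a": ["AH"],
    "knot": ["N", "AA", "T"],
}


@dataclass
class Example:
    kind: str                        # "speech" or "phoneme"
    source: Union[str, List[str]]    # audio path, or a phoneme sequence
    summary: str


def text_to_phonemes(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Map each word to phonemes; unknown words become a single <unk>."""
    phonemes: List[str] = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue  # token was punctuation only
        phonemes.extend(lexicon.get(word, ["<unk>"]))
    return phonemes


def build_training_set(
    paired_speech: Sequence[Tuple[str, str]],
    external_text: Sequence[Tuple[str, str]],
    lexicon: Dict[str, List[str]],
) -> List[Example]:
    """Combine real (audio path, summary) pairs with phoneme-input
    examples derived from external (document, summary) text pairs."""
    examples = [Example("speech", path, summ) for path, summ in paired_speech]
    examples += [
        Example("phoneme", text_to_phonemes(doc, lexicon), summ)
        for doc, summ in external_text
    ]
    return examples


if __name__ == "__main__":
    data = build_training_set(
        paired_speech=[("clip_0001.wav", "A video about tying knots.")],
        external_text=[("How to tie a knot.", "Knot tying instructions.")],
        lexicon=TOY_LEXICON,
    )
    for ex in data:
        print(ex.kind, ex.source, "->", ex.summary)
```

The TTS-based variant would differ only in the second branch: instead of a phoneme list, each external document would be rendered to synthesized audio and stored as a "speech" example.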


research
11/16/2021

Attention-based Multi-hypothesis Fusion for Speech Summarization

Speech summarization, which generates a text summary from speech, can be...
research
06/07/2023

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

End-to-end speech summarization (E2E SSum) directly summarizes input spe...
research
09/14/2022

ESSumm: Extractive Speech Summarization from Untranscribed Meeting

In this paper, we propose a novel architecture for direct extractive spe...
research
03/15/2022

Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization

The IMPRESSIONS section of a radiology report about an imaging study is ...
research
01/26/2016

LIA-RAG: a system based on graphs and divergence of probabilities applied to Speech-To-Text Summarization

This paper aims to introduce a new algorithm for automatic speech-to-te...
research
04/04/2020

End-to-End Abstractive Summarization for Meetings

With the abundance of automatic meeting transcripts, meeting summarizati...
research
09/11/2023

Minuteman: Machine and Human Joining Forces in Meeting Summarization

Many meetings require creating a meeting summary to keep everyone up to ...
