Speech Summarization using Restricted Self-Attention

10/12/2021
by Roshan Sharma et al.

Speech summarization is typically performed with a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization is challenging because long input audio sequences impose heavy memory and compute requirements. Recent work in document summarization has inspired methods that reduce the complexity of self-attention, enabling transformer models to handle long sequences. In this work, we introduce a single model optimized end-to-end for speech summarization. We apply the restricted self-attention technique from text-based models to speech models to address the memory and compute constraints. We demonstrate that the proposed model learns to summarize speech directly on the How2 corpus of instructional videos. The proposed end-to-end model outperforms the previously proposed cascaded model by 3 points absolute on ROUGE. Further, on the spoken language understanding task of predicting concepts from speech inputs, the end-to-end model outperforms the cascade model by 4 points absolute in F-1.
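Restricted self-attention makes long audio inputs tractable by letting each frame attend only to a fixed window of neighboring frames, so attention cost grows linearly rather than quadratically with sequence length. Below is a minimal PyTorch sketch of that windowed pattern, assuming a single head and a dense band mask for clarity; the function name, window radius, and shapes are illustrative assumptions, not the paper's implementation, and an efficient version would compute only the banded scores instead of masking a full attention matrix.

import torch
import torch.nn.functional as F

def restricted_self_attention(q, k, v, w):
    # q, k, v: (batch, seq_len, d_model); w: window radius, so each
    # query position attends to at most 2*w + 1 key positions.
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (B, T, T)

    # Band mask: True where |i - j| <= w, i.e. inside the local window.
    t = q.size(1)
    idx = torch.arange(t, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= w  # (T, T)

    scores = scores.masked_fill(~band, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), v)

# Example: 2 utterances, 16 frames, 8-dim features, window radius 3.
x = torch.randn(2, 16, 8)
out = restricted_self_attention(x, x, x, w=3)
print(out.shape)  # torch.Size([2, 16, 8])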

Related research

10/29/2022 · XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers
Transformers are among the state of the art for many tasks in speech, vi...

07/17/2023 · BASS: Block-wise Adaptation for Speech Summarization
End-to-end speech summarization has been shown to improve performance ov...

02/23/2021 · Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition
Self-attention models have been successfully applied in end-to-end speec...

05/25/2022 · Leveraging Locality in Abstractive Text Summarization
Despite the successes of neural attention models for natural language ge...

08/27/2019 · Movie Plot Analysis via Turning Point Identification
According to screenwriting theory, turning points (e.g., change of plans...

06/06/2023 · Towards End-to-end Speech-to-text Summarization
Speech-to-text (S2T) summarization is a time-saving technique for filter...

07/06/2022 · Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Conformer has proven to be effective in many speech processing tasks. It...
