Monotonic segmental attention for automatic speech recognition

10/26/2022
by Albert Zeyer, et al.

We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid the quadratic runtime of global attention, to generalize better to long sequences, and eventually to enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, one of which specifically takes the segmental nature into account and yields further improvements. Using time-synchronous decoding for segmental models is novel and a step towards streaming applications. Our experiments show the importance of a length model for predicting the segment boundaries. The final best segmental-attention model using segmental decoding performs better than the global-attention model, in contrast to other monotonic attention approaches in the literature. Further, we observe that the segmental model generalizes much better to long sequences of up to several minutes.
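
As a rough illustration of why restricting attention to segments avoids the quadratic cost, the following minimal sketch contrasts global attention with attention limited to contiguous, monotonic segments. It is not the authors' implementation. Plain dot-product scoring, the function names (`attend`, `global_attention`, `segmental_attention`), and the externally supplied `boundaries` list are all assumptions made for illustration; in the paper the segment boundaries come from a length model, which is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder, start, end):
    """Dot-product attention over encoder frames in [start, end).

    Hypothetical helper: the scoring function is an assumption,
    not taken from the paper.
    """
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    frames = encoder[start:end]
    scores = [sum(q * h for q, h in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(len(query))
    ]
    return context, weights


def global_attention(queries, encoder):
    """Every output step attends to all T frames: O(S * T) scores."""
    return [attend(q, encoder, 0, len(encoder)) for q in queries]


def segmental_attention(queries, encoder, boundaries):
    """Output step i attends only to its own segment.

    boundaries[i] is the exclusive end frame of segment i (assumed
    given here). Segments are contiguous and monotonic, so the total
    number of scores is O(T) when the segments cover the input once.
    """
    outputs = []
    start = 0
    for query, end in zip(queries, boundaries):
        outputs.append(attend(query, encoder, start, end))
        start = end
    return outputs


# Toy input: 6 encoder frames of dimension 2, 3 output steps.
encoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
           [0.5, 0.5], [0.0, 2.0], [2.0, 0.0]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

global_out = global_attention(queries, encoder)
segment_out = segmental_attention(queries, encoder, boundaries=[2, 4, 6])

# Number of attention scores computed in each case.
global_scores = sum(len(w) for _, w in global_out)
segment_scores = sum(len(w) for _, w in segment_out)
```

With 3 output steps and 6 frames, the global variant computes one score per (step, frame) pair, while the segmental variant computes one score per frame in total. This toy version takes the boundaries as given, which sidesteps the part the abstract identifies as important: predicting them.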

research
07/15/2021

VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording

In this work, we propose novel decoding algorithms to enable streaming a...
research
03/30/2021

A study of latent monotonic attention variants

End-to-end models reach state-of-the-art performance for speech recognit...
research
05/19/2020

Enhancing Monotonic Multihead Attention for Streaming ASR

We investigate a monotonic multihead attention (MMA) by extending hard m...
research
05/10/2020

CTC-synchronous Training for Monotonic Attention Model

Monotonic chunkwise attention (MoChA) has been studied for the online st...
research
02/12/2020

Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

We discuss the problem of echographic transcription in autoregressive se...
research
04/13/2021

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

With the advent of direct models in automatic speech recognition (ASR), ...
research
05/20/2020

A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

End-to-end models are gaining wider attention in the field of automatic ...
