Enhancing Monotonic Multihead Attention for Streaming ASR

05/19/2020
by Hirofumi Inaguma et al.

We investigate monotonic multihead attention (MMA), which extends hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications. For streaming inference, every monotonic attention (MA) head should learn a proper alignment, because the next token is not generated until all heads detect the corresponding token boundary. However, we found that with a naive implementation not all MA heads learn alignments. To encourage every head to learn alignments properly, we propose HeadDrop regularization, which stochastically masks out a subset of heads during training. Furthermore, we propose pruning redundant heads to improve consensus among heads on boundary detection and to prevent the delayed token generation such heads cause. Chunkwise attention on each MA head is also extended to its multihead counterpart. Finally, we propose head-synchronous beam search decoding to guarantee stable streaming inference.
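
To make the HeadDrop idea concrete, here is a minimal sketch in PyTorch of stochastically masking out entire monotonic attention heads during training. The function name, tensor layout, and the `p_drop` hyperparameter are illustrative choices of ours, not taken from the paper, and the paper's exact masking and normalization details may differ.

```python
import torch

def headdrop(head_outputs: torch.Tensor, p_drop: float, training: bool) -> torch.Tensor:
    """Stochastically zero out whole attention heads during training.

    head_outputs: (batch, n_heads, tgt_len, d_head) per-head context vectors.
    p_drop: probability of dropping each head (illustrative hyperparameter).
    """
    if not training or p_drop == 0.0:
        return head_outputs
    batch, n_heads = head_outputs.shape[:2]
    # Draw an independent keep/drop decision per head and per example.
    keep = (torch.rand(batch, n_heads, 1, 1, device=head_outputs.device) >= p_drop).float()
    # Guard against dropping every head for an example: fall back to keeping all.
    all_dropped = keep.sum(dim=1, keepdim=True) == 0
    keep = torch.where(all_dropped, torch.ones_like(keep), keep)
    return head_outputs * keep
```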

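The head-synchronous beam search mentioned above addresses heads that fail to detect a boundary in time. As a rough illustration of the synchronization idea only (the paper's actual algorithm is more involved), the sketch below forces undetected heads to the boundary found by the other heads so that decoding never stalls on a single head; the function name and the fallback policy are hypothetical.

```python
from typing import List, Optional

def synchronize_boundaries(detected: List[Optional[int]]) -> Optional[List[int]]:
    """Force heads that missed a token boundary (None) to the latest boundary
    detected by the other heads, so the next token can be emitted.

    detected: per-head encoder frame index of the detected boundary, or None.
    Returns synchronized boundaries, or None if no head has fired yet.
    """
    found = [b for b in detected if b is not None]
    if not found:
        return None  # no head detected a boundary; read more frames
    sync = max(found)  # illustrative policy: align laggards to the latest boundary
    return [b if b is not None else sync for b in detected]

# Example: heads 0 and 2 fired at frames 12 and 15; head 1 is forced to 15.
print(synchronize_boundaries([12, None, 15]))  # -> [12, 15, 15]
```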