Multi-head Monotonic Chunkwise Attention For Online Speech Recognition

05/01/2020
by Baiji Liu, et al.

The attention mechanism of the Listen, Attend and Spell (LAS) model requires the whole input sequence to compute the attention context and is therefore not suitable for online speech recognition. To address this problem, we propose multi-head monotonic chunk-wise attention (MTH-MoChA), an improved version of MoChA. MTH-MoChA splits the input sequence into small chunks and computes multi-head attention over the chunks. We also explore training strategies such as LSTM pooling, minimum word error rate training, and SpecAugment to further improve the performance of MTH-MoChA. Experiments on AISHELL-1 data show that the proposed model, together with these training strategies, improves upon MoChA's 8.96% character error rate (CER) on the test set. On another 18,000-hour in-car speech data set, MTH-MoChA obtains a 7.28% CER.
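As a rough illustration of the chunkwise multi-head attention idea described above, the NumPy sketch below computes a separate soft attention distribution per head over a fixed-size window of encoder frames ending at a given position, then concatenates the per-head contexts. This is a simplified sketch, not the authors' exact MTH-MoChA: the function name, the chunk_size and num_heads values, and the externally supplied chunk_end index are all illustrative assumptions, and a full MoChA-style model would additionally learn monotonic stopping probabilities that select chunk_end online rather than taking it as an input.

# Minimal NumPy sketch of chunkwise multi-head attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def chunkwise_multihead_attention(enc, query, chunk_end, chunk_size=4, num_heads=4):
    """Attend over a fixed-size chunk of encoder states ending at chunk_end.

    enc:       (T, d) encoder outputs
    query:     (d,)   current decoder state
    chunk_end: index of the last encoder frame reached (assumed given here;
               in MoChA it is chosen by a learned monotonic attention head)
    Returns a (d,) context vector formed by concatenating per-head contexts.
    """
    T, d = enc.shape
    assert d % num_heads == 0
    d_head = d // num_heads

    start = max(0, chunk_end - chunk_size + 1)
    chunk = enc[start:chunk_end + 1]                      # (w, d), w <= chunk_size

    # Split the model dimension into heads: keys (w, H, d_head), query (H, d_head).
    k = chunk.reshape(len(chunk), num_heads, d_head)
    q = query.reshape(num_heads, d_head)

    # Scaled dot-product attention within the chunk, one distribution per head.
    scores = np.einsum('whd,hd->hw', k, q) / np.sqrt(d_head)   # (H, w)
    weights = softmax(scores, axis=-1)                          # (H, w)
    context = np.einsum('hw,whd->hd', weights, k)               # (H, d_head)
    return context.reshape(d)

# Usage: 20 encoder frames of dimension 16, decoder attends over a 4-frame chunk.
enc = np.random.randn(20, 16)
query = np.random.randn(16)
ctx = chunkwise_multihead_attention(enc, query, chunk_end=9)
print(ctx.shape)  # (16,)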
