Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition

02/23/2021
by Jian Luo, et al.

Self-attention models have been successfully applied in end-to-end speech recognition systems and greatly improve recognition accuracy. However, such attention-based models cannot be used in online speech recognition, because they usually need the whole acoustic sequence as input. A common remedy is to restrict the attention field to a fixed left and right window, which keeps the computation cost manageable but introduces performance degradation. In this paper, we propose Memory-Self-Attention (MSA), which adds history information to the Restricted-Self-Attention unit. MSA only needs local-time features as inputs, and efficiently models long temporal contexts by attending to memory states. Meanwhile, the recurrent neural network transducer (RNN-T) has proved to be a strong approach for online ASR tasks, because its alignments are local and monotonic. We propose a novel network structure, called the Memory-Self-Attention (MSA) Transducer, in which both the encoder and the decoder contain the proposed MSA unit. Experiments demonstrate that our proposed models improve WER over Restricted-Self-Attention models by 13.5% relative on WSJ and 7.1% relative on SWBD, without much increase in computation cost.
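To make the mechanism concrete, here is a minimal sketch of an MSA-style unit in PyTorch. It is an illustrative reading of the description above, not the authors' code: the class name MSAUnit, the chunk width `window`, and the GRU-based memory update are all assumptions. The sketch only shows the core idea, that attention restricted to local-time features can be extended with a recurrent memory state so long temporal context is retained without attending the whole sequence.

```python
# Minimal sketch of a Memory-Self-Attention (MSA) unit in PyTorch.
# The class name, the chunk width `window`, and the GRU-based memory
# update are illustrative assumptions, not the paper's exact design:
# attention is computed over a local chunk of frames plus a single
# recurrent memory state summarizing everything before the chunk.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSAUnit(nn.Module):
    def __init__(self, d_model: int, window: int):
        super().__init__()
        self.d_model = d_model
        self.window = window                         # local attention width (frames)
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.mem_rnn = nn.GRUCell(d_model, d_model)  # compresses past chunks

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (T, d_model) acoustic features, processed chunk by chunk."""
        mem = x.new_zeros(1, self.d_model)           # running memory state
        out = []
        for t0 in range(0, x.size(0), self.window):
            chunk = x[t0:t0 + self.window]           # local-time features only
            kv_in = torch.cat([mem, chunk], dim=0)   # memory state + local frames
            q = self.q(chunk)                        # (w, d)
            k, v = self.k(kv_in), self.v(kv_in)      # (w + 1, d)
            att = F.softmax(q @ k.t() / self.d_model ** 0.5, dim=-1)
            y = att @ v                              # (w, d)
            out.append(y)
            # fold this chunk's output into the memory for later chunks
            mem = self.mem_rnn(y.mean(dim=0, keepdim=True), mem)
        return torch.cat(out, dim=0)


# Toy usage: 100 frames of 64-dim features, 16-frame local window.
msa = MSAUnit(d_model=64, window=16)
y = msa(torch.randn(100, 64))
print(y.shape)  # torch.Size([100, 64])
```

For strictly causal streaming one would additionally mask future frames inside each chunk; the sketch lets a frame attend to its whole chunk for brevity.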

Related research:

09/28/2019: Self-Attention Transducers for End-to-End Speech Recognition
Recurrent neural network transducers (RNN-T) have been successfully appl...

05/21/2020: SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
End-to-end speech recognition has become popular in recent years, since ...

07/12/2023: Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition
Modern speech recognition systems rely on self-attention. Unfortunately,...

02/07/2020: Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
In this paper we present an end-to-end speech recognition model with Tra...

10/12/2021: Speech Summarization using Restricted Self-Attention
Speech summarization is typically performed by using a cascade of speech...

06/04/2019: Self-Attentional Models for Lattice Inputs
Lattices are an efficient and effective method to encode ambiguity of up...

05/18/2020: Attention-based Transducer for Online Speech Recognition
Recent studies reveal the potential of recurrent neural network transduc...
