DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

10/28/2019
by Zhao You, et al.

Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and have achieved state-of-the-art performance owing to their superior ability to capture long-term dependencies. A key ingredient is the self-attention mechanism, which can be performed effectively over the whole utterance. In this paper, we investigate whether information beyond the whole utterance can be exploited and prove beneficial. We propose to apply self-attention layers with augmented memory to ASR. Specifically, we first propose a variant model architecture that combines a deep feed-forward sequential memory network (DFSMN) with self-attention layers to form a stronger baseline than a purely self-attention network. Then, we propose and compare two kinds of additional memory structures added into the self-attention layers. Experiments on large-scale LVCSR tasks show that, on four individual test sets, the DFSMN-SAN architecture outperforms the vanilla SAN encoder by 5% relative in character error rate (CER), and the additional memory structure provides a further 5% to 11% relative CER reduction.
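To make the core idea concrete, below is a minimal PyTorch sketch of one plausible variant of self-attention with persistent memory: a multi-head attention layer whose keys and values are extended with learned, input-independent memory slots, so every frame can attend to information beyond the current utterance. All names (PersistentMemoryAttention, n_mem, the 0.02 init scale) are illustrative assumptions, not the authors' implementation; the paper compares two memory structures, and this sketch shows only a prepended key/value variant.

    import math
    import torch
    import torch.nn as nn

    class PersistentMemoryAttention(nn.Module):
        """Multi-head self-attention with learned persistent key/value memory.

        Sketch of the idea in the abstract: the memory slots are trainable
        parameters shared across all utterances and are prepended to the
        projected keys/values of every input sequence.
        """
        def __init__(self, d_model: int, n_heads: int, n_mem: int = 16):
            super().__init__()
            assert d_model % n_heads == 0
            self.h, self.d_k = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            self.out = nn.Linear(d_model, d_model)
            # Persistent memory: input-independent slots treated as
            # already-projected keys/values (a hypothetical placement choice).
            self.mem_k = nn.Parameter(0.02 * torch.randn(n_mem, d_model))
            self.mem_v = nn.Parameter(0.02 * torch.randn(n_mem, d_model))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, d_model)
            b, t, _ = x.shape
            q = self.q_proj(x)
            # Prepend memory so each frame attends over [memory; frames].
            k = torch.cat([self.mem_k.expand(b, -1, -1), self.k_proj(x)], dim=1)
            v = torch.cat([self.mem_v.expand(b, -1, -1), self.v_proj(x)], dim=1)

            def heads(z: torch.Tensor) -> torch.Tensor:
                # (batch, length, d_model) -> (batch, n_heads, length, d_k)
                return z.view(b, -1, self.h, self.d_k).transpose(1, 2)

            q, k, v = heads(q), heads(k), heads(v)
            scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
            ctx = scores.softmax(dim=-1) @ v          # (b, h, t, d_k)
            return self.out(ctx.transpose(1, 2).reshape(b, t, -1))

    # Usage: shapes are preserved; memory only widens the attention span.
    layer = PersistentMemoryAttention(d_model=256, n_heads=4, n_mem=16)
    y = layer(torch.randn(8, 100, 256))               # -> (8, 100, 256)

Prepending the memory before the softmax makes attention weights over real frames and memory slots compete directly, which is one natural way to let the model consult utterance-external information.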


Related research

06/17/2021
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
End-to-end models are favored in automatic speech recognition (ASR) beca...

07/12/2023
Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition
Modern speech recognition systems rely on self-attention. Unfortunately,...

03/22/2023
Self-supervised Learning with Speech Modulation Dropout
We show that training a multi-headed self-attention-based deep network t...

04/07/2021
Capturing Multi-Resolution Context by Dilated Self-Attention
Self-attention has become an important and widely used neural network co...

05/28/2020
When Can Self-Attention Be Replaced by Feed Forward Layers?
Recently, self-attention models such as Transformers have given competit...

11/14/2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Transformer has achieved extraordinary performance in Natural Language P...

09/01/2022
Deep Sparse Conformer for Speech Recognition
Conformer has achieved impressive results in Automatic Speech Recognitio...
