SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

05/21/2020
by Zhifu Gao, et al.

End-to-end speech recognition has become popular in recent years, since it integrates the acoustic, pronunciation and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as superior; a prominent example is the Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by the Transformer is the use of self-attention in place of recurrent mechanisms, enabling both the encoder and the decoder to capture long-range dependencies with lower computational complexity. In this work, we propose boosting the self-attention ability with a DFSMN memory block, forming the proposed memory-equipped self-attention (SAN-M) mechanism. Theoretical and empirical comparisons are made to demonstrate the relevance and complementarity of self-attention and the DFSMN memory block, and the proposed SAN-M provides an efficient mechanism to integrate the two modules. We evaluate our approach on the public AISHELL-1 benchmark and on an industrial-scale 20,000-hour Mandarin speech recognition task. On both tasks, SAN-M systems achieve much better performance than the self-attention-based Transformer baseline system. In particular, SAN-M achieves a CER of 6.46% on the AISHELL-1 task even without any external language model, comfortably outperforming other state-of-the-art systems.
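
To illustrate the core idea, the sketch below pairs standard multi-head self-attention with a DFSMN-style memory block, realised here as a depthwise 1-D convolution over projected values whose output is summed with the attention output. This is a minimal PyTorch sketch under stated assumptions: the module names, kernel size, and exact fusion point are illustrative choices, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class SANM(nn.Module):
    """Minimal sketch of memory-equipped self-attention (SAN-M).

    Hypothetical layout: multi-head self-attention plus a DFSMN-style
    memory block (a depthwise FIR filter over time), with the two
    branches fused by addition.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 4, kernel_size: int = 11):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.v_proj = nn.Linear(d_model, d_model)
        # DFSMN-style memory: one FIR filter per channel (depthwise conv).
        self.memory = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size // 2, groups=d_model, bias=False,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        v = self.v_proj(x)                                  # value-side features
        mem = self.memory(v.transpose(1, 2)).transpose(1, 2)  # memory block over time
        return attn_out + mem                               # fuse attention and memory


if __name__ == "__main__":
    x = torch.randn(2, 100, 256)        # (batch, frames, features)
    print(SANM()(x).shape)              # torch.Size([2, 100, 256])
```

In this sketch the memory branch gives each frame a fixed-span, convolutional view of its neighbours, complementing the content-based global view of self-attention; the paper's analysis motivates exactly this kind of combination.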
