Deep-FSMN for Large Vocabulary Continuous Speech Recognition

03/04/2018
by   Shiliang Zhang, et al.

In this paper, we present an improved feedforward sequential memory network (FSMN) architecture, namely Deep-FSMN (DFSMN), which introduces skip connections between memory blocks in adjacent layers. These skip connections enable information to flow across layers and thus alleviate the vanishing-gradient problem when building very deep structures. As a result, DFSMN benefits significantly from both the skip connections and the deep structure. We compare DFSMN with BLSTM, both with and without lower frame rate (LFR), on several large speech recognition tasks in English and Mandarin. Experimental results show that DFSMN consistently outperforms BLSTM by a large margin, especially when trained with LFR using CD-Phone as modeling units. On the 2000-hour Fisher (FSH) task, the proposed DFSMN achieves a word error rate of 9.4% by purely using the cross-entropy criterion and decoding with a 3-gram language model, a 1.5% absolute improvement over the BLSTM. On a 20000-hour Mandarin recognition task, the LFR-trained DFSMN achieves more than 20% relative improvement compared to the LFR-trained BLSTM. Moreover, the lookahead filter order of the memory blocks in DFSMN can easily be designed to control the latency for real-time applications.
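The abstract does not give the memory-block equations, so the following is only a minimal sketch of the idea under stated assumptions: each memory block adds weighted past and future frames of its input projection, and the skip connection is taken to be an identity connection that adds the lower layer's memory output. The function names, the element-wise tap weights, and the latency estimate (lookahead accumulating across stacked layers) are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Optional

Vector = List[float]
Sequence = List[Vector]  # T frames, each a D-dimensional vector


def memory_block(
    p: Sequence,
    lookback: Sequence,
    lookahead: Sequence,
    skip: Optional[Sequence] = None,
    stride: int = 1,
) -> Sequence:
    """Hypothetical FSMN-style memory block.

    lookback[i]    weights frame t - i*stride (i = 0 .. N1)
    lookahead[j-1] weights frame t + j*stride (j = 1 .. N2)
    skip           memory output of the layer below (assumed identity skip)
    """
    num_frames, dim = len(p), len(p[0])
    out: Sequence = []
    for t in range(num_frames):
        m = list(p[t])  # start from the current projection
        if skip is not None:
            # Skip connection: add the lower layer's memory output.
            m = [m[d] + skip[t][d] for d in range(dim)]
        for i, a in enumerate(lookback):
            s = t - i * stride
            if s >= 0:  # frames before the start are treated as zero
                m = [m[d] + a[d] * p[s][d] for d in range(dim)]
        for j, c in enumerate(lookahead, start=1):
            s = t + j * stride
            if s < num_frames:  # frames past the end are treated as zero
                m = [m[d] + c[d] * p[s][d] for d in range(dim)]
        out.append(m)
    return out


def lookahead_latency_frames(num_layers: int, lookahead_order: int, stride: int = 1) -> int:
    """Future frames needed when identical memory blocks are stacked (assumption)."""
    return num_layers * lookahead_order * stride


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0]]
    layer1 = memory_block(frames, lookback=[[0.5]], lookahead=[[1.0]])
    layer2 = memory_block(frames, lookback=[[0.5]], lookahead=[[1.0]], skip=layer1)
    print(layer1, layer2)
    print(lookahead_latency_frames(num_layers=10, lookahead_order=2, stride=2))
```

Under these assumptions, shrinking `lookahead_order` (or setting it to zero) reduces the number of future frames the stack must wait for, which is the latency control the abstract refers to.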


research
10/31/2016

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

We present results that show it is possible to build a competitive, grea...
research
10/30/2015

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

In this paper, we extend the deep long short-term memory (DLSTM) recurre...
research
10/26/2018

A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition

Deep Feedforward Sequential Memory Network (DFSMN) has shown superior pe...
research
03/21/2017

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Recurrent neural networks (RNNs), especially long short-term memory (LST...
research
04/30/2018

Towards Deeper Generative Architectures for GANs using Dense connections

In this paper, we present the result of adopting skip connections and de...
research
07/21/2020

Very Fast Keyword Spotting System with Real Time Factor below 0.01

In the paper we present an architecture of a keyword spotting (KWS) syst...
research
01/18/2021

Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices

This paper proposes an extremely lightweight phone-based transducer mode...
