TST: Time-Sparse Transducer for Automatic Speech Recognition

07/17/2023
by   Xiaohui Zhang, et al.
0

End-to-end model, especially Recurrent Neural Network Transducer (RNN-T), has achieved great success in speech recognition. However, transducer requires a great memory footprint and computing time when processing a long decoding sequence. To solve this problem, we propose a model named time-sparse transducer, which introduces a time-sparse mechanism into transducer. In this mechanism, we obtain the intermediate representations by reducing the time resolution of the hidden states. Then the weighted average algorithm is used to combine these representations into sparse hidden states followed by the decoder. All the experiments are conducted on a Mandarin dataset AISHELL-1. Compared with RNN-T, the character error rate of the time-sparse transducer is close to RNN-T and the real-time factor is 50.00 the time resolution, the time-sparse transducer can also reduce the real-time factor to 16.54

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/25/2016

Character-Level Incremental Speech Recognition with Recurrent Neural Networks

In real-time speech recognition applications, the latency is an importan...
research
09/28/2019

Self-Attention Transducers for End-to-End Speech Recognition

Recurrent neural network transducers (RNN-T) have been successfully appl...
research
04/26/2020

Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition

Modeling unit and model architecture are two key factors of Recurrent Ne...
research
08/18/2015

End-to-End Attention-based Large Vocabulary Speech Recognition

Many of the current state-of-the-art Large Vocabulary Continuous Speech ...
research
11/28/2019

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transd...
research
07/23/2020

Applying GPGPU to Recurrent Neural Network Language Model based Fast Network Search in the Real-Time LVCSR

Recurrent Neural Network Language Models (RNNLMs) have started to be use...
research
03/08/2021

An Ultra-low Power RNN Classifier for Always-On Voice Wake-Up Detection Robust to Real-World Scenarios

We present in this paper an ultra-low power (ULP) Recurrent Neural Netwo...

Please sign up or login with your details

Forgot password? Click here to reset