Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

10/22/2020
by Xie Chen, et al.

Recently, Transformer-based end-to-end models have achieved great success in many areas, including speech recognition. However, compared to LSTM models, the heavy computational cost of the Transformer during inference is a key issue that prevents its application. In this work, we explore the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. We combine the idea of Transformer-XL and chunk-wise streaming processing to design a streamable Transformer Transducer model. We demonstrate that T-T outperforms the hybrid model, RNN Transducer (RNN-T), and the streamable Transformer attention-based encoder-decoder model in the streaming scenario. Furthermore, the runtime cost and latency can be optimized with a relatively small look-ahead.
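
To make the chunk-wise idea concrete, here is a minimal sketch of an attention mask that limits each frame to its own chunk plus a fixed number of preceding chunks. It illustrates the general technique only and is not necessarily the masking scheme used in the paper. The function name `chunk_attention_mask` and the parameters `chunk_size` and `left_chunks` are hypothetical, and the Transformer-XL-style reuse of cached hidden states from earlier chunks is not shown.

```python
def chunk_attention_mask(num_frames, chunk_size, left_chunks):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Hypothetical helper (an assumption, not the paper's code): frames are
    grouped into fixed-size chunks. A frame sees every frame in its own
    chunk, which gives it a bounded look-ahead, plus the frames in the
    `left_chunks` chunks immediately before it.
    """
    if chunk_size <= 0 or left_chunks < 0:
        raise ValueError("chunk_size must be positive and left_chunks non-negative")
    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size
        # First visible frame: start of the oldest chunk kept as history.
        start = max(0, (chunk - left_chunks) * chunk_size)
        # One past the last visible frame: end of the current chunk.
        end = min(num_frames, (chunk + 1) * chunk_size)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


if __name__ == "__main__":
    # Print the mask as a grid: "x" means attention is allowed.
    for row in chunk_attention_mask(num_frames=6, chunk_size=2, left_chunks=1):
        print("".join("x" if allowed else "." for allowed in row))
```

In this sketch the look-ahead of a frame is at most `chunk_size - 1` frames, so a smaller chunk lowers latency, while `left_chunks` bounds how much history each frame attends to and therefore the attention cost per frame.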

research
05/28/2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

Recently, there has been a strong push to transition from hybrid models ...
research
08/30/2021

Multi-Channel Transformer Transducer for Speech Recognition

Multi-channel inputs offer several advantages over single-channel, to im...
research
03/14/2023

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Transformer-based end-to-end speech recognition has achieved great succe...
research
11/16/2022

Streaming Joint Speech Recognition and Disfluency Detection

Disfluency detection has mainly been solved in a pipeline approach, as p...
research
11/02/2022

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

This work studies the use of attention masking in transformer transducer...
research
06/22/2022

Answer Fast: Accelerating BERT on the Tensor Streaming Processor

Transformers have become a predominant machine learning workload, they a...
research
10/06/2022

WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger

End-to-end models have gradually become the main technical stream for vo...
