Exploring Transformers for Large-Scale Speech Recognition

05/19/2020
by Liang Lu, et al.

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has proven to be a competitive alternative, especially in the offline condition. Most Transformer studies have been conducted at a relatively small scale, and data augmentation is usually applied to combat data sparsity. In this paper, we aim to understand the behavior of Transformers in a large-scale speech recognition setting, using around 65,000 hours of training data. We investigate various aspects of scaling up Transformers, including model initialization, warmup training, and different layer normalization strategies. For the streaming condition, we compare the widely used attention-mask-based future-context lookahead approach with the Transformer-XL network. Our experiments show that Transformers can achieve around a 6% relative word error rate (WER) reduction over the BLSTM baseline in the offline fashion, while in the streaming fashion, Transformer-XL is comparable to LC-BLSTM under an 800-millisecond latency constraint.
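The paper does not ship code, but two of the ingredients named in the abstract are easy to sketch: a pre-LN Transformer encoder layer (layer normalization applied before each sub-layer, which tends to stabilize deep-model training and reduce sensitivity to the warmup schedule) and an attention mask that bounds future-context lookahead for streaming. The snippet below is a minimal illustration, assuming PyTorch; the layer sizes, the `right_context` parameter, and the helper names are hypothetical choices for the sketch, not taken from the paper.

```python
import torch
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """Pre-LN variant: LayerNorm is applied before each sub-layer,
    rather than after the residual addition (post-LN)."""
    def __init__(self, d_model=512, nhead=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # Self-attention sub-layer: normalize first, then residual add.
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, attn_mask=attn_mask)
        x = x + self.dropout(h)
        # Feed-forward sub-layer, same pre-LN pattern.
        h = self.ff(self.norm2(x))
        return x + self.dropout(h)

def lookahead_mask(seq_len, right_context, device=None):
    """Boolean attention mask: frame t may attend to all past frames
    and at most `right_context` future frames (True = masked out)."""
    i = torch.arange(seq_len, device=device).unsqueeze(1)  # query index
    j = torch.arange(seq_len, device=device).unsqueeze(0)  # key index
    return j > i + right_context

# Example: 100 frames, batch of 4, allow 8 frames of future context.
x = torch.randn(100, 4, 512)  # (time, batch, d_model)
layer = PreLNEncoderLayer()
y = layer(x, attn_mask=lookahead_mask(100, right_context=8))
```

Note that with such a mask applied in every layer, the effective lookahead compounds with depth, so the overall latency grows with the number of layers; this is one motivation for comparing the mask-based approach against segment-recurrence models such as Transformer-XL.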
