Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

10/27/2020
by   Yongqiang Wang, et al.
0

In this paper, we summarize the application of transformer and its streamable variant, Emformer based acoustic model for large scale speech recognition applications. We compare the transformer based acoustic models with their LSTM counterparts on industrial scale tasks. Specifically, we compare Emformer with latency-controlled BLSTM (LCBLSTM) on medium latency tasks and LSTM on low latency tasks. On a low latency voice assistant task, Emformer gets 24 relative word error rate reductions (WERRs). For medium latency scenarios, comparing with LCBLSTM with similar model size and latency, Emformer gets significant WERR across four languages in video captioning datasets with 2-3 times inference real-time factors reduction.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/21/2020

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

This paper proposes an efficient memory transformer Emformer for low lat...
research
03/17/2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional ...
research
10/07/2020

Super-Human Performance in Online Low-latency Recognition of Conversational Speech

Achieving super-human performance in recognizing human speech has been a...
research
05/08/2021

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

Recently, neural architecture search (NAS) has attracted much attention ...
research
02/27/2023

Low latency transformers for speech processing

The transformer is a widely-used building block in modern neural network...
research
11/02/2022

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

This work studies the use of attention masking in transformer transducer...
research
01/13/2017

Kernel Approximation Methods for Speech Recognition

We study large-scale kernel methods for acoustic modeling in speech reco...

Please sign up or login with your details

Forgot password? Click here to reset