Streaming automatic speech recognition with the transformer model

01/08/2020
by   Niko Moritz, et al.

Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR). Recently, the transformer architecture, which uses self-attention to model temporal context information, has been shown to achieve significantly lower word error rates (WERs) than recurrent neural network (RNN) based system architectures. Despite this success, its practical use has been limited to offline ASR tasks, since encoder-decoder architectures typically require an entire speech utterance as input. In this work, we propose a transformer-based end-to-end ASR system for streaming ASR, where an output must be generated shortly after each spoken word. To achieve this, we apply time-restricted self-attention for the encoder and triggered attention for the encoder-decoder attention mechanism. Our proposed streaming transformer architecture achieves a 2.7% WER, which to the best of our knowledge is the best published streaming end-to-end ASR result for this task.
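The encoder-side mechanism mentioned in the abstract can be illustrated with a small sketch. The snippet below is not the authors' implementation; assuming illustrative window sizes (the parameters left_context and right_context are hypothetical, not values from the paper), it shows how a time-restricted self-attention layer lets each encoder frame attend only to a bounded range of past and future frames, which is what bounds the encoder's look-ahead latency.

```python
# Minimal sketch of time-restricted self-attention (single head, NumPy).
# Not the paper's implementation; window sizes are illustrative assumptions.
import numpy as np

def time_restricted_mask(num_frames, left_context, right_context):
    """Boolean mask where mask[i, j] is True iff frame i may attend to frame j."""
    idx = np.arange(num_frames)
    offsets = idx[None, :] - idx[:, None]      # j - i for every (i, j) pair
    return (offsets >= -left_context) & (offsets <= right_context)

def masked_self_attention(x, left_context=10, right_context=2):
    """Scaled dot-product self-attention restricted to a local time window."""
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)              # (t, t) attention logits
    mask = time_restricted_mask(t, left_context, right_context)
    scores = np.where(mask, scores, -np.inf)   # block frames outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                         # context vectors, shape (t, d)

# Example: 50 frames of 64-dimensional encoder features
out = masked_self_attention(np.random.randn(50, 64))
print(out.shape)  # (50, 64)
```

The decoder-side mechanism, triggered attention, which activates the encoder-decoder attention only once a frame-level trigger (e.g. from a CTC branch) signals the next output token, is not sketched here.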


Related research

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition (07/02/2021)

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition (05/21/2020)

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition (08/13/2020)

Streaming Audio-Visual Speech Recognition with Alignment Regularization (11/03/2022)

Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin (06/17/2018)

Semantic Mask for Transformer based End-to-End Speech Recognition (12/06/2019)

Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR (03/29/2022)
