Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

11/03/2020
by Ching-Feng Yeh, et al.

Attention-based models have recently gained popularity for their strong performance in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is that they need access to the full sequence and that their computational cost grows quadratically with the sequence length. These characteristics pose challenges, especially for low-latency scenarios, where the system is often required to be streaming. In this paper, we build a compact and streaming speech recognition system on top of the end-to-end neural transducer architecture with attention-based modules augmented with convolution. The proposed system equips the end-to-end models with streaming capability and reduces the large footprint of the streaming attention-based model using augmented memory. On the LibriSpeech dataset, our proposed system achieves word error rates of 2.7% on test-clean and 5.8% on test-other, to our best knowledge the lowest among streaming approaches reported so far.
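To make the quadratic-cost point concrete, the following is a minimal sketch that counts how many query-key pairs are scored under full-sequence attention versus a segment-based scheme with a small memory bank. It is an illustration only, not the system described in the paper: the function names, the segment length, the memory size, and the simplification that each earlier segment contributes one memory slot (with no left or right context frames) are all assumptions made for this example.

```python
def full_attention_pairs(num_frames: int) -> int:
    """Query-key pairs scored when every frame attends to every frame."""
    return num_frames * num_frames


def segmented_attention_pairs(num_frames: int, segment_len: int, memory_slots: int) -> int:
    """Query-key pairs when each frame attends to its own segment plus memory.

    Hypothetical simplification: each earlier segment contributes one summary
    slot, the number of usable slots is capped at `memory_slots`, and no
    left/right context frames are included.
    """
    total = 0
    segments_seen = 0
    for start in range(0, num_frames, segment_len):
        frames = min(segment_len, num_frames - start)  # last segment may be short
        memory = min(segments_seen, memory_slots)      # slots available so far
        total += frames * (frames + memory)
        segments_seen += 1
    return total


if __name__ == "__main__":
    # Assumed values: 50-frame segments and at most 8 memory slots.
    for t in (100, 1000, 10000):
        print(t, full_attention_pairs(t), segmented_attention_pairs(t, 50, 8))
```

Under these assumptions the full-attention count grows with the square of the sequence length, while the segmented count grows roughly linearly once the memory cap is reached, which is the property that makes segment-wise processing attractive for streaming.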

