DeepAI AI Chat
Log In Sign Up

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

by   Ching-Feng Yeh, et al.

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is the need of access to the full sequence and the quadratically growing computational cost concerning the sequence length. These characteristics pose challenges, especially for low-latency scenarios, where the system is often required to be streaming. In this paper, we build a compact and streaming speech recognition system on top of the end-to-end neural transducer architecture with attention-based modules augmented with convolution. The proposed system equips the end-to-end models with the streaming capability and reduces the large footprint from the streaming attention-based model using augmented memory. On the LibriSpeech dataset, our proposed system achieves word error rates 2.7 test-clean and 5.8 streaming approaches reported so far.


End-to-end attention-based distant speech recognition with Highway LSTM

End-to-end attention-based models have been shown to be competitive alte...

Streaming Simultaneous Speech Translation with Augmented Memory Transformer

Transformer-based models have achieved state-of-the-art performance on s...

Attention-Based Models for Text-Dependent Speaker Verification

Attention-based models have recently shown great performance on a range ...

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

In this paper, we present a streaming end-to-end speech recognition mode...

Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention

We present an attention-based model for end-to-end handwriting recogniti...

Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition

This paper proposes a novel automatic speech recognition (ASR) framework...

Leveraging End-to-End Speech Recognition with Neural Architecture Search

Deep neural networks (DNNs) have been demonstrated to outperform many tr...