End-to-end Speech Recognition with Adaptive Computation Steps

08/30/2018
by Mohan Li, et al.

In this paper, we present the Adaptive Computation Steps (ACS) algorithm, which enables end-to-end speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output. The ACS-equipped model follows the classic encoder-decoder framework, but unlike attention-based models, it produces alignments independently at the encoder side using the correlation between adjacent frames. Thus, predictions can be made as soon as sufficient inter-frame information is received, which makes the model applicable in online cases. We verify the ACS algorithm on the open-source Mandarin speech corpus AIShell-1, where it achieves parity with the attention-based model, at 35.2, in the online setting. To fully demonstrate the advantage of the ACS algorithm, offline experiments are also conducted, in which our ACS model achieves 21.6, outperforming the attention-based counterpart.

Index Terms: Adaptive Computation Steps, Encoder-Decoder Recurrent Neural Networks, End-to-End Training.
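The idea of deciding how many frames to consume before emitting an output can be illustrated with a small sketch. The abstract does not specify the halting rule, so this is not the authors' implementation. It assumes that each frame carries a hypothetical halting probability (for example, one derived from adjacent encoder states) and that a block of frames is closed once the accumulated probability reaches a threshold. The function name `segment_by_halting`, the `halt_probs` input, and the `threshold` parameter are all illustrative assumptions.

```python
from typing import List, Sequence


def segment_by_halting(halt_probs: Sequence[float],
                       threshold: float = 1.0) -> List[List[int]]:
    """Group frame indices into blocks, one block per predicted output.

    Assumption (not from the abstract): a block is closed as soon as the
    running sum of per-frame halting probabilities reaches `threshold`.
    """
    blocks: List[List[int]] = []
    current: List[int] = []
    accumulated = 0.0
    for t, p in enumerate(halt_probs):
        current.append(t)
        accumulated += p
        if accumulated >= threshold:
            # Enough evidence gathered: emit a block and start a new one.
            blocks.append(current)
            current = []
            accumulated = 0.0
    if current:
        # Trailing frames that never reached the threshold.
        blocks.append(current)
    return blocks


# Hypothetical halting probabilities for six frames.
print(segment_by_halting([0.5, 0.5, 0.25, 0.25, 0.5, 0.5]))
```

Because each block depends only on frames seen so far, a decision of this kind can be made without waiting for the end of the utterance, which is the property the abstract relies on for online use.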

research
07/22/2017

Attention-Based End-to-End Speech Recognition on Voice Search

Recently, there has been an increasing interest in end-to-end speech rec...
research
03/14/2017

Multichannel End-to-end Speech Recognition

The field of speech recognition is in the midst of a paradigm shift: end...
research
10/29/2018

An improved hybrid CTC-Attention model for speech recognition

Recently, end-to-end speech recognition with a hybrid model consisting o...
research
09/15/2023

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

We study a streamable attention-based encoder-decoder model in which eit...
research
04/27/2020

Differentiable Adaptive Computation Time for Visual Reasoning

This paper presents a novel attention-based algorithm for achieving adap...
research
05/19/2020

Investigations on Phoneme-Based End-To-End Speech Recognition

Common end-to-end models like CTC or encoder-decoder-attention models us...
research
03/28/2020

Serialized Output Training for End-to-End Overlapped Speech Recognition

This paper proposes serialized output training (SOT), a novel framework ...
