Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR

01/25/2022
by   Emiru Tsunoo, et al.
0

A streaming style inference of encoder-decoder automatic speech recognition (ASR) system is important for reducing latency, which is essential for interactive use cases. To this end, we propose a novel blockwise synchronous decoding algorithm with a hybrid approach that combines endpoint prediction and endpoint post-determination. In the endpoint prediction, we compute the expectation of the number of tokens that are yet to be emitted in the encoder features of the current blocks using the CTC posterior. Based on the expectation value, the decoder predicts the endpoint to realize continuous block synchronization, as a running stitch. Meanwhile, endpoint post-determination probabilistically detects backward jump of the source-target attention, which is caused by the misprediction of endpoints. Then it resumes decoding by discarding those hypotheses, as back stitch. We combine these methods into a hybrid approach, namely run-and-back stitch search, which reduces the computational cost and latency. Evaluations of various ASR tasks show the efficiency of our proposed decoding algorithm, which achieves a latency reduction, for instance in the Librispeech test set from 1487 ms to 821 ms at the 90th percentile, while maintaining a high recognition accuracy.

READ FULL TEXT
research
07/15/2021

VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording

In this work, we propose novel decoding algorithms to enable streaming a...
research
07/24/2023

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Although frame-based models, such as CTC and transducers, have an affini...
research
08/09/2022

ASR Error Correction with Constrained Decoding on Operation Prediction

Error correction techniques remain effective to refine outputs from auto...
research
04/10/2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR

Recently, a few novel streaming attention-based sequence-to-sequence (S2...
research
07/01/2021

StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR

While attention-based encoder-decoder (AED) models have been successfull...
research
04/21/2021

Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers

This paper proposes a novel label-synchronous speech-to-text alignment t...
research
04/13/2021

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

With the advent of direct models in automatic speech recognition (ASR), ...

Please sign up or login with your details

Forgot password? Click here to reset