Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

07/24/2023
by Emiru Tsunoo, et al.

Although frame-based models, such as CTC and transducers, are well suited to streaming automatic speech recognition, their decoding uses no future knowledge, which can lead to incorrect pruning. Conversely, the label-based attention encoder-decoder mitigates this issue through soft attention over the input, but, unlike CTC, it tends to overestimate labels biased towards its training domain. We exploit these complementary attributes and propose to integrate frame- and label-synchronous (F-/L-Sync) decoding, performed alternately within a single beam-search scheme. F-Sync decoding leads the decoding for block-wise processing, while L-Sync decoding provides prioritized hypotheses using look-ahead future frames within a block. We maintain the hypotheses from both decoding methods to perform effective pruning. Experiments demonstrate that the proposed search algorithm achieves lower error rates than other search methods while remaining robust in out-of-domain conditions.
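The abstract's idea of alternating the two synchronization modes can be illustrated with a minimal toy sketch: a frame-synchronous (CTC-like) beam expansion runs inside each block, and a label-synchronous scorer re-ranks the surviving hypotheses at block boundaries. All function names, the greedy CTC-style extension rule, and the `lm_score` callback are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def frame_sync_step(beams, frame_logprobs, beam_size):
    """Extend every hypothesis by one frame, CTC-style: symbol 0 is
    blank and keeps the label sequence; any other symbol appends its
    index. Hypotheses merging to the same labels keep the best score."""
    new_beams = {}
    for labels, score in beams.items():
        for sym, lp in enumerate(frame_logprobs):
            new_labels = labels if sym == 0 else labels + (sym,)
            cand = score + lp
            if cand > new_beams.get(new_labels, -math.inf):
                new_beams[new_labels] = cand
    top = sorted(new_beams.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:beam_size])  # prune to the beam width

def label_sync_rescore(beams, lm_score):
    """Label-synchronous pass at a block boundary: add a per-hypothesis
    score (e.g. from an attention decoder or LM) and re-rank."""
    scored = ((h, s + lm_score(h)) for h, s in beams.items())
    return dict(sorted(scored, key=lambda kv: kv[1], reverse=True))

def blockwise_search(logprobs, block_size, beam_size, lm_score):
    """Alternate: frame-sync expansion within each block, then a
    label-sync rescoring step before the next block starts."""
    beams = {(): 0.0}
    for start in range(0, len(logprobs), block_size):
        for frame in logprobs[start:start + block_size]:
            beams = frame_sync_step(beams, frame, beam_size)
        beams = label_sync_rescore(beams, lm_score)
    return max(beams, key=beams.get)
```

With a trivial `lm_score` of zero the label-synchronous pass is a no-op re-ranking; in the paper's setting it is the attention decoder, run with look-ahead frames inside the block, that supplies this score.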


Related research

11/12/2018
Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition
Attention-based encoder decoder network uses a left-to-right beam search...

04/13/2021
Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept
With the advent of direct models in automatic speech recognition (ASR), ...

01/18/2021
Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices
This paper proposes an extremely lightweight phone-based transducer mode...

01/25/2022
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR
A streaming style inference of encoder-decoder automatic speech recognit...

09/15/2021
Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG
Ever since neural models were adopted in data-to-text language generatio...

10/18/2021
Efficient Sequence Training of Attention Models using Approximative Recombination
Sequence discriminative training is a great tool to improve the performa...

07/15/2021
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording
In this work, we propose novel decoding algorithms to enable streaming a...
