Cascaded encoders for unifying streaming and non-streaming ASR

10/27/2020
by Arun Narayanan, et al.

End-to-end (E2E) automatic speech recognition (ASR) models have by now shown competitive performance on several benchmarks. These models are structured to operate in either streaming or non-streaming mode. This work presents cascaded encoders for building a single E2E ASR model that can operate in both modes. The proposed model consists of a streaming encoder and a non-streaming encoder. Input features are first processed by the streaming encoder; the non-streaming encoder operates exclusively on the output of the streaming encoder. A single decoder then learns to decode using the output of either the streaming or the non-streaming encoder. Results show that this model achieves word error rates (WER) similar to those of a standalone streaming model when operating in streaming mode, and obtains a 10% to 27% relative improvement when operating in non-streaming mode. Our results also show that the proposed approach outperforms existing E2E two-pass models, especially on long-form speech.
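The data flow described in the abstract can be sketched in a few lines of Python. This is only a toy illustration of the cascade, not the authors' model: the encoders are simple averaging functions rather than neural networks, and every name here (`streaming_encoder`, `non_streaming_encoder`, `shared_decoder`, `recognize`, the `mode` argument) is hypothetical and chosen for this example.

```python
from typing import List

Frame = List[float]


def streaming_encoder(features: List[Frame]) -> List[Frame]:
    """Toy causal encoder: the output at step t is the mean of frames 0..t,
    so it never looks at future input (assumed stand-in for a real encoder)."""
    running = [0.0] * len(features[0])
    out = []
    for t, frame in enumerate(features, start=1):
        running = [r + x for r, x in zip(running, frame)]
        out.append([r / t for r in running])
    return out


def non_streaming_encoder(stream_out: List[Frame]) -> List[Frame]:
    """Toy full-context encoder: it consumes only the streaming encoder's
    output and adds the utterance-level mean to every frame."""
    n, dim = len(stream_out), len(stream_out[0])
    mean = [sum(f[d] for f in stream_out) / n for d in range(dim)]
    return [[x + m for x, m in zip(f, mean)] for f in stream_out]


def shared_decoder(encoded: List[Frame]) -> List[int]:
    """Stand-in for the single decoder: picks the largest dimension per frame.
    The same function is used for both encoder outputs."""
    return [max(range(len(f)), key=f.__getitem__) for f in encoded]


def recognize(features: List[Frame], mode: str = "streaming") -> List[int]:
    """Run the cascade and decode from the encoder selected by `mode`."""
    encoded = streaming_encoder(features)
    if mode == "non_streaming":
        # The second encoder is cascaded on top of the first one.
        encoded = non_streaming_encoder(encoded)
    elif mode != "streaming":
        raise ValueError(f"unknown mode: {mode}")
    return shared_decoder(encoded)


if __name__ == "__main__":
    feats = [[1.0, 0.9], [0.0, 3.0], [0.0, 0.0]]
    print(recognize(feats, "streaming"))
    print(recognize(feats, "non_streaming"))
```

The point of the sketch is the wiring: the streaming path is always computed, the non-streaming encoder never sees the raw features, and one decoder serves both modes.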


