E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

11/28/2022
by W. Ronny Huang, et al.

We explore unifying a neural segmenter with a two-pass cascaded encoder ASR model into a single model. A key challenge is allowing the segmenter (which runs in real time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st pass decoder to emit an end-of-segment (EOS) signal in real time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a novel dummy frame injection strategy allows for simultaneous high quality 2nd pass results and low finalization latency. On a real-world long-form captioning task (YouTube), we achieve 2.4 a baseline VAD-based segmenter with the same cascaded encoder.
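The finalization idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `LookaheadPass`, the function `transcribe`, the 60 ms frame duration, the zero-valued dummy frame, and the averaging "encoder" are all hypothetical stand-ins. It only shows why a pass that needs 900 ms of future context can be flushed immediately at EOS by feeding it dummy frames instead of waiting for real audio.

```python
from collections import deque

# Assumption: 60 ms frames, so a 900 ms lookahead is 15 frames.
RIGHT_CONTEXT_FRAMES = 15
DUMMY_FRAME = 0.0  # assumption: the dummy frame is simply a zero feature


class LookaheadPass:
    """Toy non-causal pass: the output for frame t needs frames t..t+R."""

    def __init__(self, right_context):
        self.right_context = right_context
        self.buffer = deque()

    def push(self, frame):
        """Feed one frame; return an output once enough lookahead is buffered."""
        self.buffer.append(frame)
        if len(self.buffer) > self.right_context:
            window = list(self.buffer)
            self.buffer.popleft()
            # Stand-in for the real encoder: average over the window.
            return sum(window) / len(window)
        return None

    def finalize_with_dummy_frames(self):
        """Flush all pending outputs by pushing dummy frames as fake lookahead."""
        outputs = []
        for _ in range(self.right_context):
            out = self.push(DUMMY_FRAME)
            if out is not None:
                outputs.append(out)
        self.buffer.clear()  # start the next segment with an empty buffer
        return outputs


def transcribe(frames, eos_at, right_context=RIGHT_CONTEXT_FRAMES):
    """Run the toy 2nd pass; finalize as soon as the (simulated) EOS fires."""
    second_pass = LookaheadPass(right_context)
    outputs = []
    for t, frame in enumerate(frames):
        out = second_pass.push(frame)
        if out is not None:
            outputs.append(out)
        if t == eos_at:  # EOS signal from the causal 1st pass, in real time
            outputs.extend(second_pass.finalize_with_dummy_frames())
            return outputs, t
    return outputs, None


outputs, finalized_at = transcribe([1.0] * 20, eos_at=9)
```

In this sketch, every frame up to and including the EOS frame receives a 2nd pass output at the moment EOS fires, so no frame is dropped and no extra real audio has to arrive before the segment is finalized.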

research
11/21/2020

A Better and Faster End-to-End Model for Streaming ASR

End-to-end (E2E) models have shown to outperform state-of-the-art conven...
research
03/31/2023

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

Conformer models maintain a large number of internal states, the vast ma...
research
04/05/2016

Designing robust watermark barcodes for multiplex long-read sequencing

A method for designing sequencing barcodes that can withstand a large nu...
research
04/13/2022

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

In this paper, we propose a dynamic cascaded encoder Automatic Speech Re...
research
07/20/2023

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

There has been an increased interest in the integration of pretrained sp...
research
12/07/2016

A Multi-Pass Approach to Large-Scale Connectomics

The field of connectomics faces unprecedented "big data" challenges. To ...
research
06/05/2020

ELITR Non-Native Speech Translation at IWSLT 2020

This paper is an ELITR system submission for the non-native speech trans...