Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

11/02/2022
by   Chengdong Liang, et al.
0

Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version of U2++ to further reduce partial latency. The core idea of fast-U2++ is to output partial results of the bottom layers in its encoder with a small chunk, while using a large chunk in the top layers of its encoder to compensate the performance degradation caused by the small chunk. Moreover, we use knowledge distillation method to reduce the token emission latency. We present extensive experiments on Aishell-1 dataset. Experiments and ablation studies show that compared to U2++, fast-U2++ reduces model latency from 320ms to 80ms, and achieves a character error rate (CER) of 5.06 streaming setup.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/10/2021

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

The unified streaming and non-streaming two-pass (U2) end-to-end model f...
research
11/04/2022

Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition

Sequence transducers, such as the RNN-T and the Conformer-T, are one of ...
research
11/07/2022

Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

The CTC model has been widely applied to many application scenarios beca...
research
06/27/2023

Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

Transducer is one of the mainstream frameworks for streaming speech reco...
research
11/26/2020

Streaming end-to-end multi-talker speech recognition

End-to-end multi-talker speech recognition is an emerging research trend...
research
07/06/2022

Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

There is often a trade-off between performance and latency in streaming ...
research
05/04/2021

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

In this paper, we present a streaming end-to-end speech recognition mode...

Please sign up or login with your details

Forgot password? Click here to reset