DeepAI AI Chat
Log In Sign Up

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

11/02/2022
by   Chengdong Liang, et al.
nwpu.edu.cn
0

Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version of U2++ to further reduce partial latency. The core idea of fast-U2++ is to output partial results of the bottom layers in its encoder with a small chunk, while using a large chunk in the top layers of its encoder to compensate the performance degradation caused by the small chunk. Moreover, we use knowledge distillation method to reduce the token emission latency. We present extensive experiments on Aishell-1 dataset. Experiments and ablation studies show that compared to U2++, fast-U2++ reduces model latency from 320ms to 80ms, and achieves a character error rate (CER) of 5.06 streaming setup.

READ FULL TEXT

page 1

page 2

page 3

page 4

06/10/2021

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

The unified streaming and non-streaming two-pass (U2) end-to-end model f...
11/04/2022

Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition

Sequence transducers, such as the RNN-T and the Conformer-T, are one of ...
11/07/2022

Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

The CTC model has been widely applied to many application scenarios beca...
03/16/2023

DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech ...
11/26/2020

Streaming end-to-end multi-talker speech recognition

End-to-end multi-talker speech recognition is an emerging research trend...
03/29/2022

Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer

An inferior performance of the streaming automatic speech recognition mo...
05/04/2021

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

In this paper, we present a streaming end-to-end speech recognition mode...