Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition

04/08/2022
by Nick J. C. Wang, et al.

The Conformer model is an excellent architecture for speech recognition modeling that effectively utilizes the hybrid losses of connectionist temporal classification (CTC) and attention to train model parameters. To improve the decoding efficiency of Conformer, we propose a novel connectionist temporal summarization (CTS) method that reduces the number of encoder-generated acoustic frames fed to the attention decoder, thus reducing operations. However, to achieve such decoding improvements, we must fine-tune the model parameters, as the cross-attention observations change and thus require corresponding refinements. Our final experiments show that, with a beam width of 4, LibriSpeech's decoding budget can be reduced by up to 20% without losing accuracy. An improvement in accuracy is even found for the LibriSpeech "test-other" set: the word error rate (WER) is reduced by 6% relative at a beam width of 1 and by 3% relative at a beam width of 4.
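To make the idea concrete, below is a minimal sketch of a CTS-style frame reduction. It is a hypothetical illustration rather than the paper's exact method: it assumes the common convention that frames whose CTC blank posterior is high carry little acoustic content, and it collapses each run of such frames into a single averaged frame. The function name cts_reduce, the threshold blank_thresh, and the mean-pooling rule are all assumptions; the paper's actual summarization may differ.

    # Hypothetical CTS-style frame reduction; not the paper's exact algorithm.
    # Assumes a trained encoder with a CTC head whose blank symbol has index 0.
    import torch

    def cts_reduce(enc_out: torch.Tensor,
                   ctc_logits: torch.Tensor,
                   blank_id: int = 0,
                   blank_thresh: float = 0.9) -> torch.Tensor:
        """Collapse blank-dominated encoder frames before attention decoding.

        enc_out:    (T, D) acoustic sequence from the encoder, one utterance.
        ctc_logits: (T, V) frame-level logits from the CTC head.
        Returns a shorter (T', D) sequence in which each run of frames whose
        blank posterior exceeds blank_thresh is replaced by its mean frame.
        """
        blank_prob = ctc_logits.softmax(dim=-1)[:, blank_id]  # (T,)
        is_blank = blank_prob > blank_thresh

        kept, run = [], []
        for t in range(enc_out.size(0)):
            if is_blank[t]:
                run.append(enc_out[t])                     # accumulate a blank run
                continue
            if run:
                kept.append(torch.stack(run).mean(dim=0))  # summarize the run
                run = []
            kept.append(enc_out[t])                        # keep informative frames as-is
        if run:
            kept.append(torch.stack(run).mean(dim=0))
        return torch.stack(kept)

The attention decoder then cross-attends over the shorter summarized sequence instead of the full encoder output. Because the cross-attention observations change after such summarization, the model parameters must be fine-tuned accordingly, as the abstract notes.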


