Improving Mandarin Speech Recognition with Block-augmented Transformer

07/24/2022
by Xiaoming Ren, et al.

Recently, the Convolution-augmented Transformer (Conformer) has shown promising results in Automatic Speech Recognition (ASR), outperforming the previous best published Transformer Transducer. In this work, we hypothesize that the output of any single block in the encoder or decoder does not capture all of the useful information; in other words, the outputs of different blocks may be complementary. We study how to exploit this complementary information in a parameter-efficient way, with the expectation that doing so leads to more robust performance. We therefore propose the Block-augmented Transformer for speech recognition, named Blockformer. We implement two block ensemble methods: the base Weighted Sum of the Blocks Output (Base-WSBO), and the Squeeze-and-Excitation module to Weighted Sum of the Blocks Output (SE-WSBO). Experiments show that Blockformer significantly outperforms the state-of-the-art Conformer-based models on AISHELL-1: our model achieves a CER of 4.35% without a language model and 4.10% with an external language model on the test set.
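
The two block ensemble methods named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the weights are parameterized, so everything below is an assumption made for illustration: the function names base_wsbo and se_wsbo are hypothetical, the softmax normalization of the per-block weights is a guess, and the SE variant uses one common Squeeze-and-Excitation layout (global average squeeze, a ReLU bottleneck of size H, sigmoid gates). It is not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def base_wsbo(block_outputs, block_logits):
    """Weighted sum of block outputs with one scalar weight per block.

    block_outputs: array of shape (N, T, D), the outputs of N blocks.
    block_logits:  array of shape (N,), learnable in a real model
                   (assumption: normalized here with a softmax).
    """
    weights = softmax(block_logits)                      # (N,)
    return np.tensordot(weights, block_outputs, axes=1)  # (T, D)

def se_wsbo(block_outputs, w1, w2):
    """Weighted sum whose weights are input-dependent SE-style gates.

    w1: (H, N) and w2: (N, H) are the bottleneck projections
    (assumption: biases omitted for brevity).
    """
    # Squeeze: reduce each block's (T, D) output to a single scalar.
    squeezed = block_outputs.mean(axis=(1, 2))           # (N,)
    # Excitation: bottleneck with ReLU, then sigmoid gates per block.
    hidden = np.maximum(0.0, w1 @ squeezed)              # (H,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))         # (N,)
    return np.tensordot(gates, block_outputs, axes=1)    # (T, D)

# Toy usage with random data standing in for encoder block outputs.
rng = np.random.default_rng(0)
N, T, D, H = 12, 50, 256, 4
blocks = rng.standard_normal((N, T, D))
base_out = base_wsbo(blocks, np.zeros(N))
se_out = se_wsbo(blocks, rng.standard_normal((H, N)), rng.standard_normal((N, H)))
```

With all logits set to zero, the base variant reduces to a plain average of the block outputs, which makes a convenient sanity check. Both variants add only a handful of parameters (N scalars, or two small N-by-H matrices), which is consistent with the parameter-efficient goal stated in the abstract.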

research
05/16/2020

Conformer: Convolution-augmented Transformer for Speech Recognition

Recently Transformer and Convolution neural network (CNN) based models h...
research
03/27/2019

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

Connectionist Temporal Classification (CTC) based end-to-end speech reco...
research
03/23/2023

Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition

Transformer-based models have recently made significant achievements in ...
research
01/04/2020

Transformer-based language modeling and decoding for conversational speech recognition

We propose a way to use a transformer-based language model in conversati...
research
05/21/2023

Multi-Head State Space Model for Speech Recognition

State space models (SSMs) have recently shown promising results on small...
research
05/22/2017

Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition

With the evolution of neural network based methods, automatic speech rec...
research
09/01/2022

Attention Enhanced Citrinet for Speech Recognition

Citrinet is an end-to-end convolutional Connectionist Temporal Classific...
