Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition

04/02/2022
by   Guodong Ma, et al.

Consonant and vowel reduction occur frequently in Uyghur speech, especially in spontaneous speech at a high speaking rate, and they degrade speech recognition performance. To address this problem, we propose an effective phone mask training method for Conformer-based Uyghur end-to-end (E2E) speech recognition. The idea is to randomly mask a certain percentage of phone features during model training, which simulates these reduction phenomena and encourages the E2E model to learn more contextual information. Experiments show that the approach greatly alleviates the problem. In addition, we investigate different masking units in depth, and the results show the effectiveness of our proposed masking unit. We further study the masking method and optimize the filling strategy of the phone mask. Finally, compared with the Conformer-based E2E baseline without mask training, our model demonstrates about 5.51 reduction on reading speech and 12.92 ... above approach has also been verified on the test set of the open-source data THUYG-20, which shows 20 ...
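The masking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that frame-level phone boundaries are already available (for example, from a forced alignment), that masking is applied to whole phone segments, and that masked frames are filled either with zeros or with the utterance mean. The function name `phone_mask`, the `mask_ratio` default, and both fill options are hypothetical choices made for illustration; the abstract does not state the actual percentage or filling strategy.

```python
import numpy as np


def phone_mask(features, segments, mask_ratio=0.15, fill="zero", rng=None):
    """Mask the frames of a random subset of phone segments.

    features:   (num_frames, feat_dim) array of acoustic features.
    segments:   list of (start, end) frame indices per phone, end exclusive.
    mask_ratio: fraction of phones to mask (hypothetical default).
    fill:       "zero" or "mean" (illustrative filling strategies).
    Returns the masked copy and the sorted indices of the masked phones.
    """
    if fill not in ("zero", "mean"):
        raise ValueError("fill must be 'zero' or 'mean'")
    rng = np.random.default_rng() if rng is None else rng

    masked = features.copy()  # leave the caller's array untouched
    num_to_mask = int(round(mask_ratio * len(segments)))
    if num_to_mask == 0:
        return masked, []

    # Pick distinct phones uniformly at random.
    chosen = rng.choice(len(segments), size=num_to_mask, replace=False)

    # Fill value: scalar zero, or the per-dimension mean of the utterance.
    fill_value = 0.0 if fill == "zero" else features.mean(axis=0)

    for idx in chosen:
        start, end = segments[idx]
        masked[start:end] = fill_value  # overwrite every frame of the phone

    return masked, sorted(int(i) for i in chosen)


if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(size=(10, 4))
    phones = [(0, 2), (2, 5), (5, 7), (7, 10)]  # assumed alignment
    out, picked = phone_mask(feats, phones, mask_ratio=0.5)
    print("masked phone indices:", picked)
```

In a training pipeline, a function like this would typically be applied on the fly to each utterance, so that the model sees a different set of masked phones in every epoch.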

research
12/13/2021

PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-modeling Unit Training for Robust Uyghur E2E Speech Recognition

Consonant and vowel reduction are often encountered in Uyghur speech, wh...
research
12/06/2019

Semantic Mask for Transformer based End-to-End Speech Recognition

Attention-based encoder-decoder model has achieved impressive results fo...
research
10/08/2021

Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

In the recent trend of semi-supervised speech recognition, both self-sup...
research
11/09/2020

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

The joint training framework for speech enhancement and recognition meth...
research
05/21/2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming

Despite successful applications of end-to-end approaches in multi-channe...
research
08/16/2022

Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition

Optimization of modern ASR architectures is among the highest priority t...
research
07/01/2017

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF...
