Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

10/28/2020
by Shuai Zhang, et al.

Despite recent significant advances in end-to-end (E2E) ASR systems for code-switching, their hunger for paired audio-text data limits further improvement in model performance. In this paper, we propose a decoupled transformer model that uses monolingual paired data and unpaired text data to alleviate the shortage of code-switching data. The model is decoupled into two parts: an audio-to-phoneme (A2P) network and a phoneme-to-text (P2T) network. The A2P network can learn acoustic patterns from large-scale monolingual paired data. Meanwhile, it generates multiple phoneme sequence candidates for each audio sample on the fly during training. The generated phoneme-text pairs are then used to train the P2T network, which can be pre-trained with large amounts of external unpaired text data. By using monolingual data and unpaired text data, the decoupled transformer model reduces, to a certain extent, the E2E model's heavy dependence on code-switching paired training data. Finally, the two networks are optimized jointly through attention fusion. We evaluate the proposed method on a public Mandarin-English code-switching dataset. Compared with our transformer baseline, the proposed method achieves an 18.14% relative mix error rate reduction.
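The data flow between the two networks can be illustrated with a toy sketch. This is not the authors' implementation: the real A2P network is a transformer, whereas here it is replaced by a hand-written list of per-step phoneme posteriors, and the candidates are found with a simple beam search. The names `n_best_phonemes` and `make_p2t_pairs`, the phoneme symbols, and the example transcript are all hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Stand-in for A2P output: one {phoneme: probability} dict per decoding step.
# (Assumption: toy values; a real A2P network would produce these from audio.)
Posteriors = List[Dict[str, float]]


def n_best_phonemes(posteriors: Posteriors, n: int) -> List[Tuple[List[str], float]]:
    """Beam search over per-step posteriors; returns up to n (sequence, log-prob) pairs."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for step in posteriors:
        expanded = [
            (score + math.log(prob), seq + [phoneme])
            for score, seq in beams
            for phoneme, prob in step.items()
            if prob > 0.0
        ]
        # Keep only the n highest-scoring partial sequences.
        beams = heapq.nlargest(n, expanded, key=lambda beam: beam[0])
    return [(seq, score) for score, seq in beams]


def make_p2t_pairs(posteriors: Posteriors, text: str, n: int) -> List[Tuple[List[str], str]]:
    """Pair every phoneme candidate with the same reference text for P2T training."""
    return [(seq, text) for seq, _ in n_best_phonemes(posteriors, n)]


if __name__ == "__main__":
    toy_posteriors = [{"n": 0.9, "m": 0.1}, {"i": 0.6, "e": 0.4}]
    for phonemes, text in make_p2t_pairs(toy_posteriors, "ni", n=2):
        print(phonemes, "->", text)
```

Each utterance thus contributes several phoneme-text pairs rather than one, which is how a single audio sample can expose the P2T network to plausible recognition errors from the A2P stage.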

research
01/28/2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Code-switching is about dealing with alternative languages in the commun...
research
01/03/2023

Modeling the Rhythm from Lyrics for Melody Generation of Pop Song

Creating a pop song melody according to pre-written lyrics is a typical ...
research
11/04/2020

Data Augmentation for End-to-end Code-switching Speech Recognition

Training a code-switching end-to-end automatic speech recognition (ASR) ...
research
04/08/2019

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

The lack of code-switch training data is one of the major concerns in th...
research
03/29/2022

CycleGAN-Based Unpaired Speech Dereverberation

Typically, neural network-based speech dereverberation models are traine...
research
07/28/2018

Back-Translation-Style Data Augmentation for End-to-End ASR

In this paper we propose a novel data augmentation method for attention-...
research
06/16/2020

End-to-End Code Switching Language Models for Automatic Speech Recognition

In this paper, we particularly work on the code-switched text, one of th...
