Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization

04/06/2021
by   Shun-Po Chuang, et al.

Mandarin-English code-switching (CS) is common among speakers in East and Southeast Asia. However, intra-sentence switching between two very different languages makes CS speech challenging to recognize. Meanwhile, recent non-autoregressive (NAR) ASR models remove the need for the left-to-right beam decoding used in autoregressive (AR) models, and achieve strong performance with fast inference. Therefore, in this paper, we take advantage of the Mask-CTC NAR ASR framework to tackle CS speech recognition. We propose changing the Mandarin output target of the encoder to Pinyin for faster encoder training, and introduce a Pinyin-to-Mandarin decoder to learn contextualized information. Moreover, we propose word embedding label smoothing to regularize the decoder with contextualized information, and projection matrix regularization to bridge the gap between the encoder and decoder. We evaluate the proposed methods on the SEAME corpus and achieve exciting results.
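The abstract does not spell out how word embedding label smoothing is computed, so the following is only a minimal sketch of one plausible reading: instead of spreading the smoothing mass uniformly over the vocabulary, it is shared among non-target tokens in proportion to a softmax over cosine similarity with the target token's embedding. The function name `embedding_smoothed_target`, the `eps` and `temperature` parameters, and the toy two-dimensional embeddings are all assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def embedding_smoothed_target(target, embeddings, eps=0.1, temperature=1.0):
    """Build a soft target distribution over the vocabulary (hypothetical helper).

    The target token keeps probability 1 - eps. The remaining eps is split
    among the other tokens by a softmax over their cosine similarity to the
    target embedding, so similar tokens receive more of the smoothing mass.
    """
    sims = {
        tok: cosine(embeddings[target], vec) / temperature
        for tok, vec in embeddings.items()
        if tok != target
    }
    peak = max(sims.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in sims.items()}
    total = sum(weights.values())
    dist = {tok: eps * w / total for tok, w in weights.items()}
    dist[target] = 1.0 - eps
    return dist


# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "好": [1.0, 0.1],
    "很好": [0.9, 0.2],
    "good": [0.8, 0.3],
    "car": [-0.5, 1.0],
}
soft = embedding_smoothed_target("好", toy_embeddings, eps=0.1)
```

In a training loop, a distribution like `soft` would replace the one-hot target in the decoder's cross-entropy loss. Under this assumed formulation, a prediction of a semantically close token is penalized less than a prediction of an unrelated one.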

research
06/18/2020

Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition

Code-switching (CS) occurs when a speaker alternates words of two or mor...
research
07/12/2020

The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results

Code-switching (CS) is a common phenomenon and recognizing CS speech is ...
research
10/07/2021

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

Code-switching (CS) is common in daily conversations where more than one...
research
10/28/2019

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

In this paper, we investigate the benefit that off-the-shelf word embedd...
research
11/30/2020

Transformer-Transducers for Code-Switched Speech Recognition

We live in a world where 60% of the population can speak two or more languages fluently. Members of these communi...
research
08/29/2021

Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech

Code-switching (CS), defined as the mixing of languages in conversations...
research
07/15/2022

Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

Modern non-autoregressive (NAR) speech recognition systems aim to accele...
