Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

04/08/2019
by   Yerbolat Khassanov, et al.
0

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR using only monolingual data. Our method encourages the distributions of output token embeddings of monolingual languages to be similar, and hence, promotes the ASR model to easily code-switch between languages. Specifically, we propose to use Jensen-Shannon divergence and cosine distance based constraints. The former will enforce output embeddings of monolingual languages to possess similar distributions, while the later simply brings the centroids of two distributions to be close to each other. Experimental results demonstrate high effectiveness of the proposed method, yielding up to 4.5 code-switching ASR task.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/28/2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Code-switching is about dealing with alternative languages in the commun...
research
06/01/2020

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Recently, there has been significant progress made in Automatic Speech R...
research
06/14/2023

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources

Multilingual Automatic Speech Recognition (ASR) models are capable of tr...
research
11/30/2020

Transformer-Transducers for Code-Switched Speech Recognition

We live in a world where 60 languages fluently. Members of these communi...
research
06/16/2020

End-to-End Code Switching Language Models for Automatic Speech Recognition

In this paper, we particularly work on the code-switched text, one of th...
research
11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...
research
10/28/2020

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

Despite the recent significant advances witnessed in end-to-end (E2E) AS...

Please sign up or login with your details

Forgot password? Click here to reset