Transformer-Transducers for Code-Switched Speech Recognition

11/30/2020
by   Siddharth Dalmia, et al.
0

We live in a world where 60 languages fluently. Members of these communities constantly switch between languages when having a conversation. As automatic speech recognition (ASR) systems are being deployed to the real-world, there is a need for practical systems that can handle multiple languages both within an utterance or across utterances. In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. We propose three modifications over the vanilla model in order to handle various aspects of code-switching. First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching. Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching. Finally, we propose a multi-label/multi-audio encoder structure to leverage the vast monolingual speech corpora towards code-switching. We demonstrate the efficacy of our proposed approaches on the SEAME dataset, a public Mandarin-English code-switching corpus, achieving a mixed error rate of 18.5 and 26.3

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/28/2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Code-switching is about dealing with alternative languages in the commun...
research
04/01/2021

Multilingual and code-switching ASR challenges for low resource Indian languages

Recently, there is increasing interest in multilingual automatic speech ...
research
04/08/2019

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

The lack of code-switch training data is one of the major concerns in th...
research
05/31/2023

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

Code-switching, also called code-mixing, is the linguistics phenomenon w...
research
11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...
research
04/06/2021

Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization

Mandarin-English code-switching (CS) is frequently used among East and S...
research
08/11/2023

Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

We introduce a bilingual solution to support English as secondary locale...

Please sign up or login with your details

Forgot password? Click here to reset