Towards Zero-Shot Code-Switched Speech Recognition

11/02/2022
by Brian Yan, et al.
In this work, we seek to build effective code-switched (CS) automatic speech recognition (ASR) systems under the zero-shot setting, where no transcribed CS speech data is available for training. Previously proposed frameworks that conditionally factorize the bilingual task into its constituent monolingual parts are a promising starting point for leveraging monolingual data efficiently. However, these methods require the monolingual modules to perform language segmentation: each monolingual module has to simultaneously detect CS points and transcribe speech segments of one language while ignoring those of other languages, which is not a trivial task. We propose to simplify each monolingual module by allowing it to transcribe all speech segments indiscriminately in a monolingual script (i.e., transliteration). This simple modification passes the responsibility of CS point detection to subsequent bilingual modules, which determine the final output by considering multiple monolingual transliterations along with external language model information. We apply this transliteration-based approach in an end-to-end differentiable neural network and demonstrate its efficacy for zero-shot CS ASR on the Mandarin-English SEAME test sets.
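
To make the division of labor concrete, the following is a toy sketch of the idea, not the authors' model. It assumes (hypothetically) that two monolingual modules have each transliterated every segment of an utterance into their own script, that their outputs are already aligned segment by segment, and that an external language model is available as a simple table of log-probabilities. The function name `pick_bilingual_output`, the scores, and the vocabulary are all invented for illustration; the paper's system is an end-to-end differentiable network rather than a per-segment rule like this one.

```python
from typing import Dict, List, Tuple

# One candidate = (token, log-score from the monolingual module that produced it).
Candidate = Tuple[str, float]


def pick_bilingual_output(
    mandarin_hyps: List[Candidate],
    english_hyps: List[Candidate],
    lm_logprob: Dict[str, float],
    lm_weight: float = 1.0,
    unk_logprob: float = -10.0,
) -> List[str]:
    """For each aligned segment, keep the transliteration with the higher
    combined (monolingual module + external LM) log-score."""
    if len(mandarin_hyps) != len(english_hyps):
        raise ValueError("hypotheses must be aligned segment by segment")
    output = []
    for (zh_tok, zh_score), (en_tok, en_score) in zip(mandarin_hyps, english_hyps):
        # Tokens missing from the toy LM table get a low fallback log-probability.
        zh_total = zh_score + lm_weight * lm_logprob.get(zh_tok, unk_logprob)
        en_total = en_score + lm_weight * lm_logprob.get(en_tok, unk_logprob)
        output.append(zh_tok if zh_total >= en_total else en_tok)
    return output


# Hypothetical utterance: three Mandarin words followed by the English word "school".
# Each module transcribes *every* segment in its own script, so the Mandarin
# module renders "school" in Chinese characters and the English module renders
# the Mandarin words as English-looking tokens. All scores are made up.
mandarin_hyps = [("我", -0.2), ("要", -0.3), ("去", -0.4), ("思库", -1.5)]
english_hyps = [("wall", -1.2), ("yow", -1.6), ("chew", -1.1), ("school", -0.3)]
lm = {"我": -1.0, "要": -1.2, "去": -1.3, "school": -2.0, "wall": -4.0, "chew": -4.5}

result = pick_bilingual_output(mandarin_hyps, english_hyps, lm)
print(result)
```

Neither monolingual module has to decide where the language switches in this sketch; the switch point falls out of the comparison made by the downstream combination step, which is the responsibility shift the abstract describes.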

research
06/18/2019

Multi-Graph Decoding for Code-Switching ASR

In the FAME! Project, a code-switching (CS) automatic speech recognition...
research
07/28/2018

Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech

In this paper, we describe several techniques for improving the acoustic...
research
11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...
research
11/02/2022

Monolingual Recognizers Fusion for Code-switching Speech Recognition

The bi-encoder structure has been intensively investigated in code-switc...
research
06/10/2021

KARI: KAnari/QCRI's End-to-End systems for the INTERSPEECH 2021 Indian Languages Code-Switching Challenge

In this paper, we present the Kanari/QCRI (KARI) system and the modeling...
research
11/10/2021

Scaling ASR Improves Zero and Few Shot Learning

With 4.5 million hours of English speech from 10 different sources acros...
research
08/21/2023

Implicit Self-supervised Language Representation for Spoken Language Diarization

In a code-switched (CS) scenario, the use of spoken language diarization...
